  Midas DAQ System
ID   Date   Author   Topic   Subject
  1598   08 Jul 2019   Konstantin Olchanski   Bug Report   Frontend killed at stop of run
> After a long overnight run to check that the frontend runs smoothly for a longer
> time, I stopped the run and the frontend was killed by midas.

run the frontend inside gdb and post the stack trace after the crash?

if there is no crash (the program is stopped by exit()), you may need
to set a breakpoint in exit() or _exit() (not sure what its latest name is)
then with luck your stack trace will show who/what called it from where.

if it is hard to start the frontend inside gdb, you can start it normally,
and attach gdb later, using a "gdb frontend.exe pid" command.

K.O.



> I am not sure why
> this happened, as the end_of_run function returned successfully (at least the
> print statement right before "return SUCCESS;" appeared right away). So
> something else must have timed-out and caused it to be killed, I guess?
> 
> Any suggestions on where to look to find out what causes this?
> 
> Thanks in advance for your help!
  1601   08 Jul 2019   Vinzenz Bildstein   Bug Report   Frontend killed at stop of run
> run the frontend inside gdb and post the stack trace after the crash?
> 
> if there is no crash (the program is stopped by exit()), you may need
> to set a breakpoint in exit() or _exit() (not sure what its latest name is)
> then with luck your stack trace will show who/what called it from where.
> 

If I remember correctly from the last time I tried that, it doesn't use the exit
function but gdb just reports that the program was terminated and no longer exists. I
can't set a breakpoint on SIGKILL as the point of SIGKILL is to kill the program and
gdb can't set a break at that point afaik.
  1602   08 Jul 2019   Konstantin Olchanski   Bug Report   Frontend killed at stop of run
> > run the frontend inside gdb and post the stack trace after the crash?
> > 
> > if there is no crash (the program is stopped by exit()), you may need
> > to set a breakpoint in exit() or _exit() (not sure what its latest name is)
> > then with luck your stack trace will show who/what called it from where.
> > 
> 
> If I remember correctly from the last time I tried that, it doesn't use the exit
> function but gdb just reports that the program was terminated and no longer exists. I
> can't set a breakpoint on SIGKILL as the point of SIGKILL is to kill the program and
> gdb can't set a break at that point afaik.

For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack 
trace. Is this what you see?

If your program stops "normally", not from receiving some signal, set breakpoints on "exit" and 
"_exit".

The normal stop sequence is to call exit(), which runs all the atexit() functions (the midas atexit() 
function prints the message about "cm_disconnect_experiment not called at end of program") and 
calls _exit() to stop the program.

So if you see the midas message "cm_disconnect_experiment not called at end of program", it is a 
good indication that somebody (not mfe.c) called exit() on you. A breakpoint on "exit" should catch 
who does it.
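
The difference between exit() and _exit() described above can be seen in a small stand-alone sketch. It is not MIDAS code: on_exit_handler() is a hypothetical stand-in for the MIDAS atexit() function, and the pipe is only there so that the parent can observe whether the handler ran in the child.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static int g_pipe_wr = -1;   /* write end of the pipe, used by the handler */

/* Hypothetical stand-in for the MIDAS atexit() function: instead of
   printing a message it writes one byte into the pipe. */
static void on_exit_handler(void)
{
    const char mark = 'X';
    assert(g_pipe_wr >= 0);
    ssize_t rc = write(g_pipe_wr, &mark, 1);
    (void)rc;
}

/* Fork a child that leaves via exit() (use_exit != 0) or via _exit()
   (use_exit == 0). Returns 1 if the atexit handler ran in the child,
   0 if it did not, -1 on error. */
int atexit_handler_ran(int use_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child */
        close(fds[0]);
        g_pipe_wr = fds[1];
        atexit(on_exit_handler);
        if (use_exit)
            exit(0);                /* runs atexit handlers, then _exit() */
        _exit(0);                   /* skips atexit handlers */
    }

    close(fds[1]);                  /* parent keeps only the read end */
    char c;
    ssize_t n = read(fds[0], &c, 1);   /* 0 bytes means the handler never wrote */
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return n == 1 ? 1 : 0;
}
```

Calling atexit_handler_ran(1) should report that the handler ran, and atexit_handler_ran(0) that it did not, which is why a breakpoint on "exit" alone can miss code that calls _exit() directly.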

Good luck,
K.O.
  1603   08 Jul 2019   Vinzenz Bildstein   Bug Report   Frontend killed at stop of run
> > > run the frontend inside gdb and post the stack trace after the crash?
> > > 
> > > if there is no crash (the program is stopped by exit()), you may need
> > > to set a breakpoint in exit() or _exit() (not sure what its latest name is)
> > > then with luck your stack trace will show who/what called it from where.
> > > 
> > 
> > If I remember correctly from the last time I tried that, it doesn't use the exit
> > function but gdb just reports that the program was terminated and no longer exists. I
> > can't set a breakpoint on SIGKILL as the point of SIGKILL is to kill the program and
> > gdb can't set a break at that point afaik.
> 
> For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack 
> trace. Is this what you see?

Yes, that is exactly what I remember seeing.

> 
> If your program stops "normally", not from receiving some signal, set breakpoints on "exit" and 
> "_exit".
> 
> The normal stop sequence is to call exit(), which runs all the atexit() functions (the midas atexit() 
> function prints the message about "cm_disconnect_experiment not called at end of program") and 
> calls _exit() to stop the program.
> 
> So if you see the midas message "cm_disconnect_experiment not called at end of program", it is a 
> good indication that somebody (not mfe.c) called exit() on you. A breakpoint on "exit" should catch 
> who does it.
> 
> Good luck,
> K.O.

So far I haven't seen the issue with the message "cm_disconnect_experiment not called at end of program"
again. Now I just have to restart the frontend after the run has (failed?) to stop. After restarting the
frontend everything seems to work again. 

I haven't been writing data while doing these tests, so I can't say if there is any data missing or if the
runs were actually stopped properly (with a second dump of the ODB at the end).
  1604   08 Jul 2019   Konstantin Olchanski   Bug Report   Frontend killed at stop of run
> > 
> > For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack 
> > trace. Is this what you see?
> 
> Yes, that is exactly what I remember seeing.
> 

Where would a SIGKILL come from?!?

Look in the syslog (/var/log/messages). If the program was killed by the Linux kernel, it would be logged there.
The usual cause is that the machine runs out of memory and programs are killed by the OOM killer; this is always
logged in the syslog.

MIDAS can also issue a SIGKILL sometimes; again, this is always logged in midas.log. See src/midas.c and search for SIGKILL to see
the exact messages printed before it is sent out.
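
As a side note on why there is no stack trace: SIGKILL cannot be caught by the target, so the only evidence is the wait status seen by whoever is the parent (gdb in this case). A minimal sketch, with a hypothetical function name and no MIDAS calls, of how a parent process sees this:

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Fork a child that only waits, send it SIGKILL, and return the number
   of the signal that terminated it (0 if it exited normally, -1 on error). */
int kill_child_and_get_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: sleep until a signal arrives */
    }

    if (kill(pid, SIGKILL) != 0)
        return -1;

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    /* Without WUNTRACED/WCONTINUED the child either exited or was signalled. */
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The wait status tells the parent which signal ended the child, but not who sent it; that is why the syslog and midas.log are the places to look.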

K.O.
  1605   08 Jul 2019   Vinzenz Bildstein   Bug Report   Frontend killed at stop of run
> > > 
> > > For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack 
> > > trace. Is this what you see?
> > 
> > Yes, that is exactly what I remember seeing.
> > 
> 
> Where would a SIGKILL come from?!?
> 
> Look in the syslog (/var/log/messages). If the program was killed by the linux kernel, it would be logged there,
> the usual cause is the machine runs out of memory and programs are killed by the OOM killer, this is logged
> into the syslog, always.
> 
> MIDAS also can issue a SIGKILL sometimes, again this is always logged in midas.log. see src/midas.c, search for SIGKILL to see 
> the exact messages printed before it is sent out.
> 
> K.O.

I haven't been able to reproduce the error from the overnight run so far. I will try and leave this running in gdb overnight to see
if I can get that error again. 
  1608   10 Jul 2019   Vinzenz Bildstein   Bug Report   Frontend killed at stop of run
> > > > 
> > > > For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack 
> > > > trace. Is this what you see?
> > > 
> > > Yes, that is exactly what I remember seeing.
> > > 
> > 
> > Where would a SIGKILL come from?!?
> > 
> > Look in the syslog (/var/log/messages). If the program was killed by the linux kernel, it would be logged there,
> > the usual cause is the machine runs out of memory and programs are killed by the OOM killer, this is logged
> > into the syslog, always.
> > 
> > MIDAS also can issue a SIGKILL sometimes, again this is always logged in midas.log. see src/midas.c, search for SIGKILL to see 
> > the exact messages printed before it is sent out.
> > 
> > K.O.
> 
> I haven't been able to reproduce the error from the overnight run so far. I will try and leave this running in gdb overnight to see
> if I can get that error again. 

I was able to reproduce the error after an overnight run. gdb reported that the program received a SIGKILL, but no sign of it in 
/var/log/messages. I've tried finding a current midas.log file, but it seems we don't have one? The most recent one was last updated 
on May 24th this year.
  1610   10 Jul 2019   Konstantin Olchanski   Bug Report   Frontend killed at stop of run
> ... finding a current midas.log file

On the "help" page, see "midas.log".

The same information is in the ODB: the midas log file name is the concatenation of "/Logger/Data dir" and "message file".
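
A minimal sketch of that concatenation, assuming the two ODB values have already been read into strings. build_log_path() and the example directory are hypothetical, and adding a "/" when the directory lacks a trailing one is an assumption of this sketch, not something confirmed for MIDAS.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join a data directory and a message file name into out[].
   A '/' is inserted only if data_dir does not already end in one
   (assumption of this sketch). Returns 0 on success, -1 if the
   result does not fit into out_size bytes. */
int build_log_path(char *out, size_t out_size,
                   const char *data_dir, const char *message_file)
{
    assert(out != NULL && data_dir != NULL && message_file != NULL);

    size_t len = strlen(data_dir);
    const char *sep = (len > 0 && data_dir[len - 1] != '/') ? "/" : "";

    int n = snprintf(out, out_size, "%s%s%s", data_dir, sep, message_file);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With the hypothetical values "/data/exp" and "midas.log" this yields "/data/exp/midas.log"; checking the modification time of that file shows quickly whether it is the log currently being written.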

K.O.
  1613   11 Jul 2019   Vinzenz Bildstein   Bug Report   Frontend killed at stop of run
> > ... finding a current midas.log file
>
> On the "help" page, see "midas.log".
>
> Same information is in ODB, the midas log file name is concatenation of "/Logger/Data dir" and "message file".
>
> K.O.

Sorry, should have found that myself ...

Anyway, the output from midas is
Tue Jul  9 07:24:06 2019 [mhttpd,INFO] Run #13456 started
Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [system.c:4580:ss_recv_net_command,ERROR] timeout receiving network command header
Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [midas.c:10322:rpc_client_call,ERROR] call to "fedescant" on "grsmid00.triumf.ca" RPC "rc_transition": timeout waiting for reply
Wed Jul 10 06:24:02 2019 [mhttpd,ERROR] [midas.c:5495:cm_shutdown,ERROR] Client 'fedescant' not responding to shutdown command
Wed Jul 10 06:24:02 2019 [mhttpd,ERROR] [midas.c:5497:cm_shutdown,ERROR] Killing and Deleting client 'fedescant' pid 31482
Wed Jul 10 06:24:02 2019 [Logger,INFO] Client 'fedescant' on buffer 'SYSMSG' removed by cm_watchdog because process pid 31482 does not exist
Wed Jul 10 06:24:02 2019 [fegrifip09,INFO] Client 'fedescant' on buffer 'SYSTEM' removed by cm_watchdog because process pid 31482 does not exist
Wed Jul 10 06:24:03 2019 [mhttpd,INFO] Run #13456 stopped

And I think I tracked down where this comes from with help from Thomas Lindner. It is a problem in the communication via the A3818 card from CAEN. This seems to block the frontend, even though it still reacts normally to a shutdown. So no issue with midas, even if it seemed that way at first. Thanks for all your help!
  1614   11 Jul 2019   Konstantin Olchanski   Bug Report   Frontend killed at stop of run
> Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [system.c:4580:ss_recv_net_command,ERROR] timeout receiving network  command header
> Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [midas.c:10322:rpc_client_call,ERROR] call to "fedescant" on  "grsmid00.triumf.ca" RPC "rc_transition": timeout waiting for reply

We should have started debugging from here. The error messages mean: your frontend is not responding to run transition (RPC timeout).
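
One general way to keep a frontend responsive to run transitions is to bound every wait on the hardware, so that control returns to the framework regularly. The sketch below is only an illustration of that idea: ready_fn and the stub callbacks are hypothetical stand-ins for a register check, and a deadline like this cannot rescue a vendor driver call that itself never returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical "is data ready?" callback: returns non-zero when ready. */
typedef int (*ready_fn)(void *ctx);

/* Stub callbacks standing in for a real hardware status check. */
int stub_always_ready(void *ctx) { (void)ctx; return 1; }
int stub_never_ready(void *ctx)  { (void)ctx; return 0; }

/* Poll ready() until it reports data or timeout_ms has elapsed.
   Returns 1 if data became ready, 0 on timeout. */
int wait_ready_with_deadline(ready_fn ready, void *ctx, int timeout_ms)
{
    assert(ready != NULL && timeout_ms >= 0);

    const long long deadline = now_ms() + timeout_ms;
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    do {
        if (ready(ctx))
            return 1;
        nanosleep(&nap, NULL);      /* do not spin at 100% CPU */
    } while (now_ms() < deadline);

    return 0;                       /* give control back to the caller */
}
```

On a timeout the caller can log the condition and return, instead of sitting in the readout while the run-transition RPC times out.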

> problem in the communication via the A3818 card from CAEN.

Yes, this has been problematic before.

K.O.
  181   14 Dec 2004   Jan Wouters   Forum   Frontend index
What is the api call to determine the index of the frontend when specifying the
-i parameter during execution of the frontend? 
  183   15 Dec 2004   Stefan Ritt   Forum   Frontend index
> What is the api call to determine the index of the frontend when specifying the
> -i parameter during execution of the frontend? 

INT get_frontend_index();
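
A typical use of the index is to derive per-instance names. The sketch below is a stand-alone illustration: make_indexed_name() is hypothetical, the index is passed in as a parameter so that it compiles without MIDAS (in a real frontend it would come from get_frontend_index()), and treating a negative index as "no -i given" is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Build a name such as "fe03" from a prefix and a frontend index.
   Returns 0 on success, -1 if the index is negative (assumed here to
   mean that no -i option was given) or the result does not fit. */
int make_indexed_name(char *out, size_t out_size, const char *prefix, int index)
{
    assert(out != NULL && prefix != NULL);

    if (index < 0)
        return -1;

    int n = snprintf(out, out_size, "%s%02d", prefix, index);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```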

- Stefan
  1183   06 Jul 2016   Zhe Wang   Suggestion   Frontend crash on high event rate
Dear friends,

We have some questions on using midas.
We use a Caen digitizer V1751 to take waveforms.
When testing with the programs provided by CAEN, we found that it works fine at roughly 1000 Hz event rate, and 30 MB/s of data can be written to disk.
The test with Midas, however, is a little confusing. We use the CAENDigitizer library with Midas. At first it works: data are taken, and there seems to be no error.
The only problem is that we cannot go to a higher event rate. For example, we can only run at a rate of 40 Hz with only 3 MB/s of data recording. Otherwise it crashes.

We may be missing something really simple. Would you please give some suggestions, for example other people's discussions or documents?

Thank you very much.
  1184   09 Jul 2016   Zhe Wang   Suggestion   Frontend crash on high event rate
Dear friends,

I may add a little more information.
For polling event, we check the data-ready register for the status of the digitizer.
In the readout routine, we create a bank, readout the data and write it out.

We commented out or replaced each part of the subroutines in turn to figure out where exactly it goes wrong,
for example by replacing the readout from the digitizer with random generation of fake events.
With the readout replaced by random generation, the program runs fine and reaches very high event rates.

Any suggestions or ideas from experts?

Thank you very much.

--
Best regards,
Zhe Wang


> Dear friends,
> 
> We have some questions on using midas.
> We use a Caen digitizer V1751 to take waveforms.
> When testing with caen provided programs, we roughly know it can work fine at 1000 Hz event rate, and 30 M/s data can be written to disk.
> The test with Midas, however, is a little confusing. We use CAENDigitizer library with Midas. First, it works, data were taken, and there seems no error.
> The only problem is we cannot go to a higher event rate, for example we can only work on a rate of 40 Hz, and only 3 M/s data recording. Otherwise it will crush.
> 
> We may miss something really simple. Would you please give some suggestions? for example, other people's discussions or documents?
> 
> Thank you very much.
  1185   10 Jul 2016   Zhe Wang   Suggestion   Frontend crash on high event rate
Dear friends,

In case anyone needs the source code, it is attached.
We use an optical fiber to connect to a VME controller, which talks to the V1751 via the VME bus.

--
Zhe Wang

> Dear friends,
> 
> I may add a little more information.
> For polling event, we check the data-ready register for the status of the digitizer.
> In the readout routine, we create a bank, readout the data and write it out.
> 
> We commented out or made some replacement for each part of the subroutines to figure our where exactly goes wrong.
> for example, replace the readout from the digitizer with a random generation of some fake events.
> By replacing the readout by a random generation, the program runs fine and reach a very high event rates.
> 
> Any suggestions or ideas from experts?
> 
> Thank you very much.
> 
> --
> Best regards,
> Zhe Wang
> 
> 
> > Dear friends,
> > 
> > We have some questions on using midas.
> > We use a Caen digitizer V1751 to take waveforms.
> > When testing with caen provided programs, we roughly know it can work fine at 1000 Hz event rate, and 30 M/s data can be written to disk.
> > The test with Midas, however, is a little confusing. We use CAENDigitizer library with Midas. First, it works, data were taken, and there seems no error.
> > The only problem is we cannot go to a higher event rate, for example we can only work on a rate of 40 Hz, and only 3 M/s data recording. Otherwise it will crush.
> > 
> > We may miss something really simple. Would you please give some suggestions? for example, other people's discussions or documents?
> > 
> > Thank you very much.
Attachment 1: frontend.c
/*****************************************************************\

Name:         frontend.c
Created by: 	Zhe Wang 
Date:         03/16/2015 

Modified by: Mohan Li
Date: 07/04/2016

Contents:     Experiment specific readout code (user part) of Midas frontend.
Supported VME modules:
CAEN V2718 VME-CONET Bridge
CAEN V1751 10-Bits 1-GHz Flash ADC

Experiment: Dark noise

Currently: Use CAEN_Digitizer lib. Use Random number to avoid disconnection. 

$Id: $

\********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include "midas.h"
#include "mcstd.h"
#include "mvmestd.h"
#include "experim.h"
#include "v1751.h"
#include "v775n.h"
#include "v785n.h"
#include "v1751Infc.h"
#include "v775nInfc.h"
#include "CAENDigitizer.h"

/* make frontend functions callable from the C framework */
#ifdef __cplusplus
extern "C" {
#endif

	/*-- Globals -------------------------------------------------------*/

	/* The frontend name (client name) as seen by other MIDAS clients   */
	char *frontend_name = "Frontend";
	/* The frontend file name, don't change it */
	char *frontend_file_name = __FILE__;

	/* frontend_loop is called periodically if this variable is TRUE    */
	BOOL frontend_call_loop = FALSE;

	/* a frontend status page is displayed with this frequency in ms */
	INT display_period = 500;

	/* maximum event size produced by this frontend */
	//INT max_event_size = 10000;
	INT max_event_size = 100000; //modified according to feov1721.cxx

	/* maximum event size for fragmented events (EQ_FRAGMENTED) */
	INT max_event_size_frag = 5 * 1024 * 1024;

	/* buffer size to hold events */
	INT event_buffer_size = 200 * 100000;

#define NFADC 1
#define NMax 4
	int hFADC[NFADC];

	/* VMEBaseAddress */
	uint32_t FADCBA[NMax] = {0x000C0000,0,0,0};  // FADC base address 0x80000000

	uint32_t EvtCounterFadc[NMax];

	/* Time in second*/
	uint32_t TimeInSec;

	/* initiate variables */

	FILE* logfile;

	//CAENComm_ErrorCode sCAENc;

	int l=0, d=0, h=0, Nh;
	uint32_t i, lcount, temp, lam, reg, data[50000];
	int Nmodulo=10; //print transmission information every Nmodulo events
	int tcount=0, eloop=0;
	DWORD  eStored, eSize;
	DWORD eventReady;
	DWORD BLTNB;
	DWORD recordlength;
	uint32_t recordsize = 0x1000;
	int loop, Nloop=10;
	int bshowData=0; // 1 to enable data print
	int debug = 0;
	uint32_t pct=0, ct;
	struct timeval t1;
	int   dt1, savelcount=0;
	float trg_rate =0;
	int data_test = 0; // 1 for stored data check
	int simulation = 0;// 1 for simulation mode
	

	/*-------------CAEN Digitizer variables----------*/
	int card=0;
	CAEN_DGTZ_ErrorCode sCAEN;
	CAEN_DGTZ_BoardInfo_t BoardInfo;
	char *buffer = NULL; //pointer to the read out buffer
	int c = 0;
	uint32_t size; //buffer allocated for reading data
	uint32_t bsize;
#define INTERRUPT_TIMEOUT 20000 //20000ms = 20s
#define VME_INTERRUPT_LEVEL 1
#define VME_INTERRUPT_STATUS_ID 0xAAAA
#define IRQ_EVENT_NUMBER 1
	unsigned int counter = 0;
	unsigned int preScaler = 100;



	/*-- Function declarations -----------------------------------------*/

	INT frontend_init(); 
	INT frontend_exit();
	INT begin_of_run(INT run_number, char *error);
	INT end_of_run(INT run_number, char *error);
	INT pause_run(INT run_number, char *error);
	INT resume_run(INT run_number, char *error);
	INT frontend_loop();

	INT read_trigger_event(char *pevent, INT off);
	INT frontend_config();

	/*-- Equipment list ------------------------------------------------*/

#undef USE_INT
//#define USE_INT

	EQUIPMENT equipment[] = {

		{"Trigger",               /* equipment name */
			{1, 0,                   /* event ID, trigger mask */
				"SYSTEM",               /* event buffer */
#ifdef USE_INT
				EQ_INTERRUPT,           /* equipment type */
#else
				EQ_POLLED,              /* equipment type */
#endif
				//  LAM_SOURCE(CRATE, LAM_STATION(SLOT_ADC)), /* event source */
				LAM_SOURCE(0, 0xFFFFFF),   /* event source crate 0, all stations, by Li*/
				"MIDAS",                /* format */
				TRUE,                   /* enabled */
				RO_RUNNING |            /* read only when running */
					RO_ODB,                 /* and update ODB */
				500,                    /* poll for 500ms */
				0,                      /* stop run after this event limit */
				0,                      /* number of sub events */
				0,                      /* don't log history */
				"", "", "",},
			read_trigger_event,      /* readout routine */
		},

		{""}
	};

#ifdef __cplusplus
}
#endif

/********************************************************************\
  Callback routines for system transitions

  These routines are called whenever a system transition like start/
  stop of a run occurs. The routines are called on the following
occasions:

frontend_init:  When the frontend program is started. This routine
should initialize the hardware.

frontend_exit:  When the frontend program is shut down. Can be used
to release any locked resources like memory, communications
ports etc.

begin_of_run:   When a new run is started. Clear scalers, open
rungates, etc.

end_of_run:     Called on a request to stop a run. Can send
end-of-run event and close run gates.

pause_run:      When a run is paused. Should disable trigger events.

resume_run:     When a run is resumed. Should enable trigger events.

\********************************************************************/

/*-- Frontend Init -------------------------------------------------*/
INT frontend_init()
{
	// Open FADC digitizer
	for( card=0; card<NFADC; card++ )  {
		sCAEN = CAEN_DGTZ_OpenDigitizer(CAEN_DGTZ_PCI_OpticalLink, 0, card, FADCBA[card], &hFADC[card]);
		if(sCAEN != CAEN_DGTZ_Success) {
			printf("Can't open digitizer\n");
			sCAEN = CAEN_DGTZ_CloseDigitizer(hFADC[card]);
		}else{
			printf("Open Device successfully.\n");
			frontend_config();
		}
	}
	return SUCCESS;
}

INT frontend_config()
{
	/* ------FADC configuration------ */
	for( card=0; card<NFADC; card++ )  {  

		//Print Board Info
		sCAEN = CAEN_DGTZ_GetInfo(hFADC[card], &BoardInfo);
		printf("\nConnected to CAEN Digitizer Model %s, recognized as board %d\n", BoardInfo.ModelName, card);
		printf("\tROC FPGA Release is %s\n", BoardInfo.ROC_FirmwareRel);
		printf("\tAMC FPGA Release is %s\n", BoardInfo.AMC_FirmwareRel);
		//Reset Digitizer
		sCAEN = CAEN_DGTZ_Reset(hFADC[card]);
		//Calibrate temperature
		sCAEN = CAEN_DGTZ_Calibrate(hFADC[card]);
		//Set the length of each waveform (in samples)
		sCAEN = CAEN_DGTZ_SetRecordLength(hFADC[card], 1792);
		//Generate a global trigger by AND of the opened channels. Set self-trigger on channel 0 (mask 0x01) to ACQ_AND_EXTOUT
		sCAEN = CAEN_DGTZ_SetChannelSelfTrigger(hFADC[card], CAEN_DGTZ_TRGMODE_ACQ_AND_EXTOUT, 0x01);
		//Enable channel 0
		sCAEN = CAEN_DGTZ_SetChannelEnableMask(hFADC[card], 0x01); 
		//Set selfTrigger threshold 0x3a7=-4mV
		sCAEN = CAEN_DGTZ_SetChannelTriggerThreshold(hFADC[card], 0, 0x3a9);
		//Trigger under threshold
		sCAEN = CAEN_DGTZ_SetTriggerPolarity(hFADC[card], 0, CAEN_DGTZ_TriggerOnFallingEdge);
		//Post trigger
		sCAEN = CAEN_DGTZ_SetPostTriggerSize(hFADC[card], 20);
		//DC offset
		sCAEN = CAEN_DGTZ_SetChannelDCOffset(hFADC[card], 0, 0x3333);
		//Set the acquisition mode
		sCAEN = CAEN_DGTZ_SetAcquisitionMode(hFADC[card], CAEN_DGTZ_SW_CONTROLLED);
		//IO Level
		sCAEN = CAEN_DGTZ_SetIOLevel(hFADC[card], CAEN_DGTZ_IOLevel_NIM);
		//Analog Monitor
		//sCAEN = CAEN_DGTZ_SetAnalogMonOutput(hFADC[card], CAEN_DGTZ_AM_BUFFER_OCCUPANCY);
		//sCAEN = CAEN_DGTZ_ReadRegister(hFADC[card], V1751_FRONT_PANEL_IO_CONTROL, &temp);
		//printf("V1751_FRONT_PANEL_IO_CONTROL = %d\n", temp);
		sCAEN = CAEN_DGTZ_WriteRegister(hFADC[card], V1751_FRONT_PANEL_IO_CONTROL, 0x3C);
		sCAEN = CAEN_DGTZ_WriteRegister(hFADC[card], V1751_FRONT_PANEL_TRIGGER_OUT_ENABLE_MASK, 0xFF);
		sCAEN = CAEN_DGTZ_ReadRegister(hFADC[card], V1751_FRONT_PANEL_IO_CONTROL, &temp);
		printf("V1751_FRONT_PANEL_IO_CONTROL = %d\n", temp);
		//Interrupt configuration
		sCAEN = CAEN_DGTZ_SetInterruptConfig(hFADC[card], CAEN_DGTZ_ENABLE, VME_INTERRUPT_LEVEL, VME_INTERRUPT_STATUS_ID, IRQ_EVENT_NUMBER, CAEN_DGTZ_IRQ_MODE_RORA);
		//Set the max number of events to transfer in a single readout
		sCAEN = CAEN_DGTZ_SetMaxNumEventsBLT(hFADC[card], 3);
		//Set the behaviour when a Software trigger arrives
		//sCAEN = CAEN_DGTZ_SetSWTriggerMode(hFADC[card], CAEN_DGTZ_TRGMODE_ACQ_ONLY);

		//---------------------------------------------------------//
		//----- Last step: Allocate memory for readout buffer -----//
		//---------------------------------------------------------//
		sCAEN = CAEN_DGTZ_MallocReadoutBuffer(hFADC[card], &buffer, &size);


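		// Note: sCAEN holds only the status of CAEN_DGTZ_MallocReadoutBuffer here;
		// the return codes of the earlier calls were overwritten and are not checked.
		if(sCAEN != CAEN_DGTZ_Success) {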
			printf("Errors during Digitizer Configuration.\n");
			sCAEN = CAEN_DGTZ_FreeReadoutBuffer(&buffer);
			sCAEN = CAEN_DGTZ_CloseDigitizer(hFADC[card]);
		}else{
			printf("Digitizer Configuration Successfully.\n");
		}
	}//end of FADC Configuration
	return SUCCESS;
}

/*-- Frontend Exit -------------------------------------------------*/

INT frontend_exit()
{
	//Stop DAQ
	for (card=0;card<NFADC;card++) {
		sCAEN = CAEN_DGTZ_SWStopAcquisition(hFADC[card]);
	}
	//Free memory
	sCAEN = CAEN_DGTZ_FreeReadoutBuffer(&buffer);
	//Close digitizer
	for (card=0;card<NFADC;card++) {
		sCAEN = CAEN_DGTZ_CloseDigitizer(hFADC[card]);
	}
	if(sCAEN == CAEN_DGTZ_Success){
		printf("FADC Modules stopped.\n");
	}else{
		printf("FADC Modules can not be stopped.\n");
	}
	return SUCCESS;
}

/*-- Begin of Run --------------------------------------------------*/
... 200 more lines ...
  1186   13 Jul 2016   Zhe Wang   Suggestion   Frontend crash on high event rate
Somehow the replies only arrived in my mailbox, and I don't understand why.
So I have pasted them here. I hope the authors don't mind; this information may be useful for others.

The following is some discussion.
==========================================================================================
> In read_trigger_event(), you are creating a secondary bank with the time in
> seconds. For your information, this time in seconds is already written in
> the event header. You can retrieve the time using the macro from
> midas.h:   time = TIME_STAMP(pevent)

Removed.
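
For readers unfamiliar with the macro, the idea is that pevent points just past the event header, so the time stamp is read by stepping back one header. The sketch below uses a stand-in struct and macro (demo_event_header, DEMO_TIME_STAMP); the field list is an assumption modelled on the MIDAS event header and should be checked against midas.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the MIDAS event header. The field list is an assumption;
   check midas.h for the real definition. */
typedef struct {
    int16_t  event_id;
    int16_t  trigger_mask;
    uint32_t serial_number;
    uint32_t time_stamp;     /* time in seconds */
    uint32_t data_size;
} demo_event_header;

/* Same idea as TIME_STAMP(pevent): step back one header from the payload. */
#define DEMO_TIME_STAMP(pevent) \
    ((((demo_event_header *)(pevent)) - 1)->time_stamp)

/* Fill in a header at the start of storage (which must have room for the
   header plus the payload) and return a pointer to the payload, which is
   what a readout routine receives as pevent. */
char *demo_make_event(demo_event_header *storage, uint32_t stamp)
{
    assert(storage != NULL);

    storage->event_id = 1;
    storage->trigger_mask = 0;
    storage->serial_number = 0;
    storage->time_stamp = stamp;
    storage->data_size = 0;

    return (char *)(storage + 1);   /* payload starts right after the header */
}
```

Since the header already carries the time, a separate bank holding the same value only adds to the event size.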

>
> In frontend_init() you loop over NFADC (1) and call for each loop
> frontend_config() after opening the device on that card. In
> frontend_config() you redo a loop over NFADC, meaning that in case of
> more than one card you will find the second one not open on the first
> frontend_config (ok for one card though).
>

Corrected.

> In frontend_config() what is the return sCAEN from MallocReadoutBuffer()?
> What is the size of the requested allocated buffer?

The return size of allocated buffer is 134936.

>
> What is the value of the sCAEN from the ReadData() function in
> read_trigger_event()?

It is always 0 for success until it crashes.
However, even for the event it crashes, it also appears as 0.

>
> I didn't check all the config parameters!
>
> What is the value of count in poll_event()? It is true that if the test
> in poll_event() is too short, it causes timing corruption during
> calibration. 

Do you mean the Midas timing calibration for poll_event() before everything finally starts up?
We haven't observed corruption at this stage.

> This never happened in the CAMAC days... to be fixed!
> The alternative is to include a ss_sleep(1) instead of the prescale.
> A 1 ms delay between polls is short enough to sustain your 1 kHz trigger.

We tried ss_sleep(1) in poll_event(), and it doesn't help.
We also tried adding a ss_sleep(10) in read_trigger_event().
This may work, but we can only reach 100 Hz and 1 MB/s. Still low.

>
> How long do you spend in the read_trigger_event()? To be measured.

We added some timers in this part of the program.
The time spent in CAEN_DGTZ_ReadData is about 100 us.
Sleeping 1 ms in read_trigger_event delays the crash, but only by about a minute.
Sleeping 10 ms works.
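
A measurement of this kind can be sketched with the monotonic clock. This is not the code from the attached frontend: time_call_us() and stub_readout() are hypothetical, with the stub standing in for the call being timed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Microseconds between two CLOCK_MONOTONIC readings (end - start). */
long long elapsed_us(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);

    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);
    return ns / 1000;
}

/* Hypothetical stand-in for the readout call being timed. */
void stub_readout(void *ctx) { (void)ctx; }

/* Time one call of fn(ctx) and return its duration in microseconds. */
long long time_call_us(void (*fn)(void *), void *ctx)
{
    struct timespec t0, t1;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_us(&t0, &t1);
}
```

Accumulating these durations in memory and printing them only at the end of the run avoids the printing itself distorting the timing.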

>
> I still don't understand your setup as you mention using optic fiber to
> access the VME controller? do you have a A3818 or similar to the
> controller? If so why don't you connect directly the optic to the VX1751
> and prevent the use of the VME backplane?

Our connection is:
A2818 (PCI) - fiber - V2718 (Bridge) - VME - V1751
We probably need to configure other vme boards through VME at the same time,
however, these boards don't have a fiber connection.

We also tested a direct fiber connection to the V1751 today.
But it crashes with the same symptom.
==========================================================================================
Attachment 1: frontend.c
/*****************************************************************\

Name:         frontend.c
Created by: 	Zhe Wang 
Date:         03/16/2015 

Modified by: Mohan Li
Date: 07/04/2016

Contents:     Experiment specific readout code (user part) of Midas frontend.
Supported VME modules:
CAEN V2718 VME-CONET Bridge
CAEN V1751 10-Bits 1-GHz Flash ADC

Experiment: Dark noise

Currently: Use CAEN_Digitizer lib. Use Random number to avoid disconnection. 

$Id: $

\********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include "midas.h"
#include "mcstd.h"
#include "mvmestd.h"
#include "experim.h"
#include "v1751.h"
#include "CAENDigitizer.h"

/* make frontend functions callable from the C framework */
#ifdef __cplusplus
extern "C" {
#endif

	/*-- Globals -------------------------------------------------------*/

	/* The frontend name (client name) as seen by other MIDAS clients   */
	char *frontend_name = "Frontend";
	/* The frontend file name, don't change it */
	char *frontend_file_name = __FILE__;

	/* frontend_loop is called periodically if this variable is TRUE    */
	BOOL frontend_call_loop = FALSE;

	/* a frontend status page is displayed with this frequency in ms */
	INT display_period = 500;

	/* maximum event size produced by this frontend */
	//INT max_event_size = 10000;
	INT max_event_size = 100000; //modified according to feov1721.cxx

	/* maximum event size for fragmented events (EQ_FRAGMENTED) */
	INT max_event_size_frag = 5 * 1024 * 1024;

	/* buffer size to hold events */
	INT event_buffer_size = 200 * 100000;

#define NFADC 1
#define NMax 4
	int hFADC[NFADC];

	/* VMEBaseAddress */
	uint32_t FADCBA[NMax] = {0x000C0000,0,0,0};  // FADC base address 0x80000000

	uint32_t EvtCounterFadc[NMax];

	/* Time in second*/
	uint32_t TimeInSec;

	/* initiate variables */

	FILE* logfile;

	//CAENComm_ErrorCode sCAENc;

	int l=0, d=0, h=0, Nh;
	uint32_t i, lcount, temp, lam, reg, data[50000];
	int Nmodulo=10; //print transmission information every Nmodulo events
	int tcount=0, eloop=0;
	DWORD  eStored, eSize;
	DWORD eventReady;
	DWORD BLTNB;
	DWORD recordlength;
	uint32_t recordsize = 0x1000;
	int loop, Nloop=10;
	int bshowData=0; // 1 to enable data print
	int debug = 0;
	uint32_t pct=0, ct;
	struct timeval t1;
	int   dt1, savelcount=0;
	float trg_rate =0;
	int data_test = 0; // 1 for stored data check
	int simulation = 0;// 1 for simulation mode
	

	/*-------------CAEN Digitizer variables----------*/
	int card=0;
	CAEN_DGTZ_ErrorCode sCAEN;
	CAEN_DGTZ_BoardInfo_t BoardInfo;
	char *buffer = NULL; //pointer to the read out buffer
	int c = 0;
	uint32_t size; //buffer allocated for reading data
	uint32_t bsize;
#define INTERRUPT_TIMEOUT 20000 //20000ms = 20s
#define VME_INTERRUPT_LEVEL 1
#define VME_INTERRUPT_STATUS_ID 0xAAAA
#define IRQ_EVENT_NUMBER 1

	/*-- Function declarations -----------------------------------------*/

	INT frontend_init(); 
	INT frontend_exit();
	INT begin_of_run(INT run_number, char *error);
	INT end_of_run(INT run_number, char *error);
	INT pause_run(INT run_number, char *error);
	INT resume_run(INT run_number, char *error);
	INT frontend_loop();

	INT read_trigger_event(char *pevent, INT off);
	INT frontend_config();

	/*-- Equipment list ------------------------------------------------*/

#undef USE_INT
//#define USE_INT

	EQUIPMENT equipment[] = {

		{"Trigger",               /* equipment name */
			{1, 0,                   /* event ID, trigger mask */
				"SYSTEM",               /* event buffer */
#ifdef USE_INT
				EQ_INTERRUPT,           /* equipment type */
#else
				EQ_POLLED,              /* equipment type */
#endif
				//  LAM_SOURCE(CRATE, LAM_STATION(SLOT_ADC)), /* event source */
				LAM_SOURCE(0, 0xFFFFFF),   /* event source crate 0, all stations, by Li*/
				"MIDAS",                /* format */
				TRUE,                   /* enabled */
				RO_RUNNING |            /* read only when running */
					RO_ODB,                 /* and update ODB */
				500,                    /* poll for 500ms */
				0,                      /* stop run after this event limit */
				0,                      /* number of sub events */
				0,                      /* don't log history */
				"", "", "",},
			read_trigger_event,      /* readout routine */
		},

		{""}
	};

#ifdef __cplusplus
}
#endif

/********************************************************************\
  Callback routines for system transitions

  These routines are called whenever a system transition like start/
  stop of a run occurs. The routines are called on the following
occasions:

frontend_init:  When the frontend program is started. This routine
should initialize the hardware.

frontend_exit:  When the frontend program is shut down. Can be used
to release any locked resources like memory, communication
ports, etc.

begin_of_run:   When a new run is started. Clear scalers, open
rungates, etc.

end_of_run:     Called on a request to stop a run. Can send
end-of-run event and close run gates.

pause_run:      When a run is paused. Should disable trigger events.

resume_run:     When a run is resumed. Should enable trigger events.

\********************************************************************/

/*-- Frontend Init -------------------------------------------------*/
INT frontend_init()
{
	// Open FADC digitizer
	for( card=0; card<NFADC; card++ )  {
                // through V2718
	        //sCAEN = CAEN_DGTZ_OpenDigitizer(CAEN_DGTZ_PCI_OpticalLink, 0, 0, FADCBA[card], &hFADC[card]);
	        // through fiber
	        sCAEN = CAEN_DGTZ_OpenDigitizer(CAEN_DGTZ_OpticalLink, 0, 0, 0, &hFADC[card]);
		if(sCAEN != CAEN_DGTZ_Success) {
			printf("Can't open digitizer\n");
			//sCAEN = CAEN_DGTZ_CloseDigitizer(hFADC[card]);
		}
	}

	frontend_config();
	
	return SUCCESS;
}

INT frontend_config()
{
	/* ------FADC configuration------ */
	for( card=0; card<NFADC; card++ )  {  

		//Print Board Info
		sCAEN = CAEN_DGTZ_GetInfo(hFADC[card], &BoardInfo);
		printf("\nConnected to CAEN Digitizer Model %s, recognized as board %d\n", BoardInfo.ModelName, card);
		printf("\tROC FPGA Release is %s\n", BoardInfo.ROC_FirmwareRel);
		printf("\tAMC FPGA Release is %s\n", BoardInfo.AMC_FirmwareRel);
		//Reset Digitizer
		sCAEN = CAEN_DGTZ_Reset(hFADC[card]);
		//Calibrate temperature
		sCAEN = CAEN_DGTZ_Calibrate(hFADC[card]);
		//Set the length of each waveform (in samples)
		sCAEN = CAEN_DGTZ_SetRecordLength(hFADC[card], 1792);
		//Generate a global trigger by AND of the opened channels. Set trigger on channel 0 to be ACQ_ONLY
		sCAEN = CAEN_DGTZ_SetChannelSelfTrigger(hFADC[card], CAEN_DGTZ_TRGMODE_ACQ_AND_EXTOUT, 0x01);
		//Enable channel 0
		sCAEN = CAEN_DGTZ_SetChannelEnableMask(hFADC[card], 0x01); 
		//Set selfTrigger threshold 0x3a7=-4mV
		sCAEN = CAEN_DGTZ_SetChannelTriggerThreshold(hFADC[card], 0, 0x3a9);
		//Trigger under threshold
		sCAEN = CAEN_DGTZ_SetTriggerPolarity(hFADC[card], 0, CAEN_DGTZ_TriggerOnFallingEdge);
		//Post trigger
		sCAEN = CAEN_DGTZ_SetPostTriggerSize(hFADC[card], 20);
		//DC offset
		sCAEN = CAEN_DGTZ_SetChannelDCOffset(hFADC[card], 0, 0x3333);
		//Set the acquisition mode
		sCAEN = CAEN_DGTZ_SetAcquisitionMode(hFADC[card], CAEN_DGTZ_SW_CONTROLLED);
		//IO Level
		sCAEN = CAEN_DGTZ_SetIOLevel(hFADC[card], CAEN_DGTZ_IOLevel_NIM);
		//Set the max number of events to transfer in a single readout
		sCAEN = CAEN_DGTZ_SetMaxNumEventsBLT(hFADC[card], 1);
		//Set the behaviour when a software trigger arrives
		//sCAEN = CAEN_DGTZ_SetSWTriggerMode(hFADC[card], CAEN_DGTZ_TRGMODE_ACQ_ONLY);

		//---------------------------------------------------------//
		//----- Last step: Allocate memory for readout buffer -----//
		//---------------------------------------------------------//
		sCAEN = CAEN_DGTZ_MallocReadoutBuffer(hFADC[card], &buffer, &size);
		printf("MallocReadoutBuffer returned with status %d and size %u.\n", (int)sCAEN, (unsigned)size); // size is uint32_t, so print it as unsigned

		if(sCAEN != CAEN_DGTZ_Success) {
			printf("Errors during Digitizer Configuration.\n");
			sCAEN = CAEN_DGTZ_FreeReadoutBuffer(&buffer);
			sCAEN = CAEN_DGTZ_CloseDigitizer(hFADC[card]);
		}else{
			printf("Digitizer Configuration Successfully.\n");
		}
	}//end of FADC Configuration
	return SUCCESS;
}

/*-- Frontend Exit -------------------------------------------------*/

INT frontend_exit()
{
	//Stop DAQ
	for (card=0;card<NFADC;card++) {
		sCAEN = CAEN_DGTZ_SWStopAcquisition(hFADC[card]);
	}
	//Free memory
	sCAEN = CAEN_DGTZ_FreeReadoutBuffer(&buffer);
	//Close digitizer
	for (card=0;card<NFADC;card++) {
		sCAEN = CAEN_DGTZ_CloseDigitizer(hFADC[card]);
	}
	if(sCAEN == CAEN_DGTZ_Success){
		printf("FADC Modules stopped.\n");
	}else{
		printf("FADC Modules can not be stopped.\n");
	}
	return SUCCESS;
}

/*-- Begin of Run --------------------------------------------------*/

INT begin_of_run(INT run_number, char *error)
{
	//Create log file
	logfile = fopen("log.txt","w");
	//Start FADC
	for (card=0;card<NFADC;card++) {
		sCAEN = CAEN_DGTZ_ClearData(hFADC[card]);
		sCAEN = CAEN_DGTZ_SWStartAcquisition(hFADC[card]);
	}  
	printf("begin of run.\n");
	return SUCCESS;
}

/*-- End of Run ----------------------------------------------------*/
... 194 more lines ...
  1187   13 Jul 2016 Zhe WangSuggestionFrontend crush on high event rate
Suggestion from John and my reply.

> We have achieved very high rates, but only with some care.

> The biggest issue was to make sure when you compile the CAEN driver for the A3818 board that you turn on the MIDAS switch.  Without that problems occur with some 
> probability given by the number of bytes processed - which translates into very soon if you have a high rate.  (The underlying cause is that both MIDAS and the A3818
> use unix Alarm signals, but the CAEN folks have a compile option to turn this off.)

> We use as little as possible of the CAENDigitizerLibrary - instead we program the registers directly on the board.

> There is still some kind of memory leak which we have not yet tracked down, so every few hours we shut down the frontend then restart it. 

We use A2818 (PCI) - fiber - V2718 (Bridge) - VME - V1751.
I actually didn't find a MIDAS switch in the Makefile.
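
The alarm-signal collision John describes can be illustrated with a small standalone sketch. It does not use MIDAS or the CAEN driver; the two handlers `framework_alarm` and `driver_alarm` are hypothetical stand-ins for "the framework" and "the vendor library", and the only real calls are POSIX `sigaction()` and `raise()`. The point is simply that a process has one SIGALRM disposition, so whoever installs a handler last silently displaces the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>

typedef void (*alarm_fn)(int);

/* Counters bumped by the two hypothetical SIGALRM users. */
static volatile sig_atomic_t framework_ticks = 0;
static volatile sig_atomic_t driver_ticks = 0;

/* Stand-in for the handler a DAQ framework would install (assumption). */
static void framework_alarm(int sig) { (void)sig; framework_ticks++; }

/* Stand-in for the handler a vendor library would install (assumption). */
static void driver_alarm(int sig) { (void)sig; driver_ticks++; }

/* Install a SIGALRM handler and return the handler it displaced. */
static alarm_fn install_alarm(alarm_fn handler)
{
    struct sigaction sa, old;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    int rc = sigaction(SIGALRM, &sa, &old);
    assert(rc == 0);
    (void)rc;
    return old.sa_handler;
}

/* Returns 1 if the second installation took over from the first. */
int alarm_handlers_collide(void)
{
    install_alarm(framework_alarm);                    /* first user  */
    alarm_fn displaced = install_alarm(driver_alarm);  /* second user */

    /* In a single-threaded program the handler runs before raise() returns. */
    raise(SIGALRM);

    return displaced == framework_alarm  /* first handler was replaced */
        && framework_ticks == 0          /* ...and never saw the alarm */
        && driver_ticks == 1;            /* only the last one ran      */
}
```

Calling `alarm_handlers_collide()` from a test program shows the effect. A library that wants to coexist would have to save the displaced handler and chain to it, or avoid SIGALRM altogether, which is presumably what the compile option mentioned above does.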
  1188   13 Jul 2016 Zhe WangSuggestionFrontend crush on high event rate

More suggestions from John and my reply.

> we also don't use the VME back plane - it's just too slow - mixing VME commands to plain modules and digitizer modules is unreliable....

> We use CAEN fiberoptic version 2 to talk to the digitizers directly, we have up to 12 digitizers, and can use all channels for several hours, and can fill to about 75% of the A3818 bandwidth...

So far we are limited to 30 MB/s when testing with the CAEN examples, for example the wavedump program by CAEN.
I think that is roughly the limit of the IDE hard drive.
Unfortunately we are still far from that limit, only ~ 1 MB/s now.  :(
  1207   30 Sep 2016 Konstantin OlchanskiSuggestionFrontend crush on high event rate
> 
> More suggestions from John and my reply.
> 
> > we also don't use the VME back plane - it's just too slow - mixing VME commands to plain modules and digitizer modules is unreliable....
> 
> > We use CAEN fiberoptic version 2 to talk to the digitizers directly, we have up to 12 digitizers, and can use all channels for several hours, and can fill to about 75% of the A3818 bandwidth...
> 
> So far we are limited to 30 MB/s when testing with the CAEN examples, for example the wavedump program by CAEN.
> I think that is roughly the limit of the IDE hard drive.
> Unfortunately we are still far from that limit, only ~ 1 MB/s now.  :(
>

From writing MIDAS frontends for many years, I am starting to form an opinion that this type of problem is undebuggable
in the current midas frontend framework - it is impossible to separate problems in vendor-supplied libraries and linux kernel modules
from problems with midas (e.g. incorrectly created data banks, too-small event buffers getting full) from problems with
bad interaction (collision over the SIGALRM handlers).

I am pondering on a new scheme for midas frontend writing. Perhaps such a new scheme should have a "no midas" mode where you can
compile and link a midas frontend "without midas", leaving you to debug just your code and the vendor code and their interactions.
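
As a rough illustration of what such a mode could look like (this is only a sketch, not an existing MIDAS feature): the readout logic is kept free of MIDAS calls and a compile-time switch decides whether `midas.h` is pulled in at all. The macro `NO_MIDAS`, the functions `fake_vendor_read()`, `read_payload()` and `run_standalone()`, and the stand-in definitions of `INT` and `SUCCESS` are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical switch; normally passed as -DNO_MIDAS=1 on the compiler
   command line. Forced on here so that the sketch builds on its own. */
#ifndef NO_MIDAS
#define NO_MIDAS 1
#endif

#if NO_MIDAS
/* Minimal stand-ins for the two MIDAS names the readout code needs. */
typedef int INT;
#define SUCCESS 1
#else
#include "midas.h"
#endif

/* Placeholder for the vendor readout call (assumption): copies a fixed
   9-byte payload into dst, truncated to cap bytes. */
static size_t fake_vendor_read(char *dst, size_t cap)
{
    static const char payload[] = "EVENTDATA";
    size_t n = sizeof payload - 1;
    if (n > cap)
        n = cap;
    memcpy(dst, payload, n);
    return n;
}

/* Readout logic without any MIDAS calls, so it can be tested alone.
   A real frontend would wrap this in its readout routine. */
INT read_payload(char *pevent, size_t cap, size_t *nbytes)
{
    assert(pevent != NULL && nbytes != NULL);
    *nbytes = fake_vendor_read(pevent, cap);
    return SUCCESS;
}

/* Standalone driver: read nevents events and return the byte total. */
size_t run_standalone(int nevents)
{
    char event[64];
    size_t total = 0, n = 0;
    for (int i = 0; i < nevents; i++) {
        if (read_payload(event, sizeof event, &n) != SUCCESS)
            break;
        total += n;
    }
    return total;
}
```

With the switch on, `run_standalone()` exercises only the user code and the (here faked) vendor call, so a crash or a stall in that loop cannot be blamed on MIDAS buffers, banks or signal handlers.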

K.O.
  172   04 Nov 2004 Jan WoutersForumFrontend code and the ODB
I would like to know whether all parameters used by the frontend code have to be in the
"Experiment/Run Parameters" section.  This section can become big and difficult to maintain, because it is
one single big section of experim.h (EXP_PARAM_DEFINED).  I have parameters the various frontends read at
the beginning of each run, which set the hardware settings of various devices.  I would like to place these
in a section all their own, organized by device.  Is this doable?