Back Midas Rome Roody Rootana
  Midas DAQ System, Page 79 of 142  Not logged in ELOG logo
    Reply  26 Apr 2017, Francesco Renga, Forum, Problem with logger at run start 
Dear Stefan,
           thank you very much for your reply. We could finally fix the problem by replacing "scarlett" with "scarlett.localdomain" in our
hostname configuration file /etc/hostname (under debian).

Best Regards,
        Francesco

> Dear Francesco,
> 
> Your error (No route to host) typically means that you have a network problem outside of MIDAS. Your computer has to "find itself" and 
> this is probably broken. Try to do a "ping scarlett" or "nslookup scarlett" and you will see that the DNS server can't be reached or is 
> wrongly configured. Sometimes it helps to put scarlett explicitly into /etc/hosts
> 
> Stefan
> 
> 
> > Dear experts,
> >     we have a problem when trying to run a MIDAS DAQ which worked in the past on the same PC (but on a different
> > network). We get the following error messages when starting a new run:
> > 
> > Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:9106:rpc_client_connect,ERROR] cannot connect to host "scar
> > lett", port 44858: connect() returned -1, errno 113 (No route to host)
> > Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:3539:cm_transition_call,ERROR] cannot connect to client "Lo
> > gger" on host scarlett, port 44858, status 503
> > 
> > (scarlett is indeed the hostname of the PC). The error occurs even if the PC is disconnected from the network.
> > 
> > Any suggestion?
> > 
> > Best Regards,
> >         Francesco
    Reply  02 May 2017, Konstantin Olchanski, Forum, Problem with logger at run start 
> Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:9106:rpc_client_connect,ERROR] cannot connect to host "scar
> lett", port 44858: connect() returned -1, errno 113 (No route to host)

Forgot to reply to this: if you read the error messages, you will see the actual problem is "no route to host". Next step
is to ping the same hostname or try "telnet hostname 22" (cut-and-paste the hostname from the error message
to avoid the common pitfall of not seeing a typo, i.e. ping host00 works while midas connect to hostOO does not (zero vs capital-o)).
In your case you had the wrong hostname ("foo" and "foo.localdomain" resolve to different IP addresses, one works the other
one does not). You can also try to use the IP address instead of hostname, this will avoid hostname resolution problems
(inconsistency between /etc/hosts and hostnames in DNS is very easy to have when using self-made private networks).

K.O.
Entry  14 Oct 2014, Konstantin Olchanski, Bug Report, Problem with EQ_USER 
If you use EQ_USER in mfe.c and have multiple threads writing into the ring buffer, you will have a big 
problem - the thread locking in the ring buffer code only works for a single writer thread and a single 
reader thread.

Presently, it is not clear how to have multiple multithreaded equipments inside one frontend.

During the Summer of 2013 code briefly existed in mfe.c to have an array of ring buffers and each 
multithreaded equipment could write into it's own buffer.

But this code is now removed and mfe.c can only read from a single ring buffer and as I noted above, ring 
buffer locking requires that only a single thread writes into it.
K.O.
    Reply  15 Oct 2014, Stefan Ritt, Bug Report, Problem with EQ_USER 
Sure, each thread needs its own ring buffer for writing.

So I see that we need back the multiple-ring-buffer-readout-scheme even before MEG will start. So what you need is something like

for (i=0 ; rb[i] != 0 ; i++) {
  read event from rb[i];
}

as it was before. What I do not like is that rb is a global variable, we should better use the encapsulation functions and extend get_event_rb() to 
get_event_rb(i) so you can have n ring buffers.

Give me one day, I will extend the current code to make it work again and to implement N threads.

Cheers,
Stefan
    Reply  16 Oct 2014, Stefan Ritt, Bug Report, Problem with EQ_USER 
I restructured the front-end code to enable multiple readout threads for EQ_USER equipment. Last summer I was definitively interrupted during 
that work and left it in an half finished state, sorry for that.

The way it works now is illustrated in mtfe.c. You create N ring buffers and N threads via

   for (int i=0 ; i<N ; i++) {
      create_event_rb(i);
      ss_thread_create(trigger_thread, (void*)(PTYPE)i);
   }

then each readout thread accesses its own readout buffer

thread(...)
{
   index = (int)(PTYPE)param;
   signal_readout_thread_active(index, TRUE);
   rbh = get_event_rbh(index);
   
   while (is_readout_thread_enabled()) {
      ... read event and put it into ring buffer ...
   }

   signal_readout_thread_active(index, FALSE);
}

The is_readout_thread_enabled() and signal_readout_thread_active() are used by the framework to shut down gracefully threads correct at the end 
of the program. This way each thread can close any hardware correctly.

Note that no other thread management is done by the framework. In the old days with interrupt equipment, the framework disabled interrupts 
when reading out periodic events, since that was necessary when using a single CAMAC crate for ADCs and scalers. This is obsolete now and not 
needed any longer. It is now the responsibility of the user code to resolve hardware access conflicts between different threads (like using a local 
mutex to access the same hardware). There is also no "readout when running" handling. If events should not be read out when the run is stopped, 
the readout thread has to check to run status, or better the EOR routine should disable the hardware trigger and the BOR routine should re-enable 
it. The readout threads will then poll for new events and just go to sleep if nothing is there.

I testes the mtfe.c program with 100 Hz and 1 MHz event rate on a dummy experiment (no hardware access) and it worked without problem.

Let me know if there is any issue left over.

/Stefan 
Entry  10 Aug 2012, Carl Blaksley, Forum, Problem with CAMAC controlled by CES8210 and read out by CAEN V1718 VME controller 
Hello all,

I am trying to put together a system to read out several camac adc. The camac is
read by a ces8210 camac to vme controller. The vme is then interfaced to a
computer through a CAEN v1718 usb control module. As anyone gotten the latter to
work?

Previous users seemed to indicate that they had here:

https://ladd00.triumf.ca/elog/Midas/493

but I am having problems to get this example frontend to compile. What is set as
the driver in the makefile for example? If I put v1718 there then I recieve
numerous errors from the CAENVMElib files. 

If someone else has gotten the V1718 running, I would be grateful for their
insight. 

Thanks, 
-Carl
Entry  22 Feb 2016, ZiyiGuo, Forum, Problem with BLTRead 
Dear all,

I'm using MIDAS system and CAEN V1721 to digitize the waveform from photomultipliers ( 
and the link bridge to PC is V2718 ). I use BLTRead to read data of the digitizer, but 
I found that if the event counting rate is high ( about 100KB/s ), the communication 
of V1721 and PC would be suspended randomly, and I get an error code of -2. Could you 
give me some suggestion? Thanks a lot.
    Reply  23 Feb 2016, Pierre-Andre Amaudruz, Forum, Problem with BLTRead 
> Dear all,
> 
> I'm using MIDAS system and CAEN V1721 to digitize the waveform from photomultipliers ( 
> and the link bridge to PC is V2718 ). I use BLTRead to read data of the digitizer, but 
> I found that if the event counting rate is high ( about 100KB/s ), the communication 
> of V1721 and PC would be suspended randomly, and I get an error code of -2. Could you 
> give me some suggestion? Thanks a lot.

Hi, 

Can you provide the BLTread call fragment code and the PC /var/log/messages at the time of 
the hang up.
What is needed to restart the daq?

PAA
    Reply  02 Mar 2016, ZiyiGuo, Forum, Problem with BLTRead 
> > Dear all,
> >
> > I'm using MIDAS system and CAEN V1721 to digitize the waveform from photomultipliers (
> > and the link bridge to PC is V2718 ). I use BLTRead to read data of the digitizer, but
> > I found that if the event counting rate is high ( about 100KB/s ), the communication
> > of V1721 and PC would be suspended randomly, and I get an error code of -2. Could you
> > give me some suggestion? Thanks a lot.
>
> Hi,
>
> Can you provide the BLTread call fragment code and the PC /var/log/messages at the time of
> the hang up.
> What is needed to restart the daq?
>
> PAA

Hi Pierre-Andre,

Sorry for my late reply, because the data acquisition system now is running other experiment.
Here is my code. Is there something wrong? Thanks!




/* Read FADC data */
int NByteOfOneEvent = HeadSize + SampSize * NChannel;
int NDWordOfOneEvent = NByteOfOneEvent/4;


/* 1. Create FADC bank. One bank for one branch of a tree or one array branch with length. */
bk_create(pevent, "FADC", TID_DWORD, (void**)&pdata);

uint32_t size_remaining_dwords;
int dwords_read;

/* 2. Read out the event and assign them to pdata (bank buffer) */
//read size of event to be read
sCAEN = CAENComm_Read32(hFADC[card], V1721_EVENT_SIZE, &size_remaining_dwords);

if( size_remaining_dwords < NDWordOfOneEvent ) {
printf("\r\nSize of available data is less than the required size of one event.\r\n");
}

/* Read */
DWORD *pFadcData;
sCAEN = CAENComm_BLTRead(hFADC[card], V1721_EVENT_READOUT_BUFFER, pdata, NDWordOfOneEvent, &dwords_read);

// These code in "if" is for restart communication and save the time information if the communication was suspended

if(sCAEN != 0)
{
//printf("sCAEN =%d \n", sCAEN);
time_t t = time(0);
char tmp[64];
strftime(tmp,sizeof(tmp),"%Y/%m/%d %X %Z",localtime(&t));
fprintf(logfile,tmp);
fprintf(logfile,"\n Here met communication error \n");
printf(" Here met communication error \n");

//re-establish communication
sCAEN = CAENComm_CloseDevice(hFADC[card]);
fprintf(logfile,"sCAEN =%d, device closed **********\n", sCAEN);

ss_sleep(2000);

sCAEN = CAENComm_OpenDevice(CAENComm_PCIE_OpticalLink, l, d, FADCBA[card], &(hFADC[card]));

if (sCAEN == CAENComm_Success) {
fprintf(logfile,"re-establish communication, handle:%d, sCAEN=%d \n", hFADC[card], sCAEN);
}
else {
sCAEN = CAENComm_OpenDevice(CAENComm_PCIE_OpticalLink, l, d, FADCBA[card], &(hFADC[card]));
fprintf(logfile,"try open device again sCAEN= %d\n", sCAEN); }

//pause ongoing reading process
sCAEN = ov1721_AcqCtl(hFADC[card], V1721_RUN_STOP);
sCAEN = CAENComm_Read32(hFADC[card], V1721_EVENT_STORED, &eStored);

//discard FADC buffer
sCAEN = CAENComm_Write32(hFADC[card], V1721_SW_CLEAR, 0);
fprintf(logfile," number of %d events discarded \n\n", eStored);
sCAEN = ov1721_AcqCtl(hFADC[card], V1721_RUN_START);
}

//dwords_read: Number of the words that actually read from the device.
if( dwords_read != NDWordOfOneEvent ) {
printf("\r\nSize of data read out doesn't equal to the required size of one event. \r\n");
}

EvtCounterFadc[card] = *(pdata+2) & 0x00ffffff;

/* 3. Update bank pointer position */
pdata += dwords_read;

/* 4. Finish one bank */
bk_close(pevent, pdata);
Entry  26 Jun 2019, Hassan, Forum, Problem transferring fetest data from the remote frontend to the backend 
Hi again, we now have Midas installed on the Rpi (remote frontend machine) and
have managed to run Fetest on it. Now we are at a stage where we want to send
the Fetest data over to the Data Acquisition machine, which also has Midas
installed. We want this data to be read into the Webserver Status page. We have
tried commands such as: (but Fetest then doesn't run)

./fetest -h DAQ-system-ip-address
./fetest -e sampleexpt -h DAQ-system-Ip-address
./fetest -e sampleexpt -h DAQ-system-Ip-address-with-webserver-port

our experiment name is sampleexpt on Rpi and DAQ machine in their respective
exptab files. Maybe the Rpi is getting confused as to whether it should be
running the experiment on Rpi or the DAQ. We need it to run on the DAQ.

Does the mserver have any role in this?

Thanks you for your kind help (we summer interns are really stuck!)
    Reply  26 Jun 2019, Konstantin Olchanski, Forum, Problem transferring fetest data from the remote frontend to the backend 
> Hi again, we now have Midas installed on the Rpi (remote frontend machine) and
> have managed to run Fetest on it. Now we are at a stage where we want to send
> the Fetest data over to the Data Acquisition machine ...
> 
> Does the mserver have any role in this?
> 

Yes. mserver runs on your daq machine and handles connections from frontends running on frontend machines. It needs to be configured 
correctly before it will work: in odb on your daq machine, non-local rpc has to be enabled and the frontend machine has to be added to the 
midas rpc access control list.

Read this:
https://midas.triumf.ca/MidasWiki/index.php/Quickstart_Linux#Running_with_one_or_more_REMOTE_frontends

And this:
https://midas.triumf.ca/MidasWiki/index.php/Security#MIDAS_programs_on_remote_machines

K.O.
Entry  12 Mar 2019, Francesco Renga, Forum, Problem stopping every second run 
Dear all,
         I'm running a DAQ frontend and it works well if one single run is
taken. If I try to take a second run right after, the run is performed but, when
stopping it, I get the error messages below. Any hint?

Thank you for your help,
      Francesco


11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6022:cm_shutdown,ERROR] Killing
and Deleting client 'cygnus_daq' pid 12472

11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6019:cm_shutdown,ERROR] Cannot
connect to client 'cygnus_daq' on host 'localhost', port 46341

11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:9539:rpc_client_connect,ERROR]
cannot connect to host "localhost", port 46341: connect() returned -1, errno 111
(Connection refused)

11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:10821:rpc_client_call,ERROR]
call to "cygnus_daq" on "localhost" RPC "rc_transition": error,
ss_recv_net_command() status 411

11:42:24.012 2019/03/12 [mhttpd,ERROR] [system.c:4715:ss_recv_net_command,ERROR]
error receiving network command header, see messages

11:42:24.011 2019/03/12 [mhttpd,ERROR] [system.c:4661:recv_tcp2,ERROR]
unexpected connection closure

11:42:23.994 2019/03/12 [cygnus_daq,ERROR] [midas.c:1951:,ERROR]
cm_disconnect_experiment not called at end of program
    Reply  13 Mar 2019, Konstantin Olchanski, Forum, Problem stopping every second run 
>          I'm running a DAQ frontend and it works well if one single run is
> taken. If I try to take a second run right after, the run is performed but, when
> stopping it, I get the error messages below. Any hint?

Sure. I will read the error messages for you: note that they come in reverse time order - oldest message is at the bottom. I 
will reverse them in my reading:

> 11:42:23.994 2019/03/12 [cygnus_daq,ERROR] [midas.c:1951:,ERROR]
> cm_disconnect_experiment not called at end of program

your frontend has exited (this error message is printed by the atexit() code, so you did not crash but called exit(), 
somehow).

> 11:42:24.011 2019/03/12 [mhttpd,ERROR] [system.c:4661:recv_tcp2,ERROR]
> unexpected connection closure

mhttpd reports that the socket connection to your frontend has closed (because the frontend stopped, closing all it's 
sockets)

> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [system.c:4715:ss_recv_net_command,ERROR]
> error receiving network command header, see messages

mhttpd network code next layer reports this same error again

> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:10821:rpc_client_call,ERROR]
> call to "cygnus_daq" on "localhost" RPC "rc_transition": error,
> ss_recv_net_command() status 411

mhttpd network code next layer reports this same error again, now we see that it was trying
to execute a "run start" (or "stop") RPC call to your frontend. But your frontend unexpectedly
shutdown (instead of replying to the RPC).

Next messages after that is mhttpd decided that your frontend is faulty (does not respond to RPC correctly) and tried to 
shut it down, but failed (cannot connect, etc). Last message is mhttpd cleaning up your frontend from ODB (because the 
frontend did not cleanup after itself - it did not call cm_disconnect_experiment(), per the very first error message).


So this is what we see from the midas messages - your frontend unexpectedly exited during the run transition - if as you 
say the run was stopping at the time, it would be in your end_of_run() function.

To debug this, I would do:

a) put some printf() statements in end_of_run() and see what they say during the crash
b) run the frontend inside the debugger, you may need to set a breakpoint on exit() or something like that.


Good luck,
K.O.





> 
> Thank you for your help,
>       Francesco
> 
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6022:cm_shutdown,ERROR] Killing
> and Deleting client 'cygnus_daq' pid 12472
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6019:cm_shutdown,ERROR] Cannot
> connect to client 'cygnus_daq' on host 'localhost', port 46341
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:9539:rpc_client_connect,ERROR]
> cannot connect to host "localhost", port 46341: connect() returned -1, errno 111
> (Connection refused)
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:10821:rpc_client_call,ERROR]
> call to "cygnus_daq" on "localhost" RPC "rc_transition": error,
> ss_recv_net_command() status 411
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [system.c:4715:ss_recv_net_command,ERROR]
> error receiving network command header, see messages
> 
> 11:42:24.011 2019/03/12 [mhttpd,ERROR] [system.c:4661:recv_tcp2,ERROR]
> unexpected connection closure
> 
> 11:42:23.994 2019/03/12 [cygnus_daq,ERROR] [midas.c:1951:,ERROR]
> cm_disconnect_experiment not called at end of program
    Reply  02 Feb 2007, Exaos Lee, Bug Fix, Problem solved by Re-define _syscall0(...) 
OK, I searched and found that my kernel doesn't support "_syscall0" any more. So I patched the system.c as the following (from line 954):

#if defined(OS_DARWIN)
// blank
#elif defined(OS_LINUX)

#include <sys/syscall.h>
#include <unistd.h>
#undef _syscall0
#define _syscall0(type, name) \
  type name(void) \
  {\
    return syscall(__NR_##name); \
  }

_syscall0(pid_t,gettid)
#endif


My kernel version:
exaos@memes midas>$ uname -a
Linux memes 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux

Maybe it's not the perfect way, but it works. Smile
    Reply  06 Feb 2007, Stefan Ritt, Bug Fix, Problem solved by Re-define _syscall0(...) 

Exaos Lee wrote:
Maybe it's not the perfect way, but it works. Smile


I changed it to:
#ifdef OS_UNIX

   return syscall(SYS_gettid);

#endif                          /* OS_UNIX */
[/code1]

without any #define.

Does this work for you?

- Stefan
Entry  14 Oct 2014, Konstantin Olchanski, Bug Report, Problem in mfe multithread equipments 
In the ALPHA experiment at CERN I found a problem in mfe.c handling of multithreaded equipments. This problem was in 
some forms introduced around May 2013 and around Aug 2013 (commit 
https://bitbucket.org/tmidas/midas/src/45984c35b4f7/src/mfe.c) (I hope I got it right).

The effect was very odd - if event rate of multithreaded equipment was more than 100 Hz, the event counters on the midas 
status page would not increment and the frontend will crash on end of run. Other than that, all the events from the 
multithreaded equipment seem to appear in the SYSTEM buffer and in the data file normally.

This happened: in mfe.c::receive_trigger_event() a loop was introduced (previously,
there was no loop there - there was and still is a loop outside of receive_trigger_event()):

while (1)
   wait 10 ms for an event
   process event, loop back
   if there is no event, exit
}

Obviously, if the event rate is more than 100 Hz (repetition rate less than 10 ms),
the 10 ms wait will always return an event and we will never exit this loop.

So the mfe.c main loop is now stuck here and will not process any periodic activity
such as updating the equipment statistics (event counters on the midas status page)
or running periodic equipments in the same front end program.

The crash at the end of run will be caused by a timeout in responding to the "end of run" RPC call.

I have a patch in testing that solves this problem by restoring receive_trigger_event() to the original configuration, i.e. 
https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop

K.O.
    Reply  14 Oct 2014, Konstantin Olchanski, Bug Report, Problem in mfe multithread equipments 
For my reference:
good version: https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
first breakage: https://bitbucket.org/tmidas/midas/src/c60259d9a244bdcd296a8c5c6ab0b91de27f9905/src/mfe.c?at=develop
second breakage: https://bitbucket.org/tmidas/midas/src/45984c35b4f7257f90515f29116dec6fb46f2ebc/src/mfe.c?at=develop

The "first breakage" may actually be okey, because there the badnik loop loops over ring buffers, not infinite. But I cannot test it anymore.
K.O.
    Reply  15 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments 
You are absolutely correct, the code is certainly wrong. It looks to me like the 

while (rbh)

was put in there for some testing, and I forgot to remove it. The only thing I could imagine is that we want to have a while loop there for performance reason. Like

readout_start = ss_millitime();
while (ss_millitime - readout_start < (DWORD) eq_info->period) {
  read event
  return 0 if no event found
}

You find this code also in the check_polled_events() routine. It ensures that the routine does not return after every single event, but after the period defined in the 
equipment (which is usually 100 ms for polled events). This way the code is more efficiently, since we do not check for RPC calls between every event, but just 10 times 
per second. This way you can shovel more events through the system, while still being responsive to run stops.

I don't have any hardware right now to test this, so please put my code above into the routine and commit it if it works.

I notice also a difference in both codes concerning the read buffer handles. The old code uses rbh2, while the new (wrong) code uses rbh. In your case probably both 
handles are the same, so it works, but in other experiments, which might use several ring buffers, it will fail. So please use rbh instead rbh2.

Let me know if it works for you, and if you see any difference in speed between the versions with and without the while loop (actually you will see this only if your trigger 
rate maxes out the DAQ).

Cheers,
Stefan
    Reply  15 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments 
Please disregard my previous posting, you don't need the while loop, since it's already in the scheduler (around lines 2160 under /*---- send interrupt events ----*/). 

But now I remember the rationale behind it. The loop over the rb[i] is because in MEG I have n calibration threads, each one running on a separate CPU core. So the receive_trigger_event() routine has to collect events from all the 
threads, each of them having one ring buffer. In the process of implementing EQ_USER, I changed this somehow, and apparently broke the code by making the while() loop looping forever if the event rate is over 100 Hz.

So for the moment please remove the while loop completely, and I will worry later of putting it back correctly when MEG will start again next year.

/Stefan
    Reply  16 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments 
> while (1)
>    wait 10 ms for an event
>    process event, loop back
>    if there is no event, exit
> }


This code has been rewritten now and should work for event rates >100 Hz.

/Stefan
ELOG V3.1.4-2e1708b5