Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  16 Mar 2019, Gennaro Tortone, Forum, assertion failed 
    Reply  18 Mar 2019, Konstantin Olchanski, Forum, assertion failed 
       Reply  19 Mar 2019, Gennaro Tortone, Forum, assertion failed 
          Reply  28 Mar 2019, Konstantin Olchanski, Forum, assertion failed 
Message ID: 1509     Entry time: 28 Mar 2019     In reply to: 1502
Author: Konstantin Olchanski 
Topic: Forum 
Subject: assertion failed 
For the record, I am stumped by this problem. We have definitely ruled out any data overflow inside the midas message code (there are no 
long messages sent). My only guess is that the frontend itself is corrupting the midas message buffer, but this corruption
must be unlikely lucky to corrupt just the "_" character (and maybe what follows it) from the "MSG_" header inside the buffer.

If indeed this is memory corruption inside the frontend, to find and fix it, one would have to roll out valgrind and other malloc() debugging 
tools and good luck...

K.O.



> > > [dfe01,INFO] Slow control equipment initialized
> > > dfe: src/midas.c:838: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> > > if I remove line 838 from midas.c (fixing message length) the problem disappear...
> > 
> > Thank you for reporting this problem.
> > 
> > It is very strange, the check is for message start "MSG_", why "M", "S" and "G" are there
> > but "_" is missing? And you remove the check for "_" and the rest of the message is also okey?
> > Very odd.
> 
> if I remove the check for "_" then the first message is empty and next messages are ok...
> If I don't remove the check the frontend fails at start and I find these lines in midas.log:
> 
> 14:46:29.719 2019/03/19 [dfe01,INFO] Program dfe01 on host lxaria02 started
> 14:46:29.731 2019/03/19 [dfe01,INFO] Dome FE initialized
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [system.c:4709:recv_tcp2,ERROR] unexpected connection closure
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:12814:recv_event_server,ERROR] recv_tcp2(header) returned -1
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:14699:rpc_server_receive,ERROR] recv_event_server() returned -1, abort
> 14:46:29.737 2019/03/19 [dfe01,TALK] Program 'dfe01' on host 'lxaria02' aborted
>  
> > You can also add this code "assert(4+3*sizeof(int)+len < 1020)" in cm_msg_buffer() right before
> > rb_increment_wp() - it this assert fails, we definitely determine that we have a buffer overflow.
> 
> I added assert you suggested in cm_msg_buffer() function before rp_increment_wp() and 
> result is always the same at same line:
> 
> [dfe01,INFO] Dome FE initialized
> Dome0001-rc:
> [dfe01,INFO] Slow control equipment initialized
> dfe: src/midas.c:839: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> Aborted
> 
> Regards,
> Gennaro
ELOG V3.1.4-2e1708b5