Back Midas Rome Roody Rootana
  Midas DAQ System, Page 65 of 143  Not logged in ELOG logo
ID Date Authordown Topic Subject
  2298   29 Oct 2021 Kushal KapoorBug ReportUnknown Error 319 from client
I’m trying to run MIDAS using a frontend code/client named “fetiglab”. Run stops 
after 2/3sec with an error saying “Unknown error 319 from client “fetiglab” on 
localhost.

Frontend code compiled without any errors and MIDAS reads the frontend 
successfully, this only comes when I start the new run on MIDAS, here are a few 
more details from the terminal-

11:46:32 [fetiglab,ERROR] [odb.cxx:11268:db_get_record,ERROR] struct size 
mismatch for "/" (expected size: 1, size in ODB: 41920)

11:46:32 [Logger,INFO] Deleting previous file 
"/home/rcmp/online3/run00621_000.root"

11:46:32 [ODBEdit,ERROR] [midas.cxx:5073:cm_transition,ERROR] transition START 
aborted: client "fetiglab" returned status 319

11:46:32 [ODBEdit,ERROR] [midas.cxx:5246:cm_transition,ERROR] Could not start a 
run: cm_transition() status 319, message 'Unknown error 319 from client 
'fetiglab' on host "localhost"'

TR_STARTABORT transition: cleanup after failure to start a run

‌

I’ve also enclosed a screenshot for the same, any suggestions would be highly 
appreciated. thanks
Attachment 1: Screenshot_2021-10-26_114015.png
Screenshot_2021-10-26_114015.png
  2519   23 May 2023 Kou OishiBug ReportEvent builder fails at every 10 runs
Dear MIDAS experts,

Greetings! 
I am currently utilizing MIDAS for our experiment and I have encountered an issue with our event builder, which was developed based on the example code 'eventbuilder/mevb.cxx'. I'm uncertain whether this is a genuine bug or an inherent feature of MIDAS.

The event builder fails to initiate the 10th run since its startup, requiring us to relaunch it. Upon investigating the code, I have identified that this issue stems from line 8404 of mfe.cxx (the version's hash is db94df6fa79772c49888da9374e143067a1fff3a). According to the code, the 10-run limit is imposed by the variable MAX_EVENT_REQUESTS in midas.h. While I can increase this value as the code suggests, it does not provide a complete solution, as the same problem will inevitably resurface. This complication unnecessarily hampers our data collection during long observation periods.

Despite the code indicating 'BM_NO_MEMORY: too many requests,' this explanation does not seem logical to me. In fact, other standard frontends do not encounter this problem and can start new runs as required without requiring a frontend relaunch.

I apologize for not yet fully grasping the intricate implementation of midas.cxx and mfe.cxx. However, I would greatly appreciate any suggestions or insights you can offer to help resolve this issue.

Thank you in advance for your kind assistance.
  Draft   02 Jun 2023 Kou OishiBug ReportEvent builder fails at every 10 runs
Dear 

> > The event builder fails to initiate the 10th run since its startup, 
> > 'BM_NO_MEMORY: too many requests,'
> 
> Hi Kou,
> 
> It sounds like you might be calling bm_request_event() when starting a run, but not calling bm_delete_request() when the run stops. So you end up "leaking" event requests and eventually reach the limit of 10 open requests.
> 
> In examples/eventbuilder/mevb.c the request deletion happens in source_unbooking(), which is called as part of the "run stopping" logic. I've just updated the midas repository so the example compiles correctly, and was able to start/stop 15 runs without crashing.
> 
> Can you check the end-of-run logic in your version to ensure you're calling bm_delete_request()?
  2524   02 Jun 2023 Kou OishiBug ReportEvent builder fails at every 10 runs
Dear Ben,

Hello. Thank you for your attention to this problem!

> It sounds like you might be calling bm_request_event() when starting a run, but not calling bm_delete_request() when the run stops. So you end up "leaking" event requests and eventually reach the limit of 10 open requests.

I understand. Thanks for the description.

> In examples/eventbuilder/mevb.c the request deletion happens in source_unbooking(), which is called as part of the "run stopping" logic. I've just updated the midas repository so the example compiles correctly, and was able to start/stop 15 runs without crashing.
> 
> Can you check the end-of-run logic in your version to ensure you're calling bm_delete_request()?

I really appreciate your update.
Although I am away at the moment from the DAQ development, I will test it and report the result here as soon as possible.

Best regards,
Kou
  142   26 Jul 2003 Konstantin Olchanski more ODB checks in src/odb.c
Add more checks to db_validate_key() for pkey->total_size, item_size and
num_values. Automatically correct total_size to be item_size*num_values (we
saw this corruption and tested this fix).

K.O.

For your enjoyment, here is the diff:

RCS file: /usr/local/cvsroot/midas/src/odb.c,v
retrieving revision 1.64
diff -r1.64 odb.c
718a719,744
>   /* check key sizes */
>   if ((pkey->total_size < 0)||(pkey->total_size > pheader->key_size))
>     {
>     cm_msg(MERROR, "db_validate_key", "Warning: invalid key \"%s\"
total_size: %d", path, pkey->total_size);
>     return 0;
>     }
> 
>   if ((pkey->item_size < 0)||(pkey->item_size > pheader->key_size))
>     {
>     cm_msg(MERROR, "db_validate_key", "Warning: invalid key \"%s\"
item_size: %d", path, pkey->item_size);
>     return 0;
>     }
> 
>   if ((pkey->num_values < 0)||(pkey->num_values > pheader->key_size))
>     {
>     cm_msg(MERROR, "db_validate_key", "Warning: invalid key \"%s\"
num_values: %d", path, pkey->num_values);
>     return 0;
>     }
> 
>   /* check and correct key size */
>   if (pkey->total_size != pkey->item_size*pkey->num_values)
>     {
>     cm_msg(MINFO,  "db_validate_key", "Warning: corrected key \"%s\" size:
total_size=%d, should be %d*%d=%d", path, pkey->total_size, pkey->item_size,
pkey->num_values, pkey
->item_size*pkey->num_values);
>     pkey->total_size = pkey->item_size*pkey->num_values;
>     }
> 
  141   26 Jul 2003 Konstantin Olchanski use "odbedit -C" to connect to corrupted ODB
Add switch "-C" to odbedit to allow it to connect to corrupted ODB. Then,
depending on corruption, the user can manually remove or correct the
corrupted entries. Also, some corruption is automatically fixed by "odbedit"
itself. I use this functionality to debug and fix broken ODBs.

K.O.

For your enjoyment, here is the diff:

diff -r1.64 odbedit.c
3058a3059
> BOOL          corrupted;
3063c3064
<   debug = cmd_mode = FALSE;
---
>   debug = corrupted = cmd_mode = FALSE;
3077a3079,3080
>     else if (argv[i][0] == '-' && argv[i][1] == 'C')
>       corrupted = TRUE;
3104c3107,3108
<         printf("               [-c Command] [-c @CommandFile] [-s size]
[-g (debug)]\n\n");
---
>         printf("               [-c Command] [-c @CommandFile] [-s size]\n");
>         printf("               [-g (debug)] [-C (connect to corrupted
ODB)]\n\n");
3123c3127,3133
<   if (status != CM_SUCCESS)
---
>   else if ((status == DB_INVALID_HANDLE)&&corrupted)
>     {
>     cm_get_error(status, str);
>     puts(str);
>     printf("ODB is corrupted, connecting anyway...\n");
>     }
>   else if (status != CM_SUCCESS)
  139   29 Jul 2003 Konstantin Olchanski Have to link with -lpthread?
It appears that all midas applications are now required to link with the
pthreads library even if they do not use threads. This is caused by a
pthread_create() call from ss_thread_create() in system.c.

Is this the intended behaviour?

K.O.
  135   11 Aug 2003 Konstantin Olchanski mhttpd crash on corrupted ODB /RunInfo
Invalid values of ODB /RunInfo/State cause mhttpd crash in
show_status_page() because of an out of bounds access to the array of state
names. Suggest this fix: remove array of state names, use existing ladder of
if/else statements to explicitely set state name. Verified the fix works for
TWIST. Will commit this into MIDAS CVS unless get feedback.

src/mhttpd.c:show_status_page() {
  ...
  rsprintf("<tr align=center><td>Run #%d", runinfo.run_number);

  if (runinfo.state == STATE_STOPPED)
    rsprintf("<td colspan=1 bgcolor=#FF0000>Stopped");
  else if (runinfo.state == STATE_PAUSED)
    rsprintf("<td colspan=1 bgcolor=#FFFF00>Paused");
  else if (runinfo.state == STATE_RUNNING)
    rsprintf("<td colspan=1 bgcolor=#00FF00>Running");
  else
    rsprintf("<td colspan=1 bgcolor=#FFFFFF>Unknown");

  if (runinfo.requested_transition)
  ...

K.O.
  2   11 Aug 2003 Konstantin Olchanski Alarm on no ping?
I want midas alarms to go off when I cannot ping arbitrary remote hosts. Is
there is easy/preferred way to do this? K.O.
  136   10 Oct 2003 Konstantin Olchanski mhttpd crash on corrupted ODB /RunInfo
There was no feedback. This code has been commited. K.O.

> Invalid values of ODB /RunInfo/State cause mhttpd crash in
> show_status_page() because of an out of bounds access to the array of state
> names. Suggest this fix: remove array of state names, use existing ladder of
> if/else statements to explicitely set state name. Verified the fix works for
> TWIST. Will commit this into MIDAS CVS unless get feedback.
> 
> src/mhttpd.c:show_status_page() {
>   ...
>   rsprintf("<tr align=center><td>Run #%d", runinfo.run_number);
> 
>   if (runinfo.state == STATE_STOPPED)
>     rsprintf("<td colspan=1 bgcolor=#FF0000>Stopped");
>   else if (runinfo.state == STATE_PAUSED)
>     rsprintf("<td colspan=1 bgcolor=#FFFF00>Paused");
>   else if (runinfo.state == STATE_RUNNING)
>     rsprintf("<td colspan=1 bgcolor=#00FF00>Running");
>   else
>     rsprintf("<td colspan=1 bgcolor=#FFFFFF>Unknown");
> 
>   if (runinfo.requested_transition)
>   ...
> 
> K.O.
  129   12 Oct 2003 Konstantin Olchanski Array overruns in mhttpd.c::submit_elog()
While adding new functionality to submit_elog() (add the message text to the
outgoing email), I noticed that the email text is being stored into an array
of size 256, mail_text[256], without any checks for array overrun. This
cannot be good. How should this be corrected?
K.O.
  130   12 Oct 2003 Konstantin Olchanski Array overruns in mhttpd.c::submit_elog()
> While adding new functionality to submit_elog() (add the message text to the
> outgoing email), I noticed that the email text is being stored into an array
> of size 256, mail_text[256], without any checks for array overrun. This
> cannot be good. How should this be corrected?
> K.O.

Similar problem exists in midas.c::el_submit(). The array "message[10000]" is
easy to overrun by submitting a long elog message.

K.O.
  125   12 Oct 2003 Konstantin Olchanski mhttpd: add Elog text to outgoing email.
This commit adds the elog message text to the outgoing email message. This
functionality has been requested a logn time ago, but I guess nobody got
around to implement it, until now. I also added assert() traps for the most
common array overruns in the Elog code.

Here is the cvs diff:

Index: src/mhttpd.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/mhttpd.c,v
retrieving revision 1.252
diff -r1.252 mhttpd.c
768a769
> #include <assert.h>
3740c3741
< char   mail_to[256], mail_from[256], mail_text[256], mail_list[256],
---
> char   mail_to[256], mail_from[256], mail_text[10000], mail_list[256],
3921a3923,3925
>         // zero out the array. needed because later strncat() does not
always add the trailing '\0'
>         memset(mail_text,0,sizeof(mail_text));
> 
3931a3936,3945
> 
>         assert(strlen(mail_text) + 100 < sizeof(mail_text)); // bomb out
on array overrun.
> 
>         strcat(mail_text+strlen(mail_text),"\n");
>         // this strncat() depends on the mail_text array being zeroed out:
>         // strncat() does not always add the trailing '\0'
>        
strncat(mail_text+strlen(mail_text),getparam("text"),sizeof(mail_text)-strlen(mail_text)-50);
>         strcat(mail_text+strlen(mail_text),"\n");
> 
>         assert(strlen(mail_text) < sizeof(mail_text)); // bomb out on
array overrun.
Index: src/midas.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/midas.c,v
retrieving revision 1.192
diff -r1.192 midas.c
604a605
> #include <assert.h>
16267a16269,16270
> 
>   assert(strlen(message) < sizeof(message)); // bomb out on array overrun.

K.O.
  133   12 Oct 2003 Konstantin Olchanski Refuse to set run number zero
I am debugging the frequent problem where the run number is mysteriously
reset to zero. As a first step, I am commiting changes to mhttpd.c and midas.c:
- abort on obviously corrupted "run number < 0"
- abort on cm_transition() to run 0 (the only place where the run number is
explicitely written to ODB)
- in the mhttpd "Start run" form, reject user setting the run number to <= 0.

Here is the CVS diff:

===================================================================
RCS file: /usr/local/cvsroot/midas/src/mhttpd.c,v
retrieving revision 1.253
diff -r1.253 mhttpd.c
2451a2452,2457
>   if (run_number < 0)
>     {
>     cm_msg(MERROR, "show_elog_new", "aborting on attempt to use invalid
run number %d",run_number);
>     abort();
>     }
> 
2506a2513,2519
> 
>     if (run_number < 0)
>       {
>       cm_msg(MERROR, "show_elog_new", "aborting on attempt to use invalid
run number %d",run_number);
>       abort();
>       }
> 
3582a3596,3602
> 
>   if (run_number < 0)
>     {
>     cm_msg(MERROR, "show_form_query", "aborting on attempt to use invalid
run number %d",run_number);
>     abort();
>     }
> 
5730a5751,5756
>   if (rn < 0) // value "zero" is okey
>     {
>     cm_msg(MERROR, "show_start_page", "aborting on attempt to use invalid
run number %d",rn);
>     abort();
>     }
> 
9684a9711,9719
>       if (i <= 0)
>         {
>         cm_msg(MERROR, "interprete", "Start run: invalid run number %d",i);
>         memset(str,0,sizeof(str));
>         snprintf(str,sizeof(str)-1,"Invalid run number %d",i);
>         show_error(str);
>         return;
>         }
> 
Index: src/midas.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/midas.c,v
retrieving revision 1.193
diff -r1.193 midas.c
3786c3786
<         status = cm_transition(_requested_transition | TR_DEFERRED, 0,
str, 256, SYNC, FALSE);
---
>         status = cm_transition(_requested_transition | TR_DEFERRED, 0,
str, sizeof(str), SYNC, FALSE);
3906a3907,3912
>   if (run_number <= 0)
>     {
>     cm_msg(MERROR, "cm_transition", "aborting on attempt to use invalid
run number %d",run_number);
>     abort();
>     }
> 
16069a16076,16081
>     }
> 
>   if (run_number < 0)
>     {
>     cm_msg(MERROR, "el_submit", "aborting on attempt to use invalid run
number %d", run_number);
>     abort();

K.O.
  134   12 Oct 2003 Konstantin Olchanski Refuse to set run number zero
> I am debugging the frequent problem where the run number is mysteriously
> reset to zero. As a first step, I am commiting changes to mhttpd.c and midas.c:
> - abort on obviously corrupted "run number < 0"
> - abort on cm_transition() to run 0 (the only place where the run number is
> explicitely written to ODB)
> - in the mhttpd "Start run" form, reject user setting the run number to <= 0.

- abort on cm_transition() from run 0 to 1 during auto restart in mlogger.

Cvs diff:

RCS file: /usr/local/cvsroot/midas/src/mlogger.c,v
retrieving revision 1.65
diff -r1.65 mlogger.c
3277a3278,3283
>         if (run_number <= 0)
>           {
>           cm_msg(MERROR, "main", "aborting on attempt to use invalid run
number %d", run_number);
>           abort();
>           }
> 

K.O.
  127   13 Oct 2003 Konstantin Olchanski mhttpd: add Elog text to outgoing email.
> > around to implement it, until now. I also added assert() traps for the most
> > common array overruns in the Elog code.
> 
> In addition to the assert() one should use strlcat() and strlcpy() all over 
> the code to avoid buffer overruns. The ELOG standalone code does that already 
> properly.
> 
> - Stefan

Yes, the original authors should have used strlcat(). Now that I uncovered this source of mhttpd 
memory corruption, maybe some volunteer will fix it up properly.

K.O.
  132   13 Oct 2003 Konstantin Olchanski Array overruns in mhttpd.c::submit_elog()
> > > While adding new functionality to submit_elog() ....
> 
> The whole elog functionality in mhttpd will be replaced (sometime) ...

I humbly submit that this has been the standard reply for the last 2 years since I was aware of 
the "last N days does not always work" problem (just saw it again yesterday).

K.O.
  122   15 Oct 2003 Konstantin Olchanski test
test
test
test
  123   15 Oct 2003 Konstantin Olchanski test
> test
> test
> test

another test

K.O.
  114   31 Oct 2003 Konstantin Olchanski mana.c without ROOT and HBOOK
Stephan, why did you prohibit building mana.c without ROOT and HBOOK
support? I think such a configuration is valid and should be allowed.

Also, this prohibition broke the Midas Makefile, it now bombs building
mana.c. The Makefile is setup for building hmana.c with HBOOK support,
rmana.c with ROOT support (if ROOTSYS is set) and mana.c without HBOOK and
ROOT support (currently bombs on #error in mana.c).

K.O.
ELOG V3.1.4-2e1708b5