Back Midas Rome Roody Rootana
  Midas DAQ System, Page 60 of 154  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
    Reply  15 Apr 2017, Konstantin Olchanski, Bug Report, badly managed case in history_schema.cxx: dat file empty 
> For an unknown reason, Logger died few days ago while writing the history. The
> file mhf_1489577446_20170315_system.dat was created, but was empty.

I ran into same problem installing new midas in the alpha experiment at cern. It should be fixed now:
https://bitbucket.org/tmidas/midas/commits/788021d9cb39a348a40e36f1b35b1440e06aa744

K.O.

> 
> When trying to restart Logger, I would get a seg fault without any special error
> message.
> 
> I tracked the issue to the "read_file_schema" function in history_schema.cxx
> 
> * L4731, a pointer to HsFileSchema *s is declared.
> * L4747, We enter a while(1) loop.
> * L4749, get char on the filename.
> In our case, the file was empty, so the variable "b" gets NULL and the loop breaks.
> 
> Problem: the memory allocation for "s" is later in the loop, L4768.
> Upon exiting the loop, L4854, we try to access record_size on a NULL pointer ==>
> SegFault. 
> 
> It would be nice to at least have a message before breaking the loop...
Entry  30 Nov 2003, Konstantin Olchanski, , bad call to cm_cleanup() in fal.c 
fal.c does not compile: it calls cm_cleanup() with one argument when there
should be two arguments. K.O.
    Reply  30 Nov 2003, Stefan Ritt, , bad call to cm_cleanup() in fal.c 
> fal.c does not compile: it calls cm_cleanup() with one argument when there
> should be two arguments. K.O.

Fixed and committed.
Entry  22 Oct 2013, Konstantin Olchanski, Info, audit of db_get_record() 
Record-oriented ODB functions db_create_record(), db_get_record(), db_check_record() and 
db_set_record() require special attention to the consistency between their "C struct"s (usually defined in 
midas.h), their initialization strings (usually defined in midas.h) and the contents of ODB.

When these 3 items become inconsistent, the corresponding midas functions tend to break.

Unlike ODB internal structures and event buffer internal structures, these record-oriented functions are 
not part of the midas binary-compatibility abi and they are not protected by db_validate_sizes().

From time to time, new items are added to some of these data structures. Usually this does not cause 
problems, but recently we had some difficulty with the runinfo and equipment structures, prompting this 
audit.

db_check_record: note: (C) means that this record is created there

alarm.c: alarm_odb_str(C)
mana.c: skipped
mfe.c: equipment_common_str, equipment_statistics_str(C), event_descrip(C), bank_list(C)
mhttpd.cxx: cgif_label_str(C), cgif_bar_str(C), runinfo_str(C), equipment_common_str(C)
mlogger.cxx: ch_settings_str(C)
sequencer.cxx: sequencer_str(C)

db_create_record:

alarm.c: alarm_odb_str, alarm_periodic_str, alarm_class_str
fal.c: skipped
mfe.c: equipment_common_str
midas.c: program_info_str (maybe)
odb.c: (maybe)
lazylogger.cxx: lazy_settings, lazy_statistics
mhttpd.cxx: runinfo_str
mlogger.cxx: chn_settings_str

db_get_record: (hard to do with grep, will have to check every db_get_record by hand)

alarm.c: alarm, class, program_info
fal.c: skipped
mana.c: skipped
midas.c: program_info
odb.c: (maybe)
lazylogger.cxx: lazyst
mhttpd.cxx: runinfo, equipment, ?hkeytemp?, chn_settings, chn_stats, ?label?, ?bar?
mlogger.cxx: ?, ?, chn_stats, chn, settings
sequencer.cxx: hkeyseq

db_set_record:

alarm.c: hkeyalarm, hkeyclass, ???, program_info
fal.c: skipped
mana.c: skipped
mfe.c: equipment_info, ?event structure?
odb.c: (maybe)
lazelogger.cxx: lazyst
mlogger.cxx: chn_stat
sequencer.cxx: seq

db_open_record: note: (W) means MODE_WRITE

fal.c: skipped
mana.c: skipped
mfe.c: equipment_info, equipment_stats(W)
midas.c: requested_transition
odbedit.c: key_update - generic test of hotlink
lazylogger.cxx: runstate, lazyst(W), lazy?
mlogger.cxx: history, chn_statistics, chn_settings
sequencer.cxx: seq

K.O.
Entry  16 Mar 2019, Gennaro Tortone, Forum, assertion failed 
Hi,
I'm developing a Slow Control equipment on a Linux board that send data on a remote server
running 'mserver'; the build goes fine, but  when I run the executable it seems that an assertion in 
midas.c failed:

[dfe01,INFO] Slow control equipment initialized
dfe: src/midas.c:838: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.

if I remove line 838 from midas.c (fixing message length) the problem disappear...

Regards,
Gennaro
    Reply  18 Mar 2019, Konstantin Olchanski, Forum, assertion failed 
> [dfe01,INFO] Slow control equipment initialized
> dfe: src/midas.c:838: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> if I remove line 838 from midas.c (fixing message length) the problem disappear...

Thank you for reporting this problem.

It is very strange, the check is for message start "MSG_", why "M", "S" and "G" are there
but "_" is missing? And you remove the check for "_" and the rest of the message is also okey?
Very odd.

I look at the code in midas.c and I also see is that the ring buffer has no protection against
overflow, it is created for max message length of around 1000 bytes, but I look at the code 
that feeds messages into it (cm_msg_format()) and it also has a buffer overrun
possibility (sprintf() instead snprintf()). Actually I just ran into this buffer overrun
myself when adding memory leak debugging code into mhttpd.

Anyhow, before this message that causes the crash, maybe perchance you have a long
message before it? Of length 1000 bytes of longer? (1000 bytes is 12 lines of 80 chars).

Or this is the very first message you generate? (other than the normal messages generated by mfe.c?)

Is there anything in midas.log around the time of the crash? (post it here or email it to me?) Also I would
like to see everything printed by your frontend from the start time all the way to the crash.

You can also add this code "assert(4+3*sizeof(int)+len < 1020)" in cm_msg_buffer() right before
rb_increment_wp() - it this assert fails, we definitely determine that we have a buffer overflow.

Thank in advance,
K.O.
    Reply  19 Mar 2019, Gennaro Tortone, Forum, assertion failed 
> > [dfe01,INFO] Slow control equipment initialized
> > dfe: src/midas.c:838: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> > if I remove line 838 from midas.c (fixing message length) the problem disappear...
> 
> Thank you for reporting this problem.
> 
> It is very strange, the check is for message start "MSG_", why "M", "S" and "G" are there
> but "_" is missing? And you remove the check for "_" and the rest of the message is also okey?
> Very odd.

if I remove the check for "_" then the first message is empty and next messages are ok...
If I don't remove the check the frontend fails at start and I find these lines in midas.log:

14:46:29.719 2019/03/19 [dfe01,INFO] Program dfe01 on host lxaria02 started
14:46:29.731 2019/03/19 [dfe01,INFO] Dome FE initialized
14:46:29.737 2019/03/19 [dfe01,ERROR] [system.c:4709:recv_tcp2,ERROR] unexpected connection closure
14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:12814:recv_event_server,ERROR] recv_tcp2(header) returned -1
14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:14699:rpc_server_receive,ERROR] recv_event_server() returned -1, abort
14:46:29.737 2019/03/19 [dfe01,TALK] Program 'dfe01' on host 'lxaria02' aborted
 
> You can also add this code "assert(4+3*sizeof(int)+len < 1020)" in cm_msg_buffer() right before
> rb_increment_wp() - it this assert fails, we definitely determine that we have a buffer overflow.

I added assert you suggested in cm_msg_buffer() function before rp_increment_wp() and 
result is always the same at same line:

[dfe01,INFO] Dome FE initialized
Dome0001-rc:
[dfe01,INFO] Slow control equipment initialized
dfe: src/midas.c:839: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
Aborted

Regards,
Gennaro
    Reply  28 Mar 2019, Konstantin Olchanski, Forum, assertion failed 
For the record, I am stumped by this problem. We have definitely ruled out any data overflow inside the midas message code (there are no 
long messages sent). My only guess is that the frontend itself is corrupting the midas message buffer, but this corruption
must be unlikely lucky to corrupt just the "_" character (and maybe what follows it) from the "MSG_" header inside the buffer.

If indeed this is memory corruption inside the frontend, to find and fix it, one would have to roll out valgrind and other malloc() debugging 
tools and good luck...

K.O.



> > > [dfe01,INFO] Slow control equipment initialized
> > > dfe: src/midas.c:838: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> > > if I remove line 838 from midas.c (fixing message length) the problem disappear...
> > 
> > Thank you for reporting this problem.
> > 
> > It is very strange, the check is for message start "MSG_", why "M", "S" and "G" are there
> > but "_" is missing? And you remove the check for "_" and the rest of the message is also okey?
> > Very odd.
> 
> if I remove the check for "_" then the first message is empty and next messages are ok...
> If I don't remove the check the frontend fails at start and I find these lines in midas.log:
> 
> 14:46:29.719 2019/03/19 [dfe01,INFO] Program dfe01 on host lxaria02 started
> 14:46:29.731 2019/03/19 [dfe01,INFO] Dome FE initialized
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [system.c:4709:recv_tcp2,ERROR] unexpected connection closure
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:12814:recv_event_server,ERROR] recv_tcp2(header) returned -1
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:14699:rpc_server_receive,ERROR] recv_event_server() returned -1, abort
> 14:46:29.737 2019/03/19 [dfe01,TALK] Program 'dfe01' on host 'lxaria02' aborted
>  
> > You can also add this code "assert(4+3*sizeof(int)+len < 1020)" in cm_msg_buffer() right before
> > rb_increment_wp() - it this assert fails, we definitely determine that we have a buffer overflow.
> 
> I added assert you suggested in cm_msg_buffer() function before rp_increment_wp() and 
> result is always the same at same line:
> 
> [dfe01,INFO] Dome FE initialized
> Dome0001-rc:
> [dfe01,INFO] Slow control equipment initialized
> dfe: src/midas.c:839: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> Aborted
> 
> Regards,
> Gennaro
Entry  07 Mar 2008, Randolf Pohl, Bug Report, array overflows and other bugs out.0.make
Hi,

I have just compiled MIDAS svn 4132 on a fresh SuSE 10.3 x86_64 system and gcc 
found a bunch of bugs, I guess.

> uname -a
Linux pc 2.6.22.17-0.1-default #1 SMP 2008/02/10 20:01:04 UTC x86_64 x86_64 
x86_64 GNU/Linux

> gcc --version
gcc (GCC) 4.2.1 (SUSE Linux)

I see loads of warnings during compile, most of which I know from earlier 
compiles:
* warning: dereferencing type-punned pointer will break strict-aliasing rules
* warning: pointer targets in passing argument 3 of 'getsockname' differ in
           signedness

But then there is a new one (in fact lots of this one), and brief
inspection suggests this is a true bug with the possibility of truly
nasty consequences.

(1)=========================
src/midas.c:7398: warning: array subscript is above array bounds
Inspection of midas.c:

   if (i == MAX_DEFRAG_EVENTS) {
      /* no buffer available -> no first fragment received */
7398: free(defrag_buffer[i].pevent);
      memset(&defrag_buffer[i].event_id, 0, sizeof(EVENT_DEFRAG_BUFFER));
      cm_msg(MERROR, "bm_defragement_event",
             "Received fragment without first fragment (ID %d) Ser#:%d",
             pevent->event_id & 0x0FFF, pevent->serial_number);
      return;
   }

And midas.c line 7297:
EVENT_DEFRAG_BUFFER defrag_buffer[MAX_DEFRAG_EVENTS];

So, if(i==MAX_DEFRAG_EVENTS) free(defrag_buffer[i]).
I guess this is an off-by-one bug.

(2)==========================
src/midas.c:2958: warning: array subscript is above array bounds

   for (i = 0; i < 13; i++)
2958  if (trans_name[i].transition == transition)
         break;

Holy cow, hard-coded "13" in the code! Should be a #define, shouldn't it?

Now look at midas.c lines 94ff:
struct {
   int transition;
   char name[32];
} trans_name[] = {
   {
   TR_START, "START",}, {
   TR_STOP, "STOP",}, {
   TR_PAUSE, "PAUSE",}, {
   TR_RESUME, "RESUME",}, {
   TR_DEFERRED, "DEFERRED",}, {
0, "",},};

There is no trans_name[12].

The trans_name[12] shows up in line 2894 and 2790, too.

(3)=============================
mfe.c:
src/mfe.c:412: warning: array subscript is above array bounds
src/mfe.c:311: warning: array subscript is above array bounds
src/mfe.c:340: warning: array subscript is above array bounds

412: device_drv->dd(CMD_GET_DEMAND, device_drv->dd_info, i, 
          &device_drv->mt_buffer->channel[i].array[CMD_GET_DEMAND]);


   for (cmd = CMD_GET_FIRST; cmd <= CMD_GET_LAST; cmd++) {
 (..)
311:  device_drv->mt_buffer->channel[current_channel].array[cmd] = value;

   for (cmd = CMD_GET_FIRST; cmd <= CMD_GET_LAST; cmd++) {
 (..)
340:  device_drv->mt_buffer->channel[i].array[cmd] = value;


CMD_GET_DEMAND is in include/midas.h:
#define CMD_GET_DEMAND               CMD_GET_DIRECT  // = 20

I haven't even tried to understand mfe.c, nor did I read it. 
But I suspect the thing should always be something like
....array[cmd-CMD_GET_FIRST]
so array[] is indexed from [0], not from an arbitrary number that
depends on the number of commands you insert before line 698 in
midas.h. But please could the author of this check this very carefully?


(4)=========================
src/lazylogger.c:1957: warning: array subscript is below array bounds

if ((channel < 0) && (lazyinfo[channel].hKey != 0))

That is lazyinfo[something below zero].


(5)=============================
More warnings an expert might want to have a look at:

* warning: deprecated conversion from string constant to 'char*'

* src/fal.c:106: warning: non-local variable '<anonymous struct> out_info'
                 uses anonymous type
* src/fal.c:3064: warning: non-local variable '<anonymous struct> eb' uses
                  anonymous type

I attach the full output of make.
Could someone knowledgeable please have a look at these warnings and fix them?

They make me a bit nervous when thinking about data integrity, and
there are now so many that they actually start to hide serious stuff
like the ones I presented.

Oh, I got rid of the "dereferencing type-punned pointer" thing by adding
"-fno-strict-aliasing" as a compiler flag. Was suggested on the Web. Seemed to 
have worked during data taking (the data look reasonable :-). Is that a 
possible fix/workaround?


Cheers,

Randolf
    Reply  07 Mar 2008, Stefan Ritt, Bug Report, array overflows and other bugs 
> I have just compiled MIDAS svn 4132 on a fresh SuSE 10.3 x86_64 system and gcc 
> found a bunch of bugs, I guess.

Ahh, great! gcc is getting more and more clever. Each time gcc is updated, it finds
a few new issues.

Indeed some are real bugs, and I will work down the list as time permits. I see
however no immediate thread (you are not using fragmented events, a transition 12
never occurs, etc.). Issue #4 from your list has to be checked by Pierre-Andre. 
    Reply  10 Mar 2008, Stefan Ritt, Bug Report, array overflows and other bugs 
There were some trivial and some non-trivial issues. Glad the compiled picked up on
this!

> I see loads of warnings during compile, most of which I know from earlier 
> compiles:
> * warning: dereferencing type-punned pointer will break strict-aliasing rules
> * warning: pointer targets in passing argument 3 of 'getsockname' differ in
>            signedness

I ignore these for the moment until I have a gcc 4.2 myself (we use Scientific
Linux 5 which has gcc 4.1 for the moment). As Randolph pointed out correctly you
can make gcc shut up by a proper flag there. The warnings have no influence on the
stability of midas.

> (1)=========================
> src/midas.c:7398: warning: array subscript is above array bounds
> Inspection of midas.c:
> 
>    if (i == MAX_DEFRAG_EVENTS) {
>       /* no buffer available -> no first fragment received */
> 7398: free(defrag_buffer[i].pevent);
>       memset(&defrag_buffer[i].event_id, 0, sizeof(EVENT_DEFRAG_BUFFER));
>       cm_msg(MERROR, "bm_defragement_event",
>              "Received fragment without first fragment (ID %d) Ser#:%d",
>              pevent->event_id & 0x0FFF, pevent->serial_number);
>       return;
>    }

The free() was just wrong at that place, I removed it.

> (2)==========================
> src/midas.c:2958: warning: array subscript is above array bounds
> 
>    for (i = 0; i < 13; i++)
> 2958  if (trans_name[i].transition == transition)
>          break;

Fixed that by
     
     for (i=0 ;; i++)
        if (trans_name[i].name[0] == 0 || trans_name[i].transition == transition)
           break;

Since trans_name[i].name = "" indicates the end of the list.

> (3)=============================
> mfe.c:
> src/mfe.c:412: warning: array subscript is above array bounds
> src/mfe.c:311: warning: array subscript is above array bounds
> src/mfe.c:340: warning: array subscript is above array bounds
> 
> 412: device_drv->dd(CMD_GET_DEMAND, device_drv->dd_info, i, 
>           &device_drv->mt_buffer->channel[i].array[CMD_GET_DEMAND]);

The code at 412 was wrong there, the demand value is queried later by the device
driver directly. For the other two occurences (311 and 340) I had to really
increase the array size by one. This issue can cause segfaults if you have a slow
control front-end which uses multithreading (not many people use it except me).

> (4)=========================
> src/lazylogger.c:1957: warning: array subscript is below array bounds
> 
> if ((channel < 0) && (lazyinfo[channel].hKey != 0))
> 
> That is lazyinfo[something below zero].

This has to be fixed by Pierre. I guess an or instead of an and would do it, but
I'm not 100% sure.

> (5)=============================
> More warnings an expert might want to have a look at:
> 
> * warning: deprecated conversion from string constant to 'char*'
> 
> * src/fal.c:106: warning: non-local variable '<anonymous struct> out_info'
>                  uses anonymous type
> * src/fal.c:3064: warning: non-local variable '<anonymous struct> eb' uses
>                   anonymous type
> 
> I attach the full output of make.
> Could someone knowledgeable please have a look at these warnings and fix them?

Uahhh. Especially the "const char*" vs. "char*" is in principle right, but will
cause a major rework. Probably hundreds of occations have to be fixed. Many strings
must be declared const, others not. It will help the programmer to find some errors
during compile which would later show up only during runtime (like writing into a
fixed string), but I only will go through that when I have gcc 4.2 installed
myself, and have two free days to work on this ;-)

> They make me a bit nervous when thinking about data integrity, and
> there are now so many that they actually start to hide serious stuff
> like the ones I presented.

Except the slow control stuff (which only is an issue for multithreaded frontends)
none of the above things will have an influence on the data integrity. But I agree
that they should be fixed.

- Stefan
Entry  22 May 2007, Randolf Pohl, Bug Report, analyzer_init called by odb_load 
Hi,

I wonder why mana.c:odb_load() calls analyzer_init(). This way analyzer_init 
is called TWICE or more times:
first from mana.c:mana_init(), for each invocation of the analyzer, and 
second from mana.c:odb_load(), for each run to be analyzed

Isn't this a bug? It can mess up several things (like mallocs) if you don't 
take the necessary precautions. Other module_init functions are correctly 
called only once, before all runs are analyzed.

I have the feeling, that odb_load should NOT call analyzer_init. Or am I wrong 
(probably, but please explain to me)? Do I have to live with it and make sure 
that my beautiful global initialization in analyzer_init is only done once?
:-)

Cheers,

Randolf

And here is the annotated log using the ROOT example experiment 
(several modules changed/added to print their respective names)

:~/midas/examples/root> ./analyzer -e exa_root -i run%05d.mid -r 1 3
 
analyzer_init        <-- ok

Root server listening on port 9090...
adc_calib_init       <-- ok
adc_summing_init     <-- ok
scaler_init          <-- ok
Running analyzer offline. Stop with "!"
Set run number 1 in ODB
Load ODB from run 1...
analyzer_init        <-- not ok, or is it?

OK
run00001.mid:777  events, 0.00s
Set run number 2 in ODB
Load ODB from run 2...
analyzer_init        <-- not ok, or is it?

OK
run00002.mid:7227  events, 0.03s
Set run number 3 in ODB
Load ODB from run 3...
analyzer_init        <-- not ok, or is it?

OK
run00003.mid:13866  events, 0.06s
adc_calib_exit
adc_summing_exit
scaler_exit

analyzer_exit
    Reply  22 May 2007, Stefan Ritt, Bug Report, analyzer_init called by odb_load 
The reason to call analyzer_init in odb_load is the following:

Assume you run the analyzer offline, analyzing many files in series. Then assume
that you have /Experiment/Run Parameters, which is actively used by the analyzer
(like beam settings etc.). In this case you do a db_open_record() to map
/Experiment/Run Parameters to the exp_param C structure. For this mapping to work,
the ODB structure and the C structure have to be exactly the same. Now assume that
you changed your run parameters over time, like you added some comment later. Now
you want to analyzer several runs, some before and some after the modification.
Both sets have a different structure in /Experiment/Run Parameters, which is a
problem, since the compiled analyzer can only have a single C structure. My "poor"
solution was to call analyzer_init after each loading of the ODB from the *.mid
file. The db_create_record() call matches the C structure to the ODB structure by
modifying the ODB structure if necessary. So if you added one parameter later, this
(modified) structure gets loaded by odb_load, but then it gets adjusted in
analyzer_init().

I understand now that this case might not happen so often, and you are more
bothered by the fact that analyzer_init gets called several time. There must
however be a hook for offline analysis that the user code can correct the ODB
structure. So I propose to add a flag to analyzer_init, such as

INT analyzer_init(BOOL bFirst)
{
}

If bFirst equals TRUE, the function got called from mana_init(), if FALSE, it got
called from odb_load. Then you can put code like

INT analyzer_init(BOOL bFirst)
{
   if (bFirst) {
      p = malloc()
      ...
   }
}

If you agree, I will modify the code and commit the change.

- Stefan
    Reply  22 May 2007, Randolf Pohl, Bug Report, analyzer_init called by odb_load 
Thanks for the quick reply, Stefan.

Please don't change anything in the code unless you find it really important. I guess 
changing the analyzer_init prototype will break a lot of code out there?

In fact, I think I do understand this behavior now.
And even without your suggested fix there is a simple workaround: I add a static 
variable to my analyzer_init.cxx file, and do something similar to your bFirst fix.

In conclusion, commit your fix if it does not harm others. Postpone this commit to a 
future new version of midas which breaks a lot of things anyway...

A last question, for me to understand: Why not call db_open_record in 
ana_begin_of_run then?

Cheers,

Randolf
    Reply  22 May 2007, Stefan Ritt, Bug Report, analyzer_init called by odb_load 
> Thanks for the quick reply, Stefan.
> 
> Please don't change anything in the code unless you find it really important.
I guess 
> changing the analyzer_init prototype will break a lot of code out there?
> 
> In fact, I think I do understand this behavior now.
> And even without your suggested fix there is a simple workaround: I add a static 
> variable to my analyzer_init.cxx file, and do something similar to your bFirst
fix.
> 
> In conclusion, commit your fix if it does not harm others. Postpone this
commit to a 
> future new version of midas which breaks a lot of things anyway...
> 
> A last question, for me to understand: Why not call db_open_record in 
> ana_begin_of_run then?

I fully agree with you that db_open_record would better go into ana_begin_of_run
(and
analyzer_init not being called in odb_load), and I fully agree with you that
changing the
code would break many experiments. ;-)

So I guess we leave it as it is right now as you suggested.
Entry  27 Feb 2017, William Moore, Suggestion, analyzer failing to load ODB parameters 
Hi,

I am attempting to compile and run analysis code on a completely different,
unconnected system than the DAQ computer for the experiment. The analyzer was
developed previously and my goal is to get it running and then update it to
achieve my needs. Before compiling the analyzer, I load a backup ODB file in
odbedit, and compile experim.h. I then compile the analyzer with that experim.h
file. When I run the analyzer I get the following output:

> MIDAS version 2.1ROOT version 5.34/36Root server listening on port 9090...
> Running analyzer offline. Stop with "!"
> Configuration file "/somedir/switches.odb" loaded
> [Analyzer,INFO] Set run number 1290 in ODB
> Load ODB from run 1290...[Analyzer,INFO] cannot load value "Client Notify":
write protected
> [Analyzer,INFO] cannot load value "Prompt": write protected
.
.
.
> [Analyzer,INFO] cannot load value "LANSCE-ops": write protected
> MIDAS version 2.1ROOT version 5.34/36OK
> Configuration file "/somedir/switches.odb" loaded
> Data_Raw/run01290.mid.gz:16355  Data_Analyzed/run01290.root:15208  events, 0.43s

I have confirmed all files being used have read/write access to all users. The
analyzer does populate a .root output file with filled histograms, however not
all histograms are filled. I believe this is because histograms that relied on
an ODB paramater that failed to load did not populate. Any idea as to what I am
doing wrong or how I could resolve this issue are greatly appreciated.

Thanks,
William Moore
Entry  04 Feb 2008, Robert Pattie, Forum, analyzer crashes at high rates 
I'm using midas to read data from a waveform digitizer at event rates of
10-30kHz. To accomplish this the digitizer is read via Block transfers and the
raw data put into a single MIDAS event.  Thus a MIDAS event could contain upto
250 physical events and at maximum 350kBytes.  In the analyzer modules I had
been analyzing the first physics event contained in a MIDAS event with no
problem.  Recently I tried to analyze all the physical events.  At low rates,
100hz-1khz, this was no problem, 1-5 physical events in a MIDAS event.  At
higher rates 10-20kHz, where there are about 40physical events per MIDAS event,
 the analyzer keeps up for a  few seconds then seg faults with " 'shared object
read from target memory' has disappear; keeping it symbols".  Any suggestions as
to why the analyzer is crashing would be very helpful.

Thanks,

Robert     
    Reply  05 Feb 2008, Stefan Ritt, Forum, analyzer crashes at high rates 
> I'm using midas to read data from a waveform digitizer at event rates of
> 10-30kHz. To accomplish this the digitizer is read via Block transfers and the
> raw data put into a single MIDAS event.  Thus a MIDAS event could contain upto
> 250 physical events and at maximum 350kBytes.  In the analyzer modules I had
> been analyzing the first physics event contained in a MIDAS event with no
> problem.  Recently I tried to analyze all the physical events.  At low rates,
> 100hz-1khz, this was no problem, 1-5 physical events in a MIDAS event.  At
> higher rates 10-20kHz, where there are about 40physical events per MIDAS event,
>  the analyzer keeps up for a  few seconds then seg faults with " 'shared object
> read from target memory' has disappear; keeping it symbols".  Any suggestions as
> to why the analyzer is crashing would be very helpful.

I personally have never seen this error message. The analyzer is designed such that
it produces "back pressure" if the data rate is higher than the analysis rate and
you have "request all events" on. The only thing I can image are the following two
issues:

- At higher rate where you have more than 40 physical events per MIDAS event, there
is some bug in your analysis code which gets exploited only in that case. Maybe some
temporary array which is only 35 entries long or something like this.

- The back pressure mentioned above will slow down the frontend. If your computer
busy logic is not working correctly, you might get more triggers than you can
acquire. Maybe then the data gets screwed up and the analyzer chokes on it.

Finding the exact reason is not simple. For sure you have to run the analyzer inside
the debugger, to see exactly where the segfault happens. You then maybe have to
produce some dummy data in the frontend (like always sending the same event) to
disentangle some possible trigger problems from other problems.

Best regards,

  Stefan
Entry  28 Jan 2013, Robert Pattie, Forum, analyzer cannot connect to the statistics database 
I've managed to put the analyzer into state where it cannot connect to the 
statistics database.  The error message suggests another analyzer is connected.  
I've recompiled MIDAS and the user code, restarted the computer etc..., and the 
analyzer cannot connect.  If I run "odbedit -c clean", I can start the analyzer, 
but get the same error when exiting or starting a run.  I've commented out all the
user code in the analyzer.c and its associated analyzer module's, and read event
code in the frontend and nothing resolves this issue.  Any suggestion?

The output from attempting to run the analyzer is:

Connect to experiment nnbarxwnr...[odb.c:1013:db_open_database,ERROR] Removed ODB
client 'Analyzer', index 0 because process pid 31982 does not exists
Deleted entry '/System/Clients/31982' for client 'Analyzer' because it is not
connected to ODB
OK
Root server listening on port 9090...
Loading previous online histos from ./data/last.root
ss_mutex_wait_for: pthread_mutex_lock() returned errno 22 (Invalid argument),
aborting...


When attempting to clean up the Analyzer tree in the ODB I receive the message
:"deletion of key not allowed."  

It appears that running the analyzer sets the permissions of the Statistics tree of
my analyzer module into RWDE.  

Adding the following lines to my start up script eliminate the above problem:
odbedit -c clean
odbedit -c "chmod 7 Analyzer/"
odbedit -c "rm /Analyzer/fADCs/Statistics"

Now when starting a run the analyzer crashes with this error:analyzer:
src/midas.c:11443: rpc_execute: Assertion `return_buffer' failed.
Aborted (core dumped)

and the messages in the odb are :

[system.c:4295:recv_tcp,ERROR] header: recv returned 0, n_received = 0, unexpected
connection closure
[midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed, routine = "rc_transition",
host = "LANL-FADC-DAQ"
[midas.c:4130:cm_transition,ERROR] Could not start a run: cm_transition() status 503,
message 'Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ'
Deleted entry '/System/Clients/1001' for client 'Analyzer' because process pid 1001
does not exists
[midas.c:8893:rpc_client_check,ERROR] Connection broken to "Analyzer" on host
LANL-FADC-DAQ
Run #180 start aborted
Error: Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ

20:05:02 [Logger,INFO] Deleting previous file "./data/run00180.mid"

20:05:02 [ODBEdit,ERROR] [system.c:4295:recv_tcp,ERROR] header: recv returned 0,
n_received = 0, unexpected connection closure

20:05:02 [ODBEdit,ERROR] [midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed,
routine = "rc_transition", host = "LANL-FADC-DAQ"

20:05:02 [ODBEdit,ERROR] [midas.c:4130:cm_transition,ERROR] Could not start a run:
cm_transition() status 503, message 'Unknown error 503 from client 'Analyzer' on host
LANL-FADC-DAQ'

20:05:02 [ODBEdit,INFO] Deleted entry '/System/Clients/1001' for client 'Analyzer'
because process pid 1001 does not exists

20:05:02 [ODBEdit,ERROR] [midas.c:8893:rpc_client_check,ERROR] Connection broken to
"Analyzer" on host LANL-FADC-DAQ

20:05:02 [ODBEdit,INFO] Run #180 start aborted
20:05:03 [mdump,INFO] Client 'Analyzer' on buffer 'SYSTEM' removed by cm_watchdog
because process pid 1001 does not exist
20:05:11 [mhttpd,INFO] Client 'Analyzer' (PID 1001) on database 'ODB' removed by
cm_watchdog (idle 10.1s,TO 10s)


Thanks,
Robert Pattie
    Reply  01 Feb 2013, Randolf Pohl, Forum, analyzer cannot connect to the statistics database 
The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
one you specified in /etc/exptab).
This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
fresh start.

Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
"rm .[A-Z]*.SHM" 
ELOG V3.1.4-2e1708b5