Back Midas Rome Roody Rootana
  Midas DAQ System, Page 1 of 145  Not logged in ELOG logo
Entry  03 Mar 2022, Konstantin Olchanski, Info, zlib required, lz4 internal 
as of commit 8eb18e4ae9c57a8a802219b90d4dc218eb8fdefb, the gzip compression
library is required, not optional.

this fixes midas and manalyzer mis-build if the system gzip library
is accidentally not installed. (is there any situation where
gzip library is not installed on purpose?)

midas internal lz4 compression library was renamed to mlz4 to avoid collision
against system lz4 library (where present). lz4 files from midasio are now
used, lz4 files in midas/include and midas/src are removed.

I see that on recent versions of ubuntu we could switch to the system version 
of the lz4 library. however, on centos-7 systems it is usually not present
and it still is a supported and widely used platform, so we stay
with the midas-internal library for now.

K.O.
Entry  01 Apr 2024, Konstantin Olchanski, Info, xz-utils bomb out, compression benchmarks 
you may have heard the news of a major problem with the xz-utils project, authors of the popular "xz" file compression, 
https://nvd.nist.gov/vuln/detail/CVE-2024-3094

the debian bug tracker is interesting reading on this topic, "750 commits or contributions to xz by Jia Tan, who backdoored it", 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068024

and apparently there is problems with the deisng of the .xz file format, making it vulnerable to single-bit errors and unreliable checksums,
https://www.nongnu.org/lzip/xz_inadequate.html

this moved me to review status of file compression in MIDAS.

MIDAS does not use or recommend xz compression, MIDAS programs to not link to xz and lzma libraries provided by xz-utils.

mlogger has built-in support for:
- gzip-1, enabled by default, as the most safe and bog-standard compression method
- bzip2 and pbzip2, as providing the best compression
- lz4, for high data rate situations where gzip and bzip2 cannot keep up with the data

compression benchmarks on an AMD 7700 CPU (8-core, DDR5 RAM) confirm the usual speed-vs-compression tradeoff:

note: observe how both lz4 and pbzip2 compress time is the time it takes to read the file from ZFS cache, around 6 seconds.
note: decompression stacks up in the same order: lz4, gzip fastest, pbzip2 same speed using 10x CPU, bzip2 10x slower uses 1 CPU.
note: because of the fast decompression speed, gzip remains competitive.

no compression: 6 sec, 270 MiBytes/sec,
lz4, bpzip2:    6 sec, same, (pbzip2 uses 10 CPU vs lz4 uses 1 CPU)
gzip -1:       21 sec,  78 MiBytes/sec
bzip2:         70 sec,  23 MiBytes/sec (same speed as pbzip2, but using 1 CPU instead of 10 CPU)

file sizes:

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ ls -lSr test.mid*
-rw-r--r-- 1 dsdaqdev users  483319523 Apr  1 14:06 test.mid.bz2
-rw-r--r-- 1 dsdaqdev users  631575929 Apr  1 14:06 test.mid.gz
-rw-r--r-- 1 dsdaqdev users 1002432717 Apr  1 14:06 test.mid.lz4
-rw-r--r-- 1 dsdaqdev users 1729327169 Apr  1 14:06 test.mid
(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ 

actual benchmarks:

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time cat test.mid > /dev/null
0.00user 6.00system 0:06.00elapsed 99%CPU (0avgtext+0avgdata 1408maxresident)k

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time gzip -1 -k test.mid
14.70user 6.42system 0:21.14elapsed 99%CPU (0avgtext+0avgdata 1664maxresident)k

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time lz4 -k -f test.mid
2.90user 6.44system 0:09.39elapsed 99%CPU (0avgtext+0avgdata 7680maxresident)k

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time bzip2 -k -f test.mid
64.76user 8.81system 1:13.59elapsed 99%CPU (0avgtext+0avgdata 8448maxresident)k

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time pbzip2 -k -f test.mid
86.76user 15.39system 0:09.07elapsed 1125%CPU (0avgtext+0avgdata 114596maxresident)k

decompression benchmarks:

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time lz4cat  test.mid.lz4 > /dev/null
0.68user 0.23system 0:00.91elapsed 99%CPU (0avgtext+0avgdata 7680maxresident)k

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time zcat  test.mid.gz > /dev/null
6.61user 0.23system 0:06.85elapsed 99%CPU (0avgtext+0avgdata 1408maxresident)k

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time bzcat  test.mid.bz2 > /dev/null
27.99user 1.59system 0:29.58elapsed 99%CPU (0avgtext+0avgdata 4656maxresident)k

(vslice) dsdaqdev@dsdaqgw:/zdata/vslice$ /usr/bin/time pbzip2 -dc test.mid.bz2 > /dev/null
37.32user 0.56system 0:02.75elapsed 1377%CPU (0avgtext+0avgdata 157036maxresident)k

K.O.
Entry  05 Feb 2007, Konstantin Olchanski, Bug Report, wrong version in include/midas.h? 
The present .../include/midas.h contains
[alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
/home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"

All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
trunk, and this may confuse people as to what version of midas they are using,
and may complicate reporting of bugs.

Perhaps the trunk version should say something like "svn-22233344" (the svn
revision number)? The present "1.9.5" is wrong...

K.O.
    Reply  06 Feb 2007, Stefan Ritt, Bug Report, wrong version in include/midas.h? 
> The present .../include/midas.h contains
> [alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
> /home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"
> 
> All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
> trunk, and this may confuse people as to what version of midas they are using,
> and may complicate reporting of bugs.
> 
> Perhaps the trunk version should say something like "svn-22233344" (the svn
> revision number)? The present "1.9.5" is wrong...

Fully agree. I added a svn_revision string into midas.h, which gets reported now
by "odbedit ver". Unfortunately this reflects only changes in midas.c. If one
changes odb.c for example, the svn revision in midas.c does not get modified by
the SVN system. In addition I changed the present version 1.9.5 to 2.0.0. I made
the tar and zip files. After some internal testing, it will be announced
officially in a few days.
Entry  30 May 2020, Gennaro Tortone, Bug Report, wrong run number 
Hi,
I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):

if I build examples in ROOTANA I got wrong run number (always 0):

[root@lxgentor examples]# ./ana.exe -r9090

Using THttpServer in read/write mode
TMidasOnline::connect: Connecting to experiment "exo" on host 
"lxgentor.na.infn.it"
MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
number" returned status 312
Opened output file with name : output00000000.root
TDT724Waveform done init...... 
Create Histos
Create Histos
TMidasOnline::eventRequest: Event request: buffer "SYSTEM" (2), event id 
0xffffffff, trigger mask 0xffffffff, sample 2, request id: 0

it seems that some function try to get "//runinfo/Run number" (double slash) 
instead of "/runinfo/Run number"...

Thanks in advance,
Gennaro
    Reply  30 May 2020, Thomas Lindner, Bug Report, wrong run number 
Hi,

I fixed this particular case, so that I now I get the run number correctly.

But Konstantin will need to explain how this class is supposed to be used more generally.  The example programs have a mix with sometimes needing leading slashes and other times not:

Thomass-MacBook-Pro-3:rootana lindner$ grep -s 'runinfo/Run' */*.c*
libAnalyzer/TRootanaEventLoop.cxx:   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
manalyzer/manalyzer.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
manalyzer/manalyzer_v0.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
old_analyzer/analyzer.cxx:   gOdb->RI("runinfo/Run number", &gRunNumber);

Cheers,
Thomas

> 
> Hi,
> I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
> 
> if I build examples in ROOTANA I got wrong run number (always 0):
> 
> [root@lxgentor examples]# ./ana.exe -r9090
> 
> Using THttpServer in read/write mode
> TMidasOnline::connect: Connecting to experiment "exo" on host 
> "lxgentor.na.infn.it"
> MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> number" returned status 312
> Opened output file with name : output00000000.root
> TDT724Waveform done init...... 
> Create Histos
> Create Histos
> TMidasOnline::eventRequest: Event request: buffer "SYSTEM" (2), event id 
> 0xffffffff, trigger mask 0xffffffff, sample 2, request id: 0
> 
> it seems that some function try to get "//runinfo/Run number" (double slash) 
> instead of "/runinfo/Run number"...
> 
> Thanks in advance,
> Gennaro
    Reply  30 May 2020, Gennaro Tortone, Bug Report, wrong run number 
Hi,

thanks a lot for your grep... I temporary fix my local ROOTANA code with this:

diff --git a/libAnalyzer/TRootanaEventLoop.cxx b/libAnalyzer/TRootanaEventLoop.cxx
index 57111b6..90cf384 100644
--- a/libAnalyzer/TRootanaEventLoop.cxx
+++ b/libAnalyzer/TRootanaEventLoop.cxx
@@ -733,7 +733,7 @@ int TRootanaEventLoop::ProcessMidasOnline(TApplication*app, const char* hostname
    /* fill present run parameters */
 
    fCurrentRunNumber = 0;
-   fODB->RI("/runinfo/Run number", &fCurrentRunNumber);
+   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
 
    //   if ((fODB->odbReadInt("/runinfo/State") == 3))
    //startRun(0,gRunNumber,0);

Regards,
Gennaro

> Hi,
> 
> I fixed this particular case, so that I now I get the run number correctly.
> 
> But Konstantin will need to explain how this class is supposed to be used more generally.  The example programs have a mix with sometimes needing leading slashes and other times not:
> 
> Thomass-MacBook-Pro-3:rootana lindner$ grep -s 'runinfo/Run' */*.c*
> libAnalyzer/TRootanaEventLoop.cxx:   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
> manalyzer/manalyzer.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
> manalyzer/manalyzer_v0.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
> old_analyzer/analyzer.cxx:   gOdb->RI("runinfo/Run number", &gRunNumber);
> 
> Cheers,
> Thomas
> 
> > 
> > Hi,
> > I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
> > 
> > if I build examples in ROOTANA I got wrong run number (always 0):
> > 
> > [root@lxgentor examples]# ./ana.exe -r9090
> > 
> > Using THttpServer in read/write mode
> > TMidasOnline::connect: Connecting to experiment "exo" on host 
> > "lxgentor.na.infn.it"
> > MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> > number" returned status 312
> > Opened output file with name : output00000000.root
> > TDT724Waveform done init...... 
> > Create Histos
> > Create Histos
> > TMidasOnline::eventRequest: Event request: buffer "SYSTEM" (2), event id 
> > 0xffffffff, trigger mask 0xffffffff, sample 2, request id: 0
> > 
> > it seems that some function try to get "//runinfo/Run number" (double slash) 
> > instead of "/runinfo/Run number"...
> > 
> > Thanks in advance,
> > Gennaro
    Reply  03 Jun 2020, Konstantin Olchanski, Bug Report, wrong run number 
> I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
>
> MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> number" returned status 312
>
> it seems that some function try to get "//runinfo/Run number" (double slash) 
> instead of "/runinfo/Run number"...
> 

You made a mistake somewhere.

rootana release rootana-2020-03 uses VirtualOdb, not MVOdb, so there should be no 
messages from "MVOdb". ODB path "/runinfo/run number" is correct for the 
VirtualOdb classes. MVOdb classes use relative paths, absolute path starting from 
"/" is not permitted, hence the error.

You most likely are using the master branch of rootana.

Commit switching rootana from VirtualOdb to mvodb was made after the release 2020-
03, in May:
https://bitbucket.org/tmidas/rootana/commits/522cd07181c59f557e9ef13a70223ec44be44
bc9

(I confirm the incorrect call to RI("/runinfo/..."), Thomas already fixed it in 
the repository, big thanks!).

The dust is not fully settled yet on the refactoring of rootana, until then, I 
recommend that people use the release version(s).

K.O.
    Reply  03 Jun 2020, Konstantin Olchanski, Bug Report, wrong run number 
> 
> But Konstantin will need to explain how this class is supposed to be used more generally.
>

MVOdb is a replacement for VirtualOdb. It has many functions that were missing in VirtualOdb
and it implements access to both XML and JSON ODB dumps.

>  The example programs have a mix with sometimes needing leading slashes and other times not:
> 
> libAnalyzer/TRootanaEventLoop.cxx:   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
> old_analyzer/analyzer.cxx:   gOdb->RI("runinfo/Run number", &gRunNumber);

RI() is MVOdb, no absolute paths, leading "/" not permitted.

> manalyzer/manalyzer.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
> manalyzer/manalyzer_v0.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");

Hmmm... good catch. these are VirtualOdb calls, but they bypass the VirtualOdb interface (which was removed)
and call the odb access methods directly from the TMidasOnline class. They should be replaced
with MVOdb RI() calls (and leading "/" removed).

I was going to look at the TMidasOnline class next - many things need to be updated,
but it will have to wait until I update the MVOdb and the tmfe documentation and until
I update midasio to read and write the new bank32a data files.

K.O.
    Reply  03 Jun 2020, Gennaro Tortone, Bug Report, wrong run number 
> > I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
> >
> > MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> > number" returned status 312
> >
> > it seems that some function try to get "//runinfo/Run number" (double slash) 
> > instead of "/runinfo/Run number"...
> > 
>
> You made a mistake somewhere.

you are right !
I used rootana-2020-03-a instead of release/rootana-2020-03... my fault !

I have to (re)compile MIDAS for the same error...

Thanks !
Gennaro

> 
> rootana release rootana-2020-03 uses VirtualOdb, not MVOdb, so there should be no 
> messages from "MVOdb". ODB path "/runinfo/run number" is correct for the 
> VirtualOdb classes. MVOdb classes use relative paths, absolute path starting from 
> "/" is not permitted, hence the error.
> 
> You most likely are using the master branch of rootana.
> 
> Commit switching rootana from VirtualOdb to mvodb was made after the release 2020-
> 03, in May:
> https://bitbucket.org/tmidas/rootana/commits/522cd07181c59f557e9ef13a70223ec44be44
> bc9
> 
> (I confirm the incorrect call to RI("/runinfo/..."), Thomas already fixed it in 
> the repository, big thanks!).
> 
> The dust is not fully settled yet on the refactoring of rootana, until then, I 
> recommend that people use the release version(s).
> 
> K.O.
    Reply  04 Jun 2020, Konstantin Olchanski, Bug Report, wrong run number 
> > You made a mistake somewhere.
> 
> you are right !
> I used rootana-2020-03-a instead of release/rootana-2020-03... my fault !
> 
> I have to (re)compile MIDAS for the same error...

The MIDAS version, including what branch you have used is reported on the midas "help" page and in 
the odbedit "version" command.

For example my midas reports:
Tue Mar 24 20:54:11 2020 -0700 - midas-2020-03-a-98-g8b462cc9 on branch develop

This version string includes:
date of commit
git tag and commit number (see "git describe")
"-dirty" if you have modified sources ("git status" shows modified files)
and which git branch you have (I have "develop", you should have "release/midas-2020-03")

I am not sure how ROOTANA reports the version and build strings. I shall check..O...

K.O.
Entry  03 Oct 2023, Konstantin Olchanski, Bug Fix, wrong array size after loading xml or json file 
both the xml and the json decoders have a bug (fix pending). loading saved odb 
from xml and json file did not truncate arrays in odb to the size of arrays in 
the file. for example, if /example/double_array has size 20 in odb, but size 5 
in xml or json file, after loading the file, array size is still 20.

this is unexpected: after loading an odb save file we expect odb to return to 
same state as when odb save file was created. we do not expect some arrays to 
have half of their elements restored from file and half their elements left 
unchanged.

save and restore from .odb file does not have this problem.

I think this is a bug and I committed (but did not yet push) a fix for both xml 
and json odb decoder.

I have run this problem while writing the new history panel editor, where 
deleting variables did not work because json rpc db_paste() was not truncating 
any arrays.

I am still finishing up the last few bits of the new history panel editor, and 
there is a bit of time to discuss and comment this odb change before I push it 
to midas.

K.O.
    Reply  15 Apr 2017, Konstantin Olchanski, Bug Report, where to report bugs, stop form odbedit broken 
>
> What is the preferred channel to report potential bugs (elog / bitbucket issues)? 
>

I prefer that bugs be reported on this forum here. Most bugs affect every midas user, so best to notify the 
whole community.

Bitbucket have a nice bug tracking system, but there is a couple of problems:
a) only a couple of people see the bug reports for midas, minimizing probability of fix.
b) bug reports on bitbucket stay on bitbucket, we do not have backups and archives
of bug reports, if tomorrow bitbucket goes belly-up, our bug database goes poof! with them.
c) I can search the bug report on this forum using "grep" (i am sure there is a "find" button
on the bitbucket web page and it finds what I am looking for right away).

So if you have a bug report that others should know about (i.e. the "+" button on the status page does 
not work), I say use this forum.

If you have a bug that you think is unique to you, not interesting to others (i.e. my midas crashes when I 
do X), file it on bitbucket. If you see no activity on the bitbucket for a week or two, repost it here.

K.O.
Entry  13 Nov 2014, Tim Gorringe, Forum, using single frontend with multiple "EQ_POLLED" equipments to generate different data streams  

We have a MIDAS frontend that provides both the readout of raw events 
and the processing of raw events into several distinct derived datasets. 
For one type of derived dataset there is a derived event for 
each raw event. For other types of derived datasets there's a
derived event for every N raw events. We'd like to have the different 
derived event types sent to different buffers / shared memory segments 
and stored in different midas files.

I was thinking of defining a separate equipment for each type of 
derived data. Each equipment would have a different buffer name so
the data would go to different buffers and thereafter to different 
midas files. I was also thinking of defining each equipment as
a "polled event" but with a unique "source ID". I believe the user 
poll_event() function is passed the "source ID" of the equipment
type and therefore could return success/fail based on whether or
not the particular derived event with that source ID is available 
for readout. Each equipment for each derived dataset would have 
a unique readout routine to create and fill the midas databanks 
for that derived event type.

The above scheme is similar to the midas documentation example
of a frontend with a trigger equipment and a scaler equipment
However, the scaler / trigger example uses two different event
types - EQ_POLLED for trigger and EQ_PERIODIC for scaler. I'd like
to use several EQ_POLLED equipments that are distinguished by
their source ID's

Is this a sensible scheme for make different data streams of
different derived event types from a single frontend? Has anyone
tried something similar? 
    Reply  13 Nov 2014, Pierre-Andre Amaudruz, Forum, using single frontend with multiple "EQ_POLLED" equipments to generate different data streams  
Hi Tim,

Multiple Polling equipment are possible, but you may have to balance the polling 
time based on the expected trigger rate for each equipment due to the 
acquisition/processing time of each equipment.

But instead of using the event buffer destination for the dataset selection, you 
could use the trigger mask and the event ID modified at the user code level from 
a single equipment.

Using the macros such as TRIGGER_MASK(pevent), EVENT_ID(pevent) you can modify 
on the fly their assignment. All go through the SYSTEM buffer as usual.

You use the data logger capability of multiple channels to steer the data in 
different files. 
Each logger channel requires a definition of the type of event that you want to 
record. EventID, TriggerMask can in this case be used to select a particular 
type of event.
I used this option and if I recall correctly, the trigger mask is the one you 
want to base your selection upon. This gives you up to 16 channels (bitwise). 
the eventID should remain -1, but it is a valid information from the FEs.

Cheers, PAA


> 
> 
> We have a MIDAS frontend that provides both the readout of raw events 
> and the processing of raw events into several distinct derived datasets. 
> For one type of derived dataset there is a derived event for 
> each raw event. For other types of derived datasets there's a
> derived event for every N raw events. We'd like to have the different 
> derived event types sent to different buffers / shared memory segments 
> and stored in different midas files.
> 
> I was thinking of defining a separate equipment for each type of 
> derived data. Each equipment would have a different buffer name so
> the data would go to different buffers and thereafter to different 
> midas files. I was also thinking of defining each equipment as
> a "polled event" but with a unique "source ID". I believe the user 
> poll_event() function is passed the "source ID" of the equipment
> type and therefore could return success/fail based on whether or
> not the particular derived event with that source ID is available 
> for readout. Each equipment for each derived dataset would have 
> a unique readout routine to create and fill the midas databanks 
> for that derived event type.
> 
> The above scheme is similar to the midas documentation example
> of a frontend with a trigger equipment and a scaler equipment
> However, the scaler / trigger example uses two different event
> types - EQ_POLLED for trigger and EQ_PERIODIC for scaler. I'd like
> to use several EQ_POLLED equipments that are distinguished by
> their source ID's
> 
> Is this a sensible scheme for make different data streams of
> different derived event types from a single frontend? Has anyone
> tried something similar? 
Entry  29 Sep 2020, Amy Roberts, Forum, using python client to start and stop run 
I'm using a python client to start and stop runs, and the following code *appears* 
to set the MIDAS state to "Run"

client.odb_set("/Runinfo/State", 3)

However, it doesn't seem to do other things associated with a run, like start 
accumulating events.

Is there a different way I should start the run from the python client?

Thanks!
    Reply  29 Sep 2020, Ben Smith, Forum, using python client to start and stop run 
The ODB variable "/Runinfo/State" is a symptom of starting/stopping a run, rather than the cause.

In C++, one uses `cm_transition()` to start/stop runs.

In python code you can use the `start_run()` and `stop_run()` functions from `midas.client`: https://bitbucket.org/tmidas/midas/src/00ff089a836100186e9b26b9ca92623e672f0030/python/midas/client.py#lines-793:808
    Reply  06 Oct 2020, Konstantin Olchanski, Forum, using python client to start and stop run 
> The ODB variable "/Runinfo/State" is a symptom of starting/stopping a run, rather than the cause.
> 
> In C++, one uses `cm_transition()` to start/stop runs.
> 
> In python code you can use the `start_run()` and `stop_run()` functions from `midas.client`: https://bitbucket.org/tmidas/midas/src/00ff089a836100186e9b26b9ca92623e672f0030/python/midas/client.py#lines-793:808

one can also run an external command: "mtransition START" and "mtransition STOP"

K.O.
    Reply  29 Dec 2010, Konstantin Olchanski, Bug Report, use of nested locks in MIDAS 
A "nested" or "recursive" lock is a special type of lock that permits a lock holder to lock the same resources again and again, without deadlocking on itself. They are 
very useful, but tricky to implement because most system lock primitives (SYSV semaphores, POSIX mutexes, etc) do not permit nested locks, so all the logic for 
"yes, I am the holder of the lock, yes, I can go ahead without taking it again" (plus the reverse on unlocking) has to be done "by hand". As ever, if implemented 
wrong or used wrong, Bad Things happen. Many people dislike nested locks because of the added complexity, but realistically, it is impossible to build a system 
that does not require nested locking at least somewhere.

MIDAS lock primitives - ss_semaphore_wait_for(), db_lock_database() and bm_lock_buffer() implement a type of nested locks.

ODB locks implemented in db_lock_database() fully support nested (recursive) locking and this feature is heavily used by the ODB library. Many ODB db_xxx() 
functions take the ODB lock, do something, then call another ODB function that also takes the ODB lock recursively. This works well.

Unfortunately, the ODB nested lock implementation is NOT thread-safe. (Unless one is connected through the mserver, in which case, db_xxx() functions ARE 
thread-safe because all ODB access is serialized by the mserver RPC mutex).

Event buffer locks implemented in bm_lock_buffer() rely on ss_semaphore_xxx() to provide nested locking.

ss_semaphore_wait_for() uses SYSV semaphores, which do not provide nested locking, except when called from cm_watchdog(). (keep reading).

Because bm_lock_buffer() does not implement nested locking, use of cm_msg() in buffer management code will lead to self-deadlock, as shown in the following 
stack trace, where bm_cleanup() is working on the SYSMSG buffer, locked it, then called cm_msg() which is now waiting on the SYSMSG lock, which we are holding 
ourselves.

(gdb) where
#0  0x00007fff87274e9e in semop ()
#1  0x0000000100024075 in ss_semaphore_wait_for (semaphore_handle=1179654, timeout=300000) at src/system.c:2280
#2  0x0000000100015292 in bm_lock_buffer (buffer_handle=<value temporarily unavailable, due to optimizations>) at src/midas.c:5386
#3  0x000000010000df97 in bm_send_event (buffer_handle=1, source=0x7fff5fbfd430, buf_size=<value temporarily unavailable, due to optimizations>, 
async_flag=0) at src/midas.c:6484
#4  0x000000010000e6f5 in cm_msg (message_type=2, filename=<value temporarily unavailable, due to optimizations>, line=4226, routine=0x10004559f 
"bm_cleanup", format=0x100045550 "Client '%s' on buffer '%s' removed by %s because process pid %d does not exist") at src/midas.c:722
#5  0x000000010001553c in bm_cleanup_buffer_locked (i=<value temporarily unavailable, due to optimizations>, who=0x100045f42 "bm_open_buffer", 
actual_time=869425784) at src/midas.c:4226
#6  0x00000001000167ee in bm_cleanup (who=0x100045f42 "bm_open_buffer", actual_time=869425784, wrong_interval=0) at src/midas.c:4286
#7  0x000000010001ae27 in bm_open_buffer (buffer_name=<value temporarily unavailable, due to optimizations>, buffer_size=100000, 
buffer_handle=0x10006e9ac) at src/midas.c:4550
#8  0x000000010001ae90 in cm_msg_register (func=0x100000c60 <process_message>) at src/midas.c:895
#9  0x0000000100009a13 in main (argc=3, argv=0x7fff5fbff3d8) at src/odbedit.c:2790

This example deadlock is not a normal code path - I accidentally exposed this deadlock sequence by adding some extra locking.

But in normal use, cm_msg() is called quite often from cm_watchdog() and as protection against this type of deadlock, MIDAS
ss_semaphore_xxx() has a special case that permits one level of nesting for locks called by code executed from cm_watchdog(). This is a very
clever implementation of partial nested locking.

So again, we are running into problems with cm_msg() - logically it should be at the very bottom of the system hierarchy - everybody calls it from their most 
delicate places, while holding various locks, etc - but instead, cm_msg() call the whole MIDAS system all over again - it calls ODB functions, event buffer functions, 
etc - mostly to open and to write into the SYSMSG buffer.

If you are reading this, I hope you are getting a better idea of the difference between textbook systems and systems that are used in the field to get some work 
done.

K.O.
Entry  25 Nov 2004, chris pearson, Forum, use of assert in mhttpd 
   We've had mhttpd aborting regularly since upgrading from midas-1.9.3.  This
happens during elog queries, and is due to an elog file that was incorrectly
modified by hand.  The modification to the file occurred 6 months ago.
   el_retrieve(midas.c:15683) now has several assert statements, one of which
aborts the program on reading the bad entry.

   Why is assert used, instead of an error return from the function (if
necessary), and maybe an error message in the log file?  Assert statements are
often removed, using NDEBUG, for normal use.

Chris

   The problem elog entry had one character removed, so end-of-file came before
the end of the message.  This could probably occur without the file being
altered, if the disk containing the elog fills.
ELOG V3.1.4-2e1708b5