Back Midas Rome Roody Rootana
  Midas DAQ System, Page 1 of 121  Not logged in ELOG logo
ID Date Author Topic Subjectdown
  2350   03 Mar 2022 Konstantin OlchanskiInfozlib required, lz4 internal
as of commit 8eb18e4ae9c57a8a802219b90d4dc218eb8fdefb, the gzip compression
library is required, not optional.

this fixes midas and manalyzer mis-build if the system gzip library
is accidentally not installed. (is there any situation where
gzip library is not installed on purpose?)

midas internal lz4 compression library was renamed to mlz4 to avoid collision
against system lz4 library (where present). lz4 files from midasio are now
used, lz4 files in midas/include and midas/src are removed.

I see that on recent versions of ubuntu we could switch to the system version 
of the lz4 library. however, on centos-7 systems it is usually not present
and it still is a supported and widely used platform, so we stay
with the midas-internal library for now.

K.O.
  338   05 Feb 2007 Konstantin OlchanskiBug Reportwrong version in include/midas.h?
The present .../include/midas.h contains
[alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
/home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"

All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
trunk, and this may confuse people as to what version of midas they are using,
and may complicate reporting of bugs.

Perhaps the trunk version should say something like "svn-22233344" (the svn
revision number)? The present "1.9.5" is wrong...

K.O.
  339   06 Feb 2007 Stefan RittBug Reportwrong version in include/midas.h?
> The present .../include/midas.h contains
> [alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
> /home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"
> 
> All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
> trunk, and this may confuse people as to what version of midas they are using,
> and may complicate reporting of bugs.
> 
> Perhaps the trunk version should say something like "svn-22233344" (the svn
> revision number)? The present "1.9.5" is wrong...

Fully agree. I added a svn_revision string into midas.h, which gets reported now
by "odbedit ver". Unfortunately this reflects only changes in midas.c. If one
changes odb.c for example, the svn revision in midas.c does not get modified by
the SVN system. In addition I changed the present version 1.9.5 to 2.0.0. I made
the tar and zip files. After some internal testing, it will be announced
officially in a few days.
  1923   30 May 2020 Gennaro TortoneBug Reportwrong run number
Hi,
I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):

if I build examples in ROOTANA I got wrong run number (always 0):

[root@lxgentor examples]# ./ana.exe -r9090

Using THttpServer in read/write mode
TMidasOnline::connect: Connecting to experiment "exo" on host 
"lxgentor.na.infn.it"
MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
number" returned status 312
Opened output file with name : output00000000.root
TDT724Waveform done init...... 
Create Histos
Create Histos
TMidasOnline::eventRequest: Event request: buffer "SYSTEM" (2), event id 
0xffffffff, trigger mask 0xffffffff, sample 2, request id: 0

it seems that some function try to get "//runinfo/Run number" (double slash) 
instead of "/runinfo/Run number"...

Thanks in advance,
Gennaro
  1925   30 May 2020 Thomas LindnerBug Reportwrong run number
Hi,

I fixed this particular case, so that I now I get the run number correctly.

But Konstantin will need to explain how this class is supposed to be used more generally.  The example programs have a mix with sometimes needing leading slashes and other times not:

Thomass-MacBook-Pro-3:rootana lindner$ grep -s 'runinfo/Run' */*.c*
libAnalyzer/TRootanaEventLoop.cxx:   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
manalyzer/manalyzer.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
manalyzer/manalyzer_v0.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
old_analyzer/analyzer.cxx:   gOdb->RI("runinfo/Run number", &gRunNumber);

Cheers,
Thomas

> 
> Hi,
> I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
> 
> if I build examples in ROOTANA I got wrong run number (always 0):
> 
> [root@lxgentor examples]# ./ana.exe -r9090
> 
> Using THttpServer in read/write mode
> TMidasOnline::connect: Connecting to experiment "exo" on host 
> "lxgentor.na.infn.it"
> MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> number" returned status 312
> Opened output file with name : output00000000.root
> TDT724Waveform done init...... 
> Create Histos
> Create Histos
> TMidasOnline::eventRequest: Event request: buffer "SYSTEM" (2), event id 
> 0xffffffff, trigger mask 0xffffffff, sample 2, request id: 0
> 
> it seems that some function try to get "//runinfo/Run number" (double slash) 
> instead of "/runinfo/Run number"...
> 
> Thanks in advance,
> Gennaro
  1926   30 May 2020 Gennaro TortoneBug Reportwrong run number
Hi,

thanks a lot for your grep... I temporary fix my local ROOTANA code with this:

diff --git a/libAnalyzer/TRootanaEventLoop.cxx b/libAnalyzer/TRootanaEventLoop.cxx
index 57111b6..90cf384 100644
--- a/libAnalyzer/TRootanaEventLoop.cxx
+++ b/libAnalyzer/TRootanaEventLoop.cxx
@@ -733,7 +733,7 @@ int TRootanaEventLoop::ProcessMidasOnline(TApplication*app, const char* hostname
    /* fill present run parameters */
 
    fCurrentRunNumber = 0;
-   fODB->RI("/runinfo/Run number", &fCurrentRunNumber);
+   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
 
    //   if ((fODB->odbReadInt("/runinfo/State") == 3))
    //startRun(0,gRunNumber,0);

Regards,
Gennaro

> Hi,
> 
> I fixed this particular case, so that I now I get the run number correctly.
> 
> But Konstantin will need to explain how this class is supposed to be used more generally.  The example programs have a mix with sometimes needing leading slashes and other times not:
> 
> Thomass-MacBook-Pro-3:rootana lindner$ grep -s 'runinfo/Run' */*.c*
> libAnalyzer/TRootanaEventLoop.cxx:   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
> manalyzer/manalyzer.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
> manalyzer/manalyzer_v0.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
> old_analyzer/analyzer.cxx:   gOdb->RI("runinfo/Run number", &gRunNumber);
> 
> Cheers,
> Thomas
> 
> > 
> > Hi,
> > I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
> > 
> > if I build examples in ROOTANA I got wrong run number (always 0):
> > 
> > [root@lxgentor examples]# ./ana.exe -r9090
> > 
> > Using THttpServer in read/write mode
> > TMidasOnline::connect: Connecting to experiment "exo" on host 
> > "lxgentor.na.infn.it"
> > MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> > number" returned status 312
> > Opened output file with name : output00000000.root
> > TDT724Waveform done init...... 
> > Create Histos
> > Create Histos
> > TMidasOnline::eventRequest: Event request: buffer "SYSTEM" (2), event id 
> > 0xffffffff, trigger mask 0xffffffff, sample 2, request id: 0
> > 
> > it seems that some function try to get "//runinfo/Run number" (double slash) 
> > instead of "/runinfo/Run number"...
> > 
> > Thanks in advance,
> > Gennaro
  1927   03 Jun 2020 Konstantin OlchanskiBug Reportwrong run number
> I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
>
> MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> number" returned status 312
>
> it seems that some function try to get "//runinfo/Run number" (double slash) 
> instead of "/runinfo/Run number"...
> 

You made a mistake somewhere.

rootana release rootana-2020-03 uses VirtualOdb, not MVOdb, so there should be no 
messages from "MVOdb". ODB path "/runinfo/run number" is correct for the 
VirtualOdb classes. MVOdb classes use relative paths, absolute path starting from 
"/" is not permitted, hence the error.

You most likely are using the master branch of rootana.

Commit switching rootana from VirtualOdb to mvodb was made after the release 2020-
03, in May:
https://bitbucket.org/tmidas/rootana/commits/522cd07181c59f557e9ef13a70223ec44be44
bc9

(I confirm the incorrect call to RI("/runinfo/..."), Thomas already fixed it in 
the repository, big thanks!).

The dust is not fully settled yet on the refactoring of rootana, until then, I 
recommend that people use the release version(s).

K.O.
  1928   03 Jun 2020 Konstantin OlchanskiBug Reportwrong run number
> 
> But Konstantin will need to explain how this class is supposed to be used more generally.
>

MVOdb is a replacement for VirtualOdb. It has many functions that were missing in VirtualOdb
and it implements access to both XML and JSON ODB dumps.

>  The example programs have a mix with sometimes needing leading slashes and other times not:
> 
> libAnalyzer/TRootanaEventLoop.cxx:   fODB->RI("runinfo/Run number", &fCurrentRunNumber);
> old_analyzer/analyzer.cxx:   gOdb->RI("runinfo/Run number", &gRunNumber);

RI() is MVOdb, no absolute paths, leading "/" not permitted.

> manalyzer/manalyzer.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");
> manalyzer/manalyzer_v0.cxx:   int run_number = midas->odbReadInt("/runinfo/Run number");

Hmmm... good catch. these are VirtualOdb calls, but they bypass the VirtualOdb interface (which was removed)
and call the odb access methods directly from the TMidasOnline class. They should be replaced
with MVOdb RI() calls (and leading "/" removed).

I was going to look at the TMidasOnline class next - many things need to be updated,
but it will have to wait until I update the MVOdb and the tmfe documentation and until
I update midasio to read and write the new bank32a data files.

K.O.
  1929   03 Jun 2020 Gennaro TortoneBug Reportwrong run number
> > I build MIDAS and ROOTANA using same tag (midas-2020-03-a, rootana-2020-03-a):
> >
> > MVOdb::SetMidasStatus: Error: MIDAS db_get_value() at ODB path "//runinfo/Run 
> > number" returned status 312
> >
> > it seems that some function try to get "//runinfo/Run number" (double slash) 
> > instead of "/runinfo/Run number"...
> > 
>
> You made a mistake somewhere.

you are right !
I used rootana-2020-03-a instead of release/rootana-2020-03... my fault !

I have to (re)compile MIDAS for the same error...

Thanks !
Gennaro

> 
> rootana release rootana-2020-03 uses VirtualOdb, not MVOdb, so there should be no 
> messages from "MVOdb". ODB path "/runinfo/run number" is correct for the 
> VirtualOdb classes. MVOdb classes use relative paths, absolute path starting from 
> "/" is not permitted, hence the error.
> 
> You most likely are using the master branch of rootana.
> 
> Commit switching rootana from VirtualOdb to mvodb was made after the release 2020-
> 03, in May:
> https://bitbucket.org/tmidas/rootana/commits/522cd07181c59f557e9ef13a70223ec44be44
> bc9
> 
> (I confirm the incorrect call to RI("/runinfo/..."), Thomas already fixed it in 
> the repository, big thanks!).
> 
> The dust is not fully settled yet on the refactoring of rootana, until then, I 
> recommend that people use the release version(s).
> 
> K.O.
  1937   04 Jun 2020 Konstantin OlchanskiBug Reportwrong run number
> > You made a mistake somewhere.
> 
> you are right !
> I used rootana-2020-03-a instead of release/rootana-2020-03... my fault !
> 
> I have to (re)compile MIDAS for the same error...

The MIDAS version, including what branch you have used is reported on the midas "help" page and in 
the odbedit "version" command.

For example my midas reports:
Tue Mar 24 20:54:11 2020 -0700 - midas-2020-03-a-98-g8b462cc9 on branch develop

This version string includes:
date of commit
git tag and commit number (see "git describe")
"-dirty" if you have modified sources ("git status" shows modified files)
and which git branch you have (I have "develop", you should have "release/midas-2020-03")

I am not sure how ROOTANA reports the version and build strings. I shall check..O...

K.O.
  1268   15 Apr 2017 Konstantin OlchanskiBug Reportwhere to report bugs, stop form odbedit broken
>
> What is the preferred channel to report potential bugs (elog / bitbucket issues)? 
>

I prefer that bugs be reported on this forum here. Most bugs affect every midas user, so best to notify the 
whole community.

Bitbucket have a nice bug tracking system, but there is a couple of problems:
a) only a couple of people see the bug reports for midas, minimizing probability of fix.
b) bug reports on bitbucket stay on bitbucket, we do not have backups and archives
of bug reports, if tomorrow bitbucket goes belly-up, our bug database goes poof! with them.
c) I can search the bug report on this forum using "grep" (i am sure there is a "find" button
on the bitbucket web page and it finds what I am looking for right away).

So if you have a bug report that others should know about (i.e. the "+" button on the status page does 
not work), I say use this forum.

If you have a bug that you think is unique to you, not interesting to others (i.e. my midas crashes when I 
do X), file it on bitbucket. If you see no activity on the bitbucket for a week or two, repost it here.

K.O.
  1039   13 Nov 2014 Tim GorringeForumusing single frontend with multiple "EQ_POLLED" equipments to generate different data streams

We have a MIDAS frontend that provides both the readout of raw events 
and the processing of raw events into several distinct derived datasets. 
For one type of derived dataset there is a derived event for 
each raw event. For other types of derived datasets there's a
derived event for every N raw events. We'd like to have the different 
derived event types sent to different buffers / shared memory segments 
and stored in different midas files.

I was thinking of defining a separate equipment for each type of 
derived data. Each equipment would have a different buffer name so
the data would go to different buffers and thereafter to different 
midas files. I was also thinking of defining each equipment as
a "polled event" but with a unique "source ID". I believe the user 
poll_event() function is passed the "source ID" of the equipment
type and therefore could return success/fail based on whether or
not the particular derived event with that source ID is available 
for readout. Each equipment for each derived dataset would have 
a unique readout routine to create and fill the midas databanks 
for that derived event type.

The above scheme is similar to the midas documentation example
of a frontend with a trigger equipment and a scaler equipment
However, the scaler / trigger example uses two different event
types - EQ_POLLED for trigger and EQ_PERIODIC for scaler. I'd like
to use several EQ_POLLED equipments that are distinguished by
their source ID's

Is this a sensible scheme for make different data streams of
different derived event types from a single frontend? Has anyone
tried something similar? 
  1040   13 Nov 2014 Pierre-Andre AmaudruzForumusing single frontend with multiple "EQ_POLLED" equipments to generate different data streams
Hi Tim,

Multiple Polling equipment are possible, but you may have to balance the polling 
time based on the expected trigger rate for each equipment due to the 
acquisition/processing time of each equipment.

But instead of using the event buffer destination for the dataset selection, you 
could use the trigger mask and the event ID modified at the user code level from 
a single equipment.

Using the macros such as TRIGGER_MASK(pevent), EVENT_ID(pevent) you can modify 
on the fly their assignment. All go through the SYSTEM buffer as usual.

You use the data logger capability of multiple channels to steer the data in 
different files. 
Each logger channel requires a definition of the type of event that you want to 
record. EventID, TriggerMask can in this case be used to select a particular 
type of event.
I used this option and if I recall correctly, the trigger mask is the one you 
want to base your selection upon. This gives you up to 16 channels (bitwise). 
the eventID should remain -1, but it is a valid information from the FEs.

Cheers, PAA


> 
> 
> We have a MIDAS frontend that provides both the readout of raw events 
> and the processing of raw events into several distinct derived datasets. 
> For one type of derived dataset there is a derived event for 
> each raw event. For other types of derived datasets there's a
> derived event for every N raw events. We'd like to have the different 
> derived event types sent to different buffers / shared memory segments 
> and stored in different midas files.
> 
> I was thinking of defining a separate equipment for each type of 
> derived data. Each equipment would have a different buffer name so
> the data would go to different buffers and thereafter to different 
> midas files. I was also thinking of defining each equipment as
> a "polled event" but with a unique "source ID". I believe the user 
> poll_event() function is passed the "source ID" of the equipment
> type and therefore could return success/fail based on whether or
> not the particular derived event with that source ID is available 
> for readout. Each equipment for each derived dataset would have 
> a unique readout routine to create and fill the midas databanks 
> for that derived event type.
> 
> The above scheme is similar to the midas documentation example
> of a frontend with a trigger equipment and a scaler equipment
> However, the scaler / trigger example uses two different event
> types - EQ_POLLED for trigger and EQ_PERIODIC for scaler. I'd like
> to use several EQ_POLLED equipments that are distinguished by
> their source ID's
> 
> Is this a sensible scheme for make different data streams of
> different derived event types from a single frontend? Has anyone
> tried something similar? 
  2000   29 Sep 2020 Amy RobertsForumusing python client to start and stop run
I'm using a python client to start and stop runs, and the following code *appears* 
to set the MIDAS state to "Run"

client.odb_set("/Runinfo/State", 3)

However, it doesn't seem to do other things associated with a run, like start 
accumulating events.

Is there a different way I should start the run from the python client?

Thanks!
  2001   29 Sep 2020 Ben SmithForumusing python client to start and stop run
The ODB variable "/Runinfo/State" is a symptom of starting/stopping a run, rather than the cause.

In C++, one uses `cm_transition()` to start/stop runs.

In python code you can use the `start_run()` and `stop_run()` functions from `midas.client`: https://bitbucket.org/tmidas/midas/src/00ff089a836100186e9b26b9ca92623e672f0030/python/midas/client.py#lines-793:808
  2002   06 Oct 2020 Konstantin OlchanskiForumusing python client to start and stop run
> The ODB variable "/Runinfo/State" is a symptom of starting/stopping a run, rather than the cause.
> 
> In C++, one uses `cm_transition()` to start/stop runs.
> 
> In python code you can use the `start_run()` and `stop_run()` functions from `midas.client`: https://bitbucket.org/tmidas/midas/src/00ff089a836100186e9b26b9ca92623e672f0030/python/midas/client.py#lines-793:808

one can also run an external command: "mtransition START" and "mtransition STOP"

K.O.
  738   29 Dec 2010 Konstantin OlchanskiBug Reportuse of nested locks in MIDAS
A "nested" or "recursive" lock is a special type of lock that permits a lock holder to lock the same resources again and again, without deadlocking on itself. They are 
very useful, but tricky to implement because most system lock primitives (SYSV semaphores, POSIX mutexes, etc) do not permit nested locks, so all the logic for 
"yes, I am the holder of the lock, yes, I can go ahead without taking it again" (plus the reverse on unlocking) has to be done "by hand". As ever, if implemented 
wrong or used wrong, Bad Things happen. Many people dislike nested locks because of the added complexity, but realistically, it is impossible to build a system 
that does not require nested locking at least somewhere.

MIDAS lock primitives - ss_semaphore_wait_for(), db_lock_database() and bm_lock_buffer() implement a type of nested locks.

ODB locks implemented in db_lock_database() fully support nested (recursive) locking and this feature is heavily used by the ODB library. Many ODB db_xxx() 
functions take the ODB lock, do something, then call another ODB function that also takes the ODB lock recursively. This works well.

Unfortunately, the ODB nested lock implementation is NOT thread-safe. (Unless one is connected through the mserver, in which case, db_xxx() functions ARE 
thread-safe because all ODB access is serialized by the mserver RPC mutex).

Event buffer locks implemented in bm_lock_buffer() rely on ss_semaphore_xxx() to provide nested locking.

ss_semaphore_wait_for() uses SYSV semaphores, which do not provide nested locking, except when called from cm_watchdog(). (keep reading).

Because bm_lock_buffer() does not implement nested locking, use of cm_msg() in buffer management code will lead to self-deadlock, as shown in the following 
stack trace, where bm_cleanup() is working on the SYSMSG buffer, locked it, then called cm_msg() which is now waiting on the SYSMSG lock, which we are holding 
ourselves.

(gdb) where
#0  0x00007fff87274e9e in semop ()
#1  0x0000000100024075 in ss_semaphore_wait_for (semaphore_handle=1179654, timeout=300000) at src/system.c:2280
#2  0x0000000100015292 in bm_lock_buffer (buffer_handle=<value temporarily unavailable, due to optimizations>) at src/midas.c:5386
#3  0x000000010000df97 in bm_send_event (buffer_handle=1, source=0x7fff5fbfd430, buf_size=<value temporarily unavailable, due to optimizations>, 
async_flag=0) at src/midas.c:6484
#4  0x000000010000e6f5 in cm_msg (message_type=2, filename=<value temporarily unavailable, due to optimizations>, line=4226, routine=0x10004559f 
"bm_cleanup", format=0x100045550 "Client '%s' on buffer '%s' removed by %s because process pid %d does not exist") at src/midas.c:722
#5  0x000000010001553c in bm_cleanup_buffer_locked (i=<value temporarily unavailable, due to optimizations>, who=0x100045f42 "bm_open_buffer", 
actual_time=869425784) at src/midas.c:4226
#6  0x00000001000167ee in bm_cleanup (who=0x100045f42 "bm_open_buffer", actual_time=869425784, wrong_interval=0) at src/midas.c:4286
#7  0x000000010001ae27 in bm_open_buffer (buffer_name=<value temporarily unavailable, due to optimizations>, buffer_size=100000, 
buffer_handle=0x10006e9ac) at src/midas.c:4550
#8  0x000000010001ae90 in cm_msg_register (func=0x100000c60 <process_message>) at src/midas.c:895
#9  0x0000000100009a13 in main (argc=3, argv=0x7fff5fbff3d8) at src/odbedit.c:2790

This example deadlock is not a normal code path - I accidentally exposed this deadlock sequence by adding some extra locking.

But in normal use, cm_msg() is called quite often from cm_watchdog() and as protection against this type of deadlock, MIDAS
ss_semaphore_xxx() has a special case that permits one level of nesting for locks called by code executed from cm_watchdog(). This is a very
clever implementation of partial nested locking.

So again, we are running into problems with cm_msg() - logically it should be at the very bottom of the system hierarchy - everybody calls it from their most 
delicate places, while holding various locks, etc - but instead, cm_msg() call the whole MIDAS system all over again - it calls ODB functions, event buffer functions, 
etc - mostly to open and to write into the SYSMSG buffer.

If you are reading this, I hope you are getting a better idea of the difference between textbook systems and systems that are used in the field to get some work 
done.

K.O.
  176   25 Nov 2004 chris pearsonForumuse of assert in mhttpd
   We've had mhttpd aborting regularly since upgrading from midas-1.9.3.  This
happens during elog queries, and is due to an elog file that was incorrectly
modified by hand.  The modification to the file occurred 6 months ago.
   el_retrieve(midas.c:15683) now has several assert statements, one of which
aborts the program on reading the bad entry.

   Why is assert used, instead of an error return from the function (if
necessary), and maybe an error message in the log file?  Assert statements are
often removed, using NDEBUG, for normal use.

Chris

   The problem elog entry had one character removed, so end-of-file came before
the end of the message.  This could probably occur without the file being
altered, if the disk containing the elog fills.
  177   14 Dec 2004 Konstantin OlchanskiForumuse of assert in mhttpd
>    We've had mhttpd aborting regularly since upgrading from midas-1.9.3.  This
> happens during elog queries, and is due to an elog file that was incorrectly
> modified by hand.

(sorry for delayed reply, for reasons unknown, I did not get an email notice when this was posted)

Yes, I agree, error handling in midas elog code is insufficient (note missing error checks for
read() and lseek() system calls). Anything but "perfect" elog files would cause funny errors and
malfunctions.

>  The modification to the file occurred 6 months ago.
>    el_retrieve(midas.c:15683) now has several assert statements, one of which
> aborts the program on reading the bad entry.

I added those to fix problems with "broken last NN days" and with infinite looping in the elog code
that we observed in TWIST.

You are welcome to replace the assert() statements with proper error handling. I used to have some code
that could report the filename of the bad elog file. Can we also report the exact file location for broken
files.

Please send me the diff, I will commit it to midas cvs.

>    Why is assert used, instead of an error return from the function (if
> necessary), and maybe an error message in the log file?  Assert statements are
> often removed, using NDEBUG, for normal use.

I use assert() in several ways:

0) I want a core dump each time X happens. (This is the only reasonable action when facing memory/stack
corruption. The problems in the elog code were stack corruption).
1) "I am too lazy to write proper error handling code" so I just crash and burn. This includes the
case where "proper error handling" would be "too invasive".
2) the error is too bad (or too deep) and there is no reasonable way to recover. Print an error message
and dump core (for later analysis). I sometimes use "cm_msg(); abort()". (assert is "printf("error"); abort()")

Please refer to literature for philosophic discussions on uses of assert() (Argh! Stefan will have my
head again!), but I will mention that "abort() early, abort() often" I find very effective. BTW, this technique
is heavily used in the Linux kernel (oops(), bug(), panic()) with some good effect, too.

>    The problem elog entry had one character removed, so end-of-file came before
> the end of the message.  This could probably occur without the file being
> altered, if the disk containing the elog fills.

Yes, I think you are right. In TWIST, we have seen disk-full conditions break both elog and history.

K.O.
  141   26 Jul 2003 Konstantin Olchanski use "odbedit -C" to connect to corrupted ODB
Add switch "-C" to odbedit to allow it to connect to corrupted ODB. Then,
depending on corruption, the user can manually remove or correct the
corrupted entries. Also, some corruption is automatically fixed by "odbedit"
itself. I use this functionality to debug and fix broken ODBs.

K.O.

For your enjoyment, here is the diff:

diff -r1.64 odbedit.c
3058a3059
> BOOL          corrupted;
3063c3064
<   debug = cmd_mode = FALSE;
---
>   debug = corrupted = cmd_mode = FALSE;
3077a3079,3080
>     else if (argv[i][0] == '-' && argv[i][1] == 'C')
>       corrupted = TRUE;
3104c3107,3108
<         printf("               [-c Command] [-c @CommandFile] [-s size]
[-g (debug)]\n\n");
---
>         printf("               [-c Command] [-c @CommandFile] [-s size]\n");
>         printf("               [-g (debug)] [-C (connect to corrupted
ODB)]\n\n");
3123c3127,3133
<   if (status != CM_SUCCESS)
---
>   else if ((status == DB_INVALID_HANDLE)&&corrupted)
>     {
>     cm_get_error(status, str);
>     puts(str);
>     printf("ODB is corrupted, connecting anyway...\n");
>     }
>   else if (status != CM_SUCCESS)
ELOG V3.1.4-2e1708b5