Entry  01 Apr 2025, Konstantin Olchanski, Bug Fix, ODB and event buffer - release semaphore before abort() and core dump 
There is a long-standing problem with ODB and event buffers. If they detect an
internal data inconsistency and cannot continue running, they call abort() to
dump core and stop.

The problem is that in some code paths they do this while holding the ODB or
event buffer semaphore. (The Linux kernel automatically releases SYSV semaphores
after the core dump is finished and the program holding them has stopped.)

If the core dump takes longer than 10 seconds (for whatever reason, but we see
this often enough), all other programs that wait for ODB or event buffer access
will also time out and crash (with core dumps). The result is a core dump storm;
at the end, all MIDAS programs have crashed. (Luckily recovery is easy: simply
restart everything.)

Now I realize that in many situations we do not need to hold the semaphore while
dumping core - the content of the ODB and event buffer shared memories is not
important for debugging the crash - and it is safe to release the semaphore
before calling abort().

This is now implemented for ODB and event buffers. Hopefully core dump storms 
will not happen again.
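In sketch form (the consistency check and the semaphore names here are illustrative, not the actual MIDAS internals):

   if (data_is_inconsistent) {
      /* release the semaphore first: the shared memory content is not needed
         for debugging, and holding the lock during a slow core dump would
         stall every other MIDAS program */
      ss_semaphore_release(odb_semaphore);
      abort(); /* dump core and stop */
   }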

commit 96369c29deba1752fd3d25bed53e6594773d7e1a
release ODB semaphore before calling abort() to dump core. if core dump takes 
longer than 10 sec all other midas programs will timeout and crash.

commit 2506406813f1e7581572f0d5721d3761b7c8e8dd
unlock event buffer before calling abort() in bm_validate_client_index_locked(), 
refactor bm_get_my_client_locked()


K.O.
Entry  05 May 2025, Konstantin Olchanski, Bug Fix, Bug fix in SQL history 
A bug introduced into the SQL history code in 2022 made renaming of variables not work. This is now fixed.

break commit:
54bbc9ed5d65d8409e8c9fe60b024e99c9f34a85
fix commit:
159d8d3912c8c92da7d6d674321c8a26b7ba68d4

P.S.

This problem was caused by an unfortunate limitation in the design of the C++ class system. If I want to add more 
data to an existing class, I write this:

class old_class {
   int i, j, k;
};

class bigger_class : public old_class {
   int additional_variable;
};

But if I have this:

struct x { int i, j; };

class y {
protected:
   std::vector<x> array_of_x;
};

and I want to add "k" to "x", C++ has no way to do this. The history code has this workaround:

class bigger_y : public y
{
public:
   std::vector<int> array_of_k;
   void foo(int n);
};

void bigger_y::foo(int n) {
   printf("%d %d %d\n", array_of_x[n].i, array_of_x[n].j, array_of_k[n]);
}

The problem is that it is not obvious that "array_of_x" and "array_of_k" are connected,
and they can easily get out of sync (if elements are added or removed). This is the
bug that happened in the history code. I have now added assert(array_of_x.size()==array_of_k.size())
to offer at least some protection going forward.
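Using the example names from above, the guard sits right before the parallel arrays are indexed (a sketch; print_row() is a hypothetical member, and the real history code uses its own class and member names):

#include <cassert>
#include <cstdio>

void bigger_y::print_row(int n) {
   /* catch the parallel arrays going out of sync before indexing them */
   assert(array_of_x.size() == array_of_k.size());
   printf("%d %d %d\n", array_of_x[n].i, array_of_x[n].j, array_of_k[n]);
}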

P.S. As a final solution I think I want to completely separate the file history and SQL history code;
they have more differences than commonalities.

K.O.
Entry  24 Jul 2025, Konstantin Olchanski, Bug Fix, support for large history files 
FILE history code (mhf_*.dat files) did not support reading history files bigger than about 2 GB. This is now 
fixed on branch "feature/history_off64_t" (in final testing, to be merged ASAP).

History files were never meant to get bigger than about 100 MBytes, but it turns out large files can still 
happen:

1) files are rotated only when the history is closed and reopened
2) we removed the history close and reopen on run start
3) so now files are rotated only when mlogger is restarted

In the old code, large files could still happen if some equipment writes a lot of data (I have a file from 
Stefan with a history record size of about 64 kbytes, written at 1/second; MIDAS handles this just fine) or if 
no runs are started and stopped for a long time.

There are reasons for keeping file size smaller:

a) I would like to use mmap() to read history files, and mmap() of a 100 GByte file on a 64 GByte RAM 
machine would not work very well.
b) I would like to implement compressed history files, and decompression of a 100 GByte file takes much 
longer than decompression of a 100 MByte file. It is better if the data is in smaller chunks.

(It is easy to write a utility program to break up large history files into smaller chunks.)

Why use mmap()? I note that the current code does one read() syscall per history record (it is much better to 
read data in bigger chunks) and does multiple seek()/read() syscalls to find the right place in the history 
file (which plays silly buggers with the OS read-ahead and data caching). mmap() eliminates all these syscalls 
and has the potential to speed things up quite a bit.
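As a sketch, the mmap() reading pattern looks like this (file name and record layout are illustrative, error checking omitted; this is not the actual mhf_*.dat reader):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int fd = open("mhf_example.dat", O_RDONLY); /* hypothetical file name */
struct stat st;
fstat(fd, &st);
/* map the whole file read-only: locating a record becomes plain pointer
   arithmetic instead of one seek()/read() pair per record */
const char *base = (const char *) mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
/* ... scan records directly in memory, e.g. base + n*record_size ... */
munmap((void *) base, st.st_size);
close(fd);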

K.O.
Entry  20 Nov 2025, Konstantin Olchanski, Bug Fix, ODB update, branch feature/db_delete_key merged into develop 
In the darkside vertical slice midas daq, we observed ODB corruption which I 
traced to db_delete_key(). The cause of the corruption is not important; what is 
important is to have a robust ODB where small corruption stays localized and 
does not require erasing the corrupt ODB and reloading it from a backup file.

To help debug such corruption, one can try setting ODB "/Experiment/Protect ODB" 
to "yes". This makes the ODB shared memory read-only, so user code scribbling 
into the wrong memory address will cause a segfault and core dump instead of 
silent ODB corruption. This feature is not enabled by default because changing 
the ODB shared memory mapping from "read-only" to "writable" (and back) is not 
very fast and slows down MIDAS noticeably.
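For example, in odbedit (quotes needed because of the space in the key name):

set "/Experiment/Protect ODB" y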

MIDAS right before this merge was tagged "midas-2025-11-a". If you see this ODB 
update causing trouble, please report it here and revert to the tagged version.

Updates:
- harden db_delete_key() against internal corruption: if an odb inconsistency is 
detected, do a clean crash instead of trying to delete stuff and corrupting the 
odb to the point where it has to be erased and reloaded from a backup file.
- additional refactoring to separate read-locked and write-locked code.
- merge a missing patch to avoid odb corruption when the key area becomes 100% 
full (or was it the data area? I forget now; I fixed one of them a long time 
ago, now both are fixed).
- remove the "follow_links" argument from db_delete_key(); see the separate 
discussion on this.
- add db_delete() to delete things by ODB path instead of by hkey (an atomic 
fusion of db_find_link() and db_delete_key()).
- fixes for incorrect use of db_find_key() followed by db_delete_key(), which 
unexpectedly follows symlinks and deletes the wrong ODB entry. (It should have 
been db_find_link(); now replaced with the atomic db_delete(); see the sketch 
below.)
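A sketch of the pitfall and the fix (the ODB path is hypothetical, and the db_delete() signature is assumed from the description above):

/* old pattern: db_find_key() follows the symlink, so hkey points at the
   link target and db_delete_key() deletes the wrong ODB entry */
HNDLE hkey;
db_find_key(hDB, 0, "/Alias/my_link", &hkey);
db_delete_key(hDB, hkey);

/* new pattern: atomic delete by ODB path (fused db_find_link() and
   db_delete_key()); deletes the link itself, not its target */
db_delete(hDB, "/Alias/my_link");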

K.O.
    Reply  25 Nov 2025, Stefan Ritt, Bug Fix, ODB update, branch feature/db_delete_key merged into develop 
Thanks for the fixes, all of which I approve.

There is still a "follow_links" in midas_c_compat.h line 70 for Python. Probably Ben has to look into that. Also 
client.py has it.

Stefan
Entry  25 Nov 2025, Konstantin Olchanski, Bug Fix, fixed db_find_keys() 
Function db_find_keys(), added by a person unnamed in April 2020, never worked correctly. It is now fixed, 
and an unsafe strcpy() was replaced by mstrlcpy().

This function is used by the msequencer ODBSet function and by the odbedit "set" command.

Only two of the four use cases below actually worked; the others returned DB_NO_KEY:

set runinfo/state 1 <--- no match pattern - works
set run*/state 1    <--- match multiple subdirectories - works
set runinfo/stat* 1 <--- bombs out with DB_NO_KEY
set run*/stat* 1    <--- bombs out with DB_NO_KEY

All four use cases now work.

commit b5b151c9bc174ca5fd71561f61b4288c40924a1a

K.O.
    Reply  25 Nov 2025, Konstantin Olchanski, Bug Fix, ODB update, branch feature/db_delete_key merged into develop 
> Thanks for the fixes, all of which I approve.
> 
> There is still a "follow_links" in midas_c_compat.h line 70 for Python. Probably Ben has to look into that. Also 
> client.py has it.

Correct, Ben will look at this on the python side.

And I will be updating mvodb soon and fix it there.

K.O.
Entry  01 Dec 2025, Konstantin Olchanski, Bug Fix, mvodb updated 
I updated mvodb and test_mvodb. MIDAS ODB and JSON ODB now implement all API 
functions. ReadKey, ReadDir and ReadKeyLastWritten were previously missing from 
some implementations.

I do not remember any other bugs or problems in mvodb; if you want me to add, fix 
or change something, please speak up!

K.O.
Entry  03 Dec 2025, Konstantin Olchanski, Bug Fix, no more breakage in history display when panning 
In the DL experiment (unknown version of midas, likely mid-summer 2025), we see artefacts in the 
history display where pieces of the data seem to be missing: there are gaps in the graphs. Reloading the 
page restores the correct display, confirming that in fact there are no gaps in the data. This made history 
plots very painful to use.

This problem no longer exists in the latest midas; most likely it was fixed around September 4, 
2025. Most likely it was broken since at least February 2025 (the date of the previous changes to this file).

If you see this problem, updating mhistory.js to the latest version is probably enough to fix it.

K.O.
Entry  05 Dec 2025, Konstantin Olchanski, Bug Fix, update of JRPC and BRPC 
With the merge of the RPC_CXX code, MIDAS RPC can now return data of arbitrarily large size, and I am 
proceeding to update the corresponding mjsonrpc interface.

If you use JRPC and BRPC in the tmfe framework, you need to do nothing: the updated RPC handlers 
are already tested and merged, and the only effect is that large data returned by HandleRpc() and 
HandleBinaryRpc() will no longer be truncated.

If you use your own handlers for JRPC and BRPC, please add the new RPC handlers as shown at the end 
of this message. There is no need to delete or remove the old RPC handlers.

To avoid unexpected breakage, the new code is not yet enabled by default, but you can start 
using it immediately by replacing the mjsonrpc call:

mjsonrpc_call("jrpc", ...

with

mjsonrpc_call("jrpc_cxx", ...

Ditto for "brpc"; see resources/example.html for complete code.

After migration is completed, if you have some old frontends where you cannot add the new RPC 
handlers, you can still call them using the "jrpc_old" and "brpc_old" mjsonrpc calls.

I will cut over the default "jrpc" and "brpc" calls to the new RPC_CXX in about a month or so.

If you need more time, please let me know.

K.O.

Register the new RPCs:

   cm_register_function(RPC_JRPC_CXX, rpc_cxx_callback);
   cm_register_function(RPC_BRPC_CXX, binary_rpc_cxx_callback);

and add the handler functions (see tmfe.cxx for a full example):

static INT rpc_cxx_callback(INT index, void *prpc_param[])
{
   const char* cmd  = CSTRING(0);      /* RPC command string */
   const char* args = CSTRING(1);      /* RPC arguments string */
   std::string* pstr = CPSTDSTRING(2); /* return data, no longer size-limited */

   *pstr = "my return data";

   return RPC_SUCCESS;
}

static INT binary_rpc_cxx_callback(INT index, void *prpc_param[])
{
   const char* cmd  = CSTRING(0);            /* RPC command string */
   const char* args = CSTRING(1);            /* RPC arguments string */
   std::vector<char>* pbuf = CPSTDVECTOR(2); /* return data, no longer size-limited */

   pbuf->clear();
   const char data[] = "my return data";     /* example payload */
   pbuf->insert(pbuf->end(), data, data + sizeof(data) - 1); /* append without the trailing NUL */

   return RPC_SUCCESS;
}

K.O.
Entry  06 Jun 2003, Pierre-André Amaudruz, , Welcome 
Dear Midas users,

As you are certainly aware, ELOG (Electronic Logbook) was written
by Stefan Ritt and its functionality is part of the Midas package too.
This web site, which uses ELOG, replaces the W-Agora forum previously set up.

You will need to register with this forum in order to gain write access and 
optional email notification.

We would like to encourage you to post your questions or comments on
this Midas ELOG site instead of sending private email to the authors, as your 
remarks are surely of interest to the other users too.

 
 
Entry  12 Jun 2003, Pierre-André Amaudruz, , Tape handling 
- remove ss_tape_get_blockn from lazylogger.c
- add ss_tape_get_blockn to system.c
- add ss_tape_get_blockn prototype into midas.h
- fix buffer size for "dir" in mtape.c
- add block# for "dir" in mtape if the command is successful.
- handle TID_STRUCT bank type by displaying it as 8-bit in ybos.c (mdump)
Entry  17 Jun 2003, Stefan Ritt, , example experiment makefile for NT 
I have added ROOT support to midas\examples\experiment\makefile.nt. To 
compile the example experiment under Windows, one needs

1) an installed version of ROOT
2) the ROOTSYS environment variable defined
3) to invoke "nmake -f makefile.nt" in the midas\examples\experiment directory

Please note that in the current release 3.05 of ROOT, sockets are not yet 
working under Windows, so the histogram server built into the analyzer 
cannot be accessed. It is however possible to output the analyzed data into 
a .root file and visualize it with the root browser like

analyzer -i run00001.mid -o run00001.root
Entry  26 Jun 2003, David Morris, , pthreads for Linux 
Added ss_thread_create support for Linux in system.c
Added the pthread library to the main make file
Entry  02 Jul 2003, Pierre-André Amaudruz, , Midas/ROOT Analyser situation midas-root.jpg
The current and future situation of the Midas analyzer is summarized in the
attachment below.

Box explanation:
================
Front end:
---------
Midas code that accesses the hardware and gathers its information into the
Midas format.

Midas SHM:
---------
Midas back end shared memory to which the front end data are sent.

mlogger:
-------
Data logger collecting the midas events and storing them on a physical
logging device (Disk, Tape)

Midas Analyzer:
--------------
Midas client for event-by-event analysis. Incoming data can be either online
or offline.

mserver:
-------
Subprocess interfacing external (remote) midas clients to the centralized
data collection and database system.

PAW:
---
Standalone physics data analyzer (CERN).

ROOT:
----
Standalone physics data analyzer (CERN).


This diagram represents the data path from the front end to the analyzer in
online and offline mode. Each data path is annotated with a circled number
discussed below. In all cases, the data flows from the front end
application to the midas back end data buffers, which reside in a specific
shared memory for a given experiment.

Path:
(1): From the shared memory, the midas analyzer can request events directly
and process them for output to diverse destinations.

(2): The data logger is a specific application which stores all the data on
a storage medium such as a disk or tape. This path is specific to the
creation of files in the .mid format. The actual storage file in this .mid
format can be read back later by the midas analyzer.

(3): The Midas analyzer was originally developed for interfacing to the
PAW analyzer, which uses its own shared memory segment for online display.
The analyzer can also save the data in a specific data format consistent
with PAW (HBOOK and Ntuples, extension .rz).

(4): Presently the data logger supports creation of the ROOT file format.
This file contains the midas event-by-event data in the form of a Tree. The
file is fully compatible with ROOT and can therefore be read by a standard
ROOT application.

(5): Equivalent to the data logger, the analyzer, receiving data from the
data buffer or reading it from a .mid file, can apply an event-by-event
analysis and on request produce a compliant ROOT file for further analysis.
This .root file can be composed of Trees as well as histograms.

(6): The possibility of ONLINE ROOT analysis was implemented in a first
stage through the TMapFile (ROOT shared memory). While this configuration is
still in use in an experiment, the intention is to deprecate it and replace
it with data path (7).

(7): This path uses a network socket channel to transfer data out of the
analyzer to the ROOT environment. The current analyzer has limited support
for ROOT analysis, publishing only the built-in Midas analysis histograms on
request. No mechanism is yet implemented for passing Trees.

(8): This path has not yet been investigated, but ROOT does provide
access to external function calls, which makes this option possible.
The ROOT framework would then perform dedicated event calls to the main
midas data buffer using the standard midas communication scheme. The data
format translation from Midas banks to ROOT format would have to be taken
care of at the user level in the ROOT environment.


Discussion:
==========
Presently the socket communication between Midas and ROOT (7) is under
revision by Stefan Ritt and René Brun. This revision will simplify remote
access to an object such as a histogram. For the Tree itself, the
requirement would be to implement a "ring buffer" mechanism for remote tree
requests. This is currently under discussion.

Path (8) has been suggested by Triumf to address small experiment setups
where only a single analyzer is required. This path minimizes the DAQ
requirements by moving all the data analysis handling to the user.
The same ROOT analysis code would be applicable to ONLINE as well as
OFFLINE analysis.

Cons:
- Necessity of publishing raw data through the network for every instance of
the remote analyzer.
- Results of the analysis cannot yet be shared in real time.

Pros:
- No need for an extra task for data translation (midas/root).
- A single copy of the data unpacking code, kept as part of the user code.
- Lower CPU requirements.

Other issues:
============
- The current necessity of the Midas shared memory for the midas analyzer to
run is a concern, in particular for offline analysis where a priori no midas
is available.

- The handling of the run/analyzer parameters; parameters could possibly be
extracted from file.odb.
Entry  26 Jul 2003, Konstantin Olchanski, , more ODB checks in src/odb.c 
Add more checks to db_validate_key() for pkey->total_size, item_size and
num_values. Automatically correct total_size to be item_size*num_values (we
saw this corruption and tested this fix).

K.O.

For your enjoyment, here is the diff:

RCS file: /usr/local/cvsroot/midas/src/odb.c,v
retrieving revision 1.64
diff -r1.64 odb.c
718a719,744
>   /* check key sizes */
>   if ((pkey->total_size < 0)||(pkey->total_size > pheader->key_size))
>     {
>     cm_msg(MERROR, "db_validate_key", "Warning: invalid key \"%s\" total_size: %d", path, pkey->total_size);
>     return 0;
>     }
> 
>   if ((pkey->item_size < 0)||(pkey->item_size > pheader->key_size))
>     {
>     cm_msg(MERROR, "db_validate_key", "Warning: invalid key \"%s\" item_size: %d", path, pkey->item_size);
>     return 0;
>     }
> 
>   if ((pkey->num_values < 0)||(pkey->num_values > pheader->key_size))
>     {
>     cm_msg(MERROR, "db_validate_key", "Warning: invalid key \"%s\" num_values: %d", path, pkey->num_values);
>     return 0;
>     }
> 
>   /* check and correct key size */
>   if (pkey->total_size != pkey->item_size*pkey->num_values)
>     {
>     cm_msg(MINFO,  "db_validate_key", "Warning: corrected key \"%s\" size: total_size=%d, should be %d*%d=%d", path, pkey->total_size, pkey->item_size, pkey->num_values, pkey->item_size*pkey->num_values);
>     pkey->total_size = pkey->item_size*pkey->num_values;
>     }
> 
Entry  26 Jul 2003, Konstantin Olchanski, , use "odbedit -C" to connect to corrupted ODB 
Add switch "-C" to odbedit to allow it to connect to corrupted ODB. Then,
depending on corruption, the user can manually remove or correct the
corrupted entries. Also, some corruption is automatically fixed by "odbedit"
itself. I use this functionality to debug and fix broken ODBs.

K.O.

For your enjoyment, here is the diff:

diff -r1.64 odbedit.c
3058a3059
> BOOL          corrupted;
3063c3064
<   debug = cmd_mode = FALSE;
---
>   debug = corrupted = cmd_mode = FALSE;
3077a3079,3080
>     else if (argv[i][0] == '-' && argv[i][1] == 'C')
>       corrupted = TRUE;
3104c3107,3108
<         printf("               [-c Command] [-c @CommandFile] [-s size] [-g (debug)]\n\n");
---
>         printf("               [-c Command] [-c @CommandFile] [-s size]\n");
>         printf("               [-g (debug)] [-C (connect to corrupted ODB)]\n\n");
3123c3127,3133
<   if (status != CM_SUCCESS)
---
>   else if ((status == DB_INVALID_HANDLE)&&corrupted)
>     {
>     cm_get_error(status, str);
>     puts(str);
>     printf("ODB is corrupted, connecting anyway...\n");
>     }
>   else if (status != CM_SUCCESS)
Entry  29 Jul 2003, Konstantin Olchanski, , Have to link with -lpthread? 
It appears that all midas applications are now required to link with the
pthreads library even if they do not use threads. This is caused by a
pthread_create() call from ss_thread_create() in system.c.

Is this the intended behaviour?

K.O.
    Reply  30 Jul 2003, David Morris, , Have to link with -lpthread? 
The change is required to support the implementation of pthreads in the Linux
compile of Midas, which was added recently. I believe pthreads is also needed
for ROOT based compiles.

David

> It appears that all midas applications are now required to link with the
> pthreads library even if they do not use threads. This is caused by a
> pthread_create() call from ss_thread_create() in system.c.
> 
> Is this the intended behaviour?
> 
> K.O.
Entry  11 Aug 2003, Konstantin Olchanski, , mhttpd crash on corrupted ODB /RunInfo 
Invalid values of ODB /RunInfo/State cause an mhttpd crash in
show_status_page() because of an out-of-bounds access to the array of state
names. Suggested fix: remove the array of state names and use the existing
ladder of if/else statements to explicitly set the state name. Verified the
fix works for TWIST. Will commit this into MIDAS CVS unless I get feedback.

src/mhttpd.c:show_status_page() {
  ...
  rsprintf("<tr align=center><td>Run #%d", runinfo.run_number);

  if (runinfo.state == STATE_STOPPED)
    rsprintf("<td colspan=1 bgcolor=#FF0000>Stopped");
  else if (runinfo.state == STATE_PAUSED)
    rsprintf("<td colspan=1 bgcolor=#FFFF00>Paused");
  else if (runinfo.state == STATE_RUNNING)
    rsprintf("<td colspan=1 bgcolor=#00FF00>Running");
  else
    rsprintf("<td colspan=1 bgcolor=#FFFFFF>Unknown");

  if (runinfo.requested_transition)
  ...

K.O.