Back Midas Rome Roody Rootana
  Midas DAQ System, Page 131 of 139  Not logged in ELOG logo
ID Date Author Topicdown Subject
  2377   29 Mar 2022 Konstantin OlchanskiBug Fixmdump can read lz4 and bz2 files now
I converted mdump file i/o from older mdsupport library to newer midasio library 
and it can now read .mid, .mid.gz, .mid.lz4 and .mid.bz2 files. Output should be 
identical to what it printed before, if you see any differences, please report 
them here or on bitbucket. K.O.
  2378   30 Mar 2022 Konstantin OlchanskiBug Fixerroneous removal of odb clients, fixed
commit https://bitbucket.org/tmidas/midas/commits/b1fe21445109774be3f059c2124727b414abf835
made on 2022-02-21 fixed a serious bug in ODB.

a multithread race condition against an incorrectly updated shared variable caused removal of 
random clients from ODB with error message:

My client index %d in ODB is invalid: out of range 0..%d. Maybe this client was removed by a 
timeout, see midas.log. Cannot continue, aborting...

the race is between db_open_database() in one program (executed when any midas program starts) and 
db_get_my_client_locked() in all running midas programs.

as long as no midas programs are started (db_open_database() is not executed), this bug does not 
happen.

if i.e. odbedit is executed very often, i.e. from a script, probability of hitting this bug becomes 
quite high.

fixed now.

K.O.
  2379   31 Mar 2022 Konstantin OlchanskiBug Fix"run stop" trouble in mlogger, fixed
while debugging something else, I ran into a bit of trouble in mlogger.

I set the mlogger event limit to 100, and after reaching 100 events, mlogger
sayd "stopping run", but nothing happened, run kept going.

it turns out mlogger tried stopping the run too soon, the run-start
transition did not finish yet and the error message about trying
to stop a run while another transition is in progress was missing.

(fixed - if another transition is in progress, we try again later)

it also turns out that cm_transition() checks if another transition
is in progress way too late, all the way in the transition thread,
where it cannot return it is an error to mlogger.

(fixed - first thing done in cm_transition() is this check).

while debugging this, I tested the ODB flags "/Logger/Async transitions"
and /Logger/Multithread transitions". It turns out only two transition
types still work from inside mlogger - multithread transition
and detached transition (via the mtransition helper).

the issue is the dead lock between mlogger and frontend. while mlogger
is inside cm_transition(), it is not reading the SYSTEM buffer,
while at the same time frontends are writing into it. If SYSTEM
buffer happens to be pretty full, we dead lock - frontends are waiting
for free space in the SYSTEM buffer do not respond to RPCs, mlogger is not 
reading from the SYSTEM and it stuck trying to issue "run stop" RPC
to frontend. (this dead lock is not forever, eventually frontend
is killed by RPC timeout, mlogger survives and stops the run).

this is a well known problem and as solution, mlogger has been using the 
multithreaded transitions for years.

now I removed the OBD /Logger/Async transition and /Logger/Multithread 
transition flags, instead, there is now a flag /Logger/Detached transitions
set to FALSE by default. Setting it to TRUE will cause mlogger to fork 
"mtransition STOP" and "mtransition START" for stopping and starting runs,
this is useful in case there is trouble with multithreading in mlogger.

K.O.
  2386   24 Apr 2022 Konstantin OlchanskiBug Fixmserver buffer overrun and crash
There is a memory allocation bug in the mserver.

ALIGN8() was missing when receiving events from the event socket and data buffer 
was allocated 4 bytes too short. but only for some received events and only in 
very unlucky sequence of received events. result was a rare but obnoxious crash 
of fevme frontend in alpha-2 at CERN. (we do not see any crash from this in 
alpha-g or anywhere else, the best I can tell).

fixed in commit 4dc06ba47ff7caa5251fd8c48d8533f35799f3a6.

If you use the mserver, please update to this commit or apply following patch in 
midas.cxx:

-   int bufsize = sizeof(INT) + event_size;
+   int bufsize = sizeof(INT) + total_size;

K.O.
  2396   04 May 2022 Konstantin OlchanskiBug Fixmysql history update
the code for writing midas history to mysql has been updated to work against 
MYSQL 8.0.23 (CERN ALPHA-2):

- as ever mysql reports inconsistent data types (I create column with type 
"integer", mysql reports it has type "int" and so forth), the special kludge to 
take care of this had to be tweaked.

- this caused some columns to be marked "inactive" and the code to "reactivate" 
them was missing (fixed)

- binary history event data size was computed incorrectly for events with 
"inactive" columns (fixed) and caused assert() failure and mlogger crash.

- mysql read of column definitions for history event "system" (as in 
/history/links/system) bombed because of incorrect quoting (worked before, why? 
why bombed now?). this caused duplicate columns to be created in mysql table 
"system" and mlogger bomb-out with complaint about "duplicated columns" 
(actually the error message was missing, so it was a silent bomb-out). quoting 
fixed, missing error message fixed, but cleanup of duplicate columns has to be 
done by hand. in case of alpha-2 the fix was to remove the unused 
/history/links/system).

if you are using mysql history please update or patch src/history_schema.cxx.

commit 9d17d2fef233cf457121ca7c2a283c4c76ed33bc

K.O.
  2405   16 May 2022 Konstantin OlchanskiBug Fixmserver buffer overrun and crash
> There is a memory allocation bug in the mserver.

Fix for this problem introduced a new problem, an infinite loop in bm_flush_cache, 
bitbucket bugs https://bitbucket.org/tmidas/midas/issues/339/infinite-loop-in-
mserver-due-to-mfes and https://bitbucket.org/tmidas/midas/issues/331/stuck-
semaphore-of-system-buffer

This is now fixed and the buffer write cache logic and size was rejigged
according to calculations in https://daq00.triumf.ca/elog-midas/Midas/2401

Event buffer write cache (as set via ODB Equipment/Common and via 
bm_set_cache_size()) now take 2 possible values:
0 - write cache is disabled and
MIN_WRITE_CACHE_SIZE - (10 Mbytes) minimum permitted cache size
bigger cache size values are permitted, up to buffer_size/3, but probably not useful 
if my calculations are right.
smaller cache size values are generally not useful, if my calculations are right.

mfe.c and tmfe c++ frontends updated to request the new write cache size by default.

if events are getting stuck in the write cache for too long, instead of reducing the 
cache size, one should increase frequency of bm_flush_cache() calls (1/sec by 
default).

commit 373bcc3ab7f83c3c7bf6c051c237de043a982502

K.O.
  2433   19 Aug 2022 Konstantin OlchanskiBug Fix"Detected duplicate or non-monotonous data" in history files
serious (but rare) bug was fixed in the history reader. unlucky experiment would see 
errors about "Detected duplicate or non-monotonous data" in some history file, fixed by 
removing/renaming the offending file. (reported by MEG experiment)

it turns out there was nothing wrong with the data files (good), but there
was a nasty bug in the history reader. it did not ensure that we read history
files in chronological order. under some conditions order of files could be
reversed, older files would be read after newer files and trip the built-in
protection against returning non-monotonically increasing history data to the user.

fixed commit 
https://bitbucket.org/tmidas/midas/commits/9893f85ebe33e96cc63f501a0f89e1f8932c894d

for more details, see https://bitbucket.org/tmidas/midas/issues/350/file-history-non-
monotonic-time

K.O.
  2436   23 Aug 2022 Konstantin OlchanskiBug Fix"Detected duplicate or non-monotonous data" in history files
> serious (but rare) bug was fixed in the history reader.

previous fix was incomplete. please update to git commit
https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63

K.O.
  2447   11 Nov 2022 Frederik WautersBug FixO_CREAT in open in split.cxx
midas currently does not compile on linux

/usr/include/x86_64-linux-gnu/bits/fcntl2.h:50:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
   50 |    __open_missing_mode ();

giving the mode is mandatory: https://man7.org/linux/man-pages/man2/open.2.html

fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600
  2448   12 Nov 2022 Stefan RittBug FixO_CREAT in open in split.cxx
> midas currently does not compile on linux
> 
> /usr/include/x86_64-linux-gnu/bits/fcntl2.h:50:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
>    50 |    __open_missing_mode ();
> 
> giving the mode is mandatory: https://man7.org/linux/man-pages/man2/open.2.html
> 
> fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600

Thanks. Fixed.

Stefan
  2449   17 Nov 2022 Konstantin OlchanskiBug FixO_CREAT in open in split.cxx
> > midas currently does not compile on linux
> > fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600

I got more warnings from split.cxx, looked at the code and see so many problems that it is easier
to delete it than it is to fix it.

Check for end of file is done incorrectly (check for read() return 0, -1 or short read),
memory overrun if given file name is longer than 80 bytes, no check for valid event length
read from the file, and so on and so on.

A better example for reading and writing midas files is in midasio/test_midasio.cxx. Proper c++ coding, and can read compressed files.

K.O.
  2450   17 Nov 2022 Konstantin OlchanskiBug Fix"Detected duplicate or non-monotonous data" in history files
> > serious (but rare) bug was fixed in the history reader.
> previous fix was incomplete. please update to git commit
> https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63

a race condition between reading history file in mhttpd and writing history file in 
mlogger was accidentally introduced. mhttpd would file spurious errors about "timestamp 
is after last timestamp".

fixed, please update to git commit
https://bitbucket.org/tmidas/midas/commits/7a9f6e0c58ffddcacb9ee19934ce3e2033a805ef

fix race condition in history file reader - a race condition was added accidentally - 
first the reader remembers the history file size and the time of the last entry, then it 
goes to read the file and bombs if at the same time mlogger added more entries - their 
time is after the remembered time of last entry and error "timestamp is after last 
timestamp" is triggered.

K.O.
  2580   09 Aug 2023 Konstantin OlchanskiBug FixStefan's improved ODB flush to disk
This is an important improvement, should have a post of it's own. K.O.

> > > RFE filed:
> > > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-
periodically
> > 
> > Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-
should-be-saved-to-disk-periodically
> > 
> > Stefan
> 
> Stefan's comments from the closed bug report:
> 
> Ok I implemented some periodic flushing. Here is what I did:
> 
> Created
> 
> /System/Flush/Flush period : TID_UINT32 /System/Flush/Last flush : TID_UINT32
> 
> which control the flushing to disk. The default value for “Flush period” is 60 
seconds or one minute.
> 
> All clients call db_flush_database() through their cm_yield() function
> db_flush_database() checks the “Last flush” and only flushes the ODB when the 
period has expired. This test is 
> done inside the ODB semaphore so that we don’t get a race condigiton
> If the period has expired, db_flush_database() calls ss_shm_flush()
> ss_shm_flush() tries to allocate a buffer of the shared memory. If the 
allocation is not successful (out of 
> memory), ss_shm_flush() writes directly to the binary file as before.
> If the allocation is successful, ss_shm_flush() copies the share memory to a 
buffer and passes this buffer to a 
> dedicated thread which writes the buffer to the binary file. This causes 
ss_shm_flush() to return immediately and 
> not block the calling program during the disk write operation.
> Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed 
for sure before the shared memory 
> gets deleted.
> This means now that under normal circumstances, exiting programs like odbedit 
do NOT flush the ODB. This allows to 
> call many “odbedit -c” in a row without the flush penalty. Nevertheless, the 
ODB then gets flushed by other 
> clients latest 60 seconds (or whatever the flush period is) after odbedit 
exits.
> 
> Please note that ODB flushing has two purposes:
> 
> When all programs exit, we need a persistent storage for the ODB. In most 
experiments this only happens very 
> seldom. Maybe at the end of a beam time period.
> If the computer crashes, a recent version of the ODB is kept on disk to 
simplify recovery after the crash.
> Since crashes are not so often (during production periods we have maybe one 
hardware failure every few years) the 
> flushing of the ODB too often does not make sense and just consumes resources. 
Flushing does also not help from 
> corrupted ODBs, since the binary image will also get corrupted. So the only 
reason for periodic flushes is to ease 
> recovery after a total crash. I put the default to 60 seconds, but if people 
are really paranoid they can decrease 
> it to 10 seconds or so. Or increase it to 600 seconds if their system does not 
crash every week and disks are 
> slow.
> 
> I made a dedicated branch feature/periodic_odb_flush so people can test the 
new functionality. If there are no 
> complaints within the next few days, I will merge that into develop.
> 
> Stefan
  2614   03 Oct 2023 Konstantin OlchanskiBug Fixwrong array size after loading xml or json file
both the xml and the json decoders have a bug (fix pending). loading saved odb 
from xml and json file did not truncate arrays in odb to the size of arrays in 
the file. for example, if /example/double_array has size 20 in odb, but size 5 
in xml or json file, after loading the file, array size is still 20.

this is unexpected: after loading an odb save file we expect odb to return to 
same state as when odb save file was created. we do not expect some arrays to 
have half of their elements restored from file and half their elements left 
unchanged.

save and restore from .odb file does not have this problem.

I think this is a bug and I committed (but did not yet push) a fix for both xml 
and json odb decoder.

I have run this problem while writing the new history panel editor, where 
deleting variables did not work because json rpc db_paste() was not truncating 
any arrays.

I am still finishing up the last few bits of the new history panel editor, and 
there is a bit of time to discuss and comment this odb change before I push it 
to midas.

K.O.
  2703   05 Feb 2024 Ben SmithBug Fixstring --> int64 conversion in the python interface ?
> The symptoms are consistent with a string --> int64 conversion not happening 
> where it is needed. 

Thanks for the report Pasha. Indeed I was missing a conversion in one place. Fixed now!

Ben
  2710   13 Feb 2024 Konstantin OlchanskiBug Fixstring --> int64 conversion in the python interface ?
> > The symptoms are consistent with a string --> int64 conversion not happening 
> > where it is needed. 
> 
> Thanks for the report Pasha. Indeed I was missing a conversion in one place. Fixed now!
> 

Are we running these tests as part of the nightly build on bitbucket? They would be part of 
the "make test" target. Correct python dependancies may need to be added to the bitbucket OS 
image in bitbucket-pipelines.yml. (This is a PITA to get right).

K.O.
  2711   14 Feb 2024 Konstantin OlchanskiBug Fixadded ubuntu-22 to nightly build on bitbucket, now need python!
> Are we running these tests as part of the nightly build on bitbucket? They would be part of 
> the "make test" target. Correct python dependancies may need to be added to the bitbucket OS 
> image in bitbucket-pipelines.yml. (This is a PITA to get right).

I added ubuntu-22 to the nightly builds.

but I notice the build says "no python" and I am not sure what packages I need to install for 
midas python to work.

Ben, can you help me with this?

https://bitbucket.org/tmidas/midas/pipelines/results/1106/steps/%7B9ef2cf97-bd9f-4fd3-9ca2-9c6aa5e20828%7D

K.O.
  1   06 Jun 2003 Pierre-André Amaudruz Welcome
Dear Midas users,

As you certainly aware, ELOG (Electronic Logbook) has been written
by Stefan Ritt and its functionality is part of the Midas package too.
This web site using Elog is replacing the W-Agora Forum previously setup.

You will need to register to this forum in order to gain Write access and 
possible Email notification.

We would like to encourage you to post your questions or comments at
this Midas Elog site instead of using private Email to the authors as your 
remarks are surely of interest to the other users too.

 
 
  145   12 Jun 2003 Pierre-André Amaudruz Tape handling
- remove ss_tape_get_blockn from lazylogger.c
- add ss_tape_get_blockn to system.c
- add ss_tape_get_blockn prototype into midas.h
- fix buffer size for "dir" in mtape.c
- add block# for "dir" in mtape if command successful.
- handle TID_STRUCT bank type by display as 8bit in ybos.c (mdump)
  144   17 Jun 2003 Stefan Ritt example experiment makefile for NT
I have added ROOT support to midas\examples\experiment\makefile.nt. To 
compile the example experiment under Windows, one needs

1) Installed version of ROOT
2) Having ROOTSYS environment variable defined
3) Invoke "nmake -f makefile.nt" in the midas\examples\experiment directory

Please note that in the current release 3.05 of ROOT, sockets are not yet 
working under Windows, so the histogram server built into the analyzer 
cannot be accessed. It is however possible to output the analyzed data into 
a .root file and visualize it with the root browser like

analyzer -i run00001.mid -o run00001.root
ELOG V3.1.4-2e1708b5