ELOG Midas

Back Midas Rome Roody Rootana

Midas DAQ System, Page 8 of 49

Not logged in

Find | Login | Help

Full | Summary | Threaded | Collapse | Expand

975 Entries

Goto page Previous 1, 2, 3 ... 7, 8, 9 ... 47, 48, 49 Next

09 Aug 2023, Konstantin Olchanski, Bug Fix, Stefan's improved ODB flush to disk

This is an important improvement, should have a post of it's own. K.O.

> > > RFE filed:
> > > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-
periodically
> > 
> > Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-
should-be-saved-to-disk-periodically
> > 
> > Stefan
> 
> Stefan's comments from the closed bug report:
> 
> Ok I implemented some periodic flushing. Here is what I did:
> 
> Created
> 
> /System/Flush/Flush period : TID_UINT32 /System/Flush/Last flush : TID_UINT32
> 
> which control the flushing to disk. The default value for “Flush period” is 60 
seconds or one minute.
> 
> All clients call db_flush_database() through their cm_yield() function
> db_flush_database() checks the “Last flush” and only flushes the ODB when the 
period has expired. This test is 
> done inside the ODB semaphore so that we don’t get a race condigiton
> If the period has expired, db_flush_database() calls ss_shm_flush()
> ss_shm_flush() tries to allocate a buffer of the shared memory. If the 
allocation is not successful (out of 
> memory), ss_shm_flush() writes directly to the binary file as before.
> If the allocation is successful, ss_shm_flush() copies the share memory to a 
buffer and passes this buffer to a 
> dedicated thread which writes the buffer to the binary file. This causes 
ss_shm_flush() to return immediately and 
> not block the calling program during the disk write operation.
> Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed 
for sure before the shared memory 
> gets deleted.
> This means now that under normal circumstances, exiting programs like odbedit 
do NOT flush the ODB. This allows to 
> call many “odbedit -c” in a row without the flush penalty. Nevertheless, the 
ODB then gets flushed by other 
> clients latest 60 seconds (or whatever the flush period is) after odbedit 
exits.
> 
> Please note that ODB flushing has two purposes:
> 
> When all programs exit, we need a persistent storage for the ODB. In most 
experiments this only happens very 
> seldom. Maybe at the end of a beam time period.
> If the computer crashes, a recent version of the ODB is kept on disk to 
simplify recovery after the crash.
> Since crashes are not so often (during production periods we have maybe one 
hardware failure every few years) the 
> flushing of the ODB too often does not make sense and just consumes resources. 
Flushing does also not help from 
> corrupted ODBs, since the binary image will also get corrupted. So the only 
reason for periodic flushes is to ease 
> recovery after a total crash. I put the default to 60 seconds, but if people 
are really paranoid they can decrease 
> it to 10 seconds or so. Or increase it to 600 seconds if their system does not 
crash every week and disks are 
> slow.
> 
> I made a dedicated branch feature/periodic_odb_flush so people can test the 
new functionality. If there are no 
> complaints within the next few days, I will merge that into develop.
> 
> Stefan

31 Mar 2022, Stefan Ritt, Suggestion, Maximum ODB size

Anybody some idea what the maximum ODB size can be? In the old days, the linux 
kernels had a severe limit on shared memory of usually 8MB, but in the age of 
64GB RAM being a standard, we should be able to grow bigger. Tried

odbinit -s 1024MB --cleanup

which went through without complain, even put that value in to .ODB_SIZE.TXT, but 
when I started odbedit doing "mem", I only see a size of 1MB. Probably somewhere 
deep inside we have a limit which prevents the user to create very large ODBs, 
but this should be mentioned more prominently in odbinit. Like "size too large, 
maximum allowd is xxx MB".

Stefan

04 Apr 2022, Konstantin Olchanski, Suggestion, Maximum ODB size

> Anybody some idea what the maximum ODB size can be?

It turns out ODB size limit is hardwired on db_open_database() at 100 Mbytes.

I now committed an improved error message for this.

I confirm that "odbinit -s 100MB" works and creates ODB with 50 Mbyte data area and 50 
Mbyte key area.

> in the age of 64GB RAM being a standard, we should be able to grow bigger ...

I agree, I think we can safely bump the limit from 100 Mbytes to 1 Gbyte, maybe 1.5 or 
1.99 Gbytes. Above that we run into 32-bit/31-bit cleanliness problems.

And creating extra large 1 GB ODB but using only a few megabytes will not waste any 
RAM, because the .ODB.SHM file is demand-paged and non-used parts of ODB will not be 
mapped into RAM. (It will waste disk space, file .ODB.SHM will be 1 GByte size).

However, 1 GByte (FPGA based) and 4-8 GByte (Raspberry Pi & co) machines are again
becoming popular and relevant for running MIDAS, and they have very slow "disk" 
subsystems, with NAND, SD and USB flash, so we should not go crazy here.

> odbinit -s 1024MB --cleanup

there is a bug in odbinit, if initial odbinit fails, ODB with default size is creates, 
and original rejected ODB size is written to .ODB_SIZE.TXT (an inconsistency).

bitbucket bug 328

> [ how do I resize ODB ??? ]

we need odbresize. bitbucket bug 329.

K.O.

27 Apr 2023, Marius Koeppel, Suggestion, Maximum ODB size

Hi all,

> I agree, I think we can safely bump the limit from 100 Mbytes to 1 Gbyte, maybe 1.5 or 
> 1.99 Gbytes. Above that we run into 32-bit/31-bit cleanliness problems.
We just went in and changed: int odb_size_limit = INT_MAX;//100*1000*1000; in odb.cxx. And we could create ODBs with 1GB and 1.5 GB.

Since the DecodeSize function in odbinit has also foreseen yottabytes ;) (const char units[] = {'k', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'};) we think going to GB for the maximum ODB size would be create.

> there is a bug in odbinit, if initial odbinit fails, ODB with default size is creates, 
> and original rejected ODB size is written to .ODB_SIZE.TXT (an inconsistency).
Can#t we go with the maximum size here if the user inputs a larger size? So just below       printf("Checking ODB size...\n"); one could check for the odb_size_limit. In general one could move the odb_size_limit to midas.h so its not only available in odb.cxx.

Best,
Marius

27 Apr 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > I agree, I think we can safely bump the limit from 100 Mbytes to 1 Gbyte, maybe 1.5 or 
> > 1.99 Gbytes. Above that we run into 32-bit/31-bit cleanliness problems.
>
> We just went in and changed: int odb_size_limit = INT_MAX;//100*1000*1000; in odb.cxx.
>

This is change is wrong. As I wrote, ODB is not 64-bit clean and it is not 32-bit clean. We think is is 31-bit clean, so maximum size would be slightly less than 2 Gbytes.

> And we could create ODBs with 1GB and 1.5 GB.

Congratulations. created != "it works". for proper test, you should fill it with 1.5 GB of stuff, save to json file, reload from json file, save to a different json file and compare that they have same contents (minus timestamps).

We could spend a lot of time making odb 32-bit clean and give you 4GB-max ODB, but would it be useful? For large ODB, "save to .json" already takes a long time ("save to .xml" is slower, "save to .odb" ditto, also buggy). We already have complaints that runs take forever to start because mlogger 
takes a long time to write the ODB save file.

P.S. 64-bit clean ODB will be binary incompatible, all internal pointers are 32-bit right now.

K.O.

27 Apr 2023, Marius Koeppel, Suggestion, Maximum ODB size

> This is change is wrong. As I wrote, ODB is not 64-bit clean and it is not 32-bit clean. We think is is 31-bit clean, so maximum size would be slightly less than 2 Gbytes.

I just wanted to show that changing it and creating bigger ODBs is in general possible.

My main intention was to trigger the discussion again. I also think in general 1GB is enough. But for our applications sometimes 100MB is just on the edge. 



> Congratulations. created != "it works". for proper test, you should fill it with 1.5 GB of stuff, save to json file, reload from json file, save to a different json file and compare that they have same contents (minus timestamps).

You’re right we did not properly test it. I will run this test with a 1GB ODB.



> We could spend a lot of time making odb 32-bit clean and give you 4GB-max ODB, but would it be useful? For large ODB, "save to .json" already takes a long time ("save to .xml" is slower, "save to .odb" ditto, also buggy). We already have complaints that runs take forever to start because mlogger 

> takes a long time to write the ODB save file.

I also agree that going in and making it 32-bit or even 64-bit clean is not worth the effort.

Also concerning the writing speed of the logger etc I am fully with you.

However, having the freedom to choose a bit bigger ODB would be great.



You said the writing into .odb is buggy. Do you mean it’s buggy in general or only in this specific case?

We save the ODB most of the time in the .odb format. 



Cheers,

Marius

27 Apr 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> You said the writing into .odb is buggy. Do you mean it’s buggy in general or only in this specific case?
> We save the ODB most of the time in the .odb format. 

I recommend JSON. Main advantage is you can read it using JSON decoder available for any language, no need to write custom code.

Other than that, the main issue is encoding of strings. For ODB this is key names and string values.

JSON was the first to standardize escape characters what can encode all valid UTF-8 UNICODE strings,
the system of escape characters is clean, easy to understand and easy to implement. https://www.json.org/json-en.html

XML is not as well defined as JSON, i.e. go and try to find the XML BNF grammar. I am not sure if the MIDAS XML encoder
and decoder is fully UTF-8 clean, and if some unlucky combinations of characters break string encoding or decoding. This
is usually tested using a fuzzer (generates all possible, unlucky and unlikely string values). Most suspicious
would be quotes, and square and angle brackets. If some character combinations break encoding or decoding, likely
this cannot be fixed in MIDAS without breaking backwards self-compatibility (will not read old ODB files correctly).

Same applies for the ODB format, except that it is even more ad-hoc. Again, any problems are hard to fix without
breaking backward self-compatibility.

In addition, in the past, the ODB and XML decoders had trouble with very long strings, this has been
fixed some time ago.

K.O.

27 Apr 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

my vote is to bump the ODB size limit to 1999*1000*1000 (not quite 2GB). but this needs to be tested. especially save and restore from ODB, XML and JSON files, including how long it takes to save and load a 1.9GB ODB. K.O.

28 Apr 2023, Marius Koeppel, Suggestion, Maximum ODB size

> my vote is to bump the ODB size limit to 1999*1000*1000 (not quite 2GB). but this needs to be tested. especially save and restore from ODB, XML and JSON files, including how long it takes to save and load a 1.9GB ODB. K.O.

I had some fun with python and created a test script which can be executed in the MIDASSYS/online folder (test_odb.py). I did not really normalize the time so it will be different at different systems but I guess the trend is important (see create_time.pdf).
What is surprising to me is that even that I only write one STRING key to the time increases. Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?
Second thing is that also the creation / storing and load time is increasing. Should this be or is there a bug in the code I use or again is this related to the previous point?

The test of comparing the ODB after store / load / store already fails for the json format. I know I only test if the dicts are the same, so for timestamps this already fails.
But what is strange here is that sometimes the test works sometimes not and its different from run to run.

I will try to improve the test a bit more but for a short update this is how it looks so fare.

Best,
Marius

28 Apr 2023, Stefan Ritt, Suggestion, Maximum ODB size

> Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?

At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
This does NOT depend on the ODB size, but on the ODB *content*. Every key in the ODB takes time to convert. So if your ODB as 1.5 GB
but only a few keys, this is still fast. Only if you have 200 million keys int he ODB, then mlogger takes lots of time to convert
200 million values to XML or JSON strings.

Stefan

28 Apr 2023, Marius Koeppel, Suggestion, Maximum ODB size

> At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
> This does NOT depend on the ODB size, but on the ODB *content*. Every key in the ODB takes time to convert. So if your ODB as 1.5 GB
> but only a few keys, this is still fast. Only if you have 200 million keys int he ODB, then mlogger takes lots of time to convert
> 200 million values to XML or JSON strings.
This was also my assumption. Is this the same for odbedit -c save FILE?
Because this is what I tested with the script and there one can see in the plot that the time increases to write the file if the ODB size increases.
The content of the ODB is always the same - one STRING key in the directory Test.

Best,
Marius

28 Apr 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?
> 
> At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
> This does NOT depend on the ODB size, but on the ODB *content*.
>

Yes and no. They must be storing more than 100 Mbytes of stuff in ODB, if they are asking to bump ODB size from 100 Mbyte to 2 GByte ODB.

On the MIDAS, side, though, we have to plan for the worst case, if max ODB size 1.9 GB and it is full of data,
and mlogger (and odbedit save and load) take 10-30 seconds, then at least all timeouts (watchdog timeout, RPC timeout, etc)
must be increased accordingly.

K.O.

27 Apr 2023, Stefan Ritt, Suggestion, Maximum ODB size

> Congratulations. created != "it works".

Two other tings to consider:

1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finished and read if the first client starts, to get it persistent. 
So this could slow down starting and stopping, but only the first client, so I guess it's not an issue.

2) Traditionally, the ODB gets dumped to the .mid file at the beginning and end of every run, so that one know the exact configuration of the experiment
for offline analysis. This can be turned off of course, but most experiments use it. If the ODB is dumped in any ASCII format, this can take quite long.
Assume it takes 10 seconds at the beginning of each run, and we take a run every five minutes. Then we loose 48 mins of precious beam time every day.

Best,
Stefan

28 Apr 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > Congratulations. created != "it works".
> 
> Two other tings to consider:
> 
> 1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finished and read if the first client starts, to get it persistent. 
> So this could slow down starting and stopping, but only the first client, so I guess it's not an issue.
>

typical disk writing speed is 100-1000 Mbytes/sec, so writing 1 GB .ODB.SHM will take 1-10 seconds. NFS over 1gige network is 100 Mbytes/sec, so 10 seconds to 
write .ODB.SHM. embedded ARM write speed to SD flash can be as low as 10 Mbytes/sec, so up to 100 seconds.

> 
> 2) Traditionally, the ODB gets dumped to the .mid file at the beginning and end of every run, so that one know the exact configuration of the experiment
> for offline analysis. This can be turned off of course, but most experiments use it. If the ODB is dumped in any ASCII format, this can take quite long.
> Assume it takes 10 seconds at the beginning of each run, and we take a run every five minutes. Then we loose 48 mins of precious beam time every day.
> 

new default is to save as JSON, (as of my last measurement) JSON encoder is faster than the XML (and ODB?) encoder, by default result is compressed by GZIP-1 (66 
Mbytes/sec is my old benchmark, should remeasure on new DDR5 machines), compressed JSON is written .mid.gz file at disk speed (as above). Alternatively, use LZ4 
compression, runs roughly at memcpy() speed, less compression, written to .mid.lz4 at disk speed.

if data storage is ZFS, ZFS built-in LZ4 compression is now enabled by default, so result writing uncompressed .mid file (no compression of ODB dump), should be 
roughly same as when using MIDAS LZ4 compression and writing .mid.lz4.

bottom line, I need to remeasure gzip and lz4 compression speeds on new computers (DDR4 AMD 5000 series and DDR5 AMD 7000 series).

K.O.

09 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > 1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finished ...

correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().

I have just run into a problem with this in the DRAGON experiment. At begin and end of run they run
a script that does a large number of "odbedit" calls to read stuff from ODB and it was taking a very long time.
Each odbedit invocation was taking about 1 second, starting odbedit is quick, stopping odbedit takes about 1 second.

It turns out each invocation of odbedit saves .ODB.SHM, ODB was 100 Mbytes size, home disk is an HDD (~100-200 Mbytes/sec writing speed), so yes, about 1 second to 
stop odbedit.

Solution was to reduce ODB size from 100 Mbytes to 10 Mbytes, odbedit now run quickly, begin and end of run scripts run quickly. problem solved.

K.O.

P.S. no, I am not the dragon experiment, no, I did not write those scripts, no, I will not rewrite them, persons who wrote them are long gone, no, the persons running 
dragon today will not be rewriting them.

12 Jun 2023, Stefan Ritt, Suggestion, Maximum ODB size

> correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().

The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
only saved into the .ODB.SHM when the *last* client exits. This ensures to keep the ODB persistent when the
shared memory gets deleted. I vaguely remember I put something in like:

db_close_database()
...
  destroy_flag = (pheader->num_clients == 0);

  if (destroy_flag)
     ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
...

Now I see that the "if (destory_flag)" is missing. Not sure if it was removed once, or if it actually never
was there. But I see no point in flushing the ODB when a client ends. We need the flushing only before the
shared memory gets deleted. We we have to ensure that the share memory and the binary dump file stay in sync
(like if all midas clients die at the same time), we could add some code to flush the ODB like once per minute,
but not attach it to db_close_database(). I know several experiments using "odbedit -c xxx" in vast quantities,
so all these experiments would then benefit.

Note: Mu3e at PSI also uses 100 MB ODB, and they really need it.

Thoughts and opinions?

Best,
Stefan

12 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().
> 
> The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
> only saved into the .ODB.SHM when the *last* client exits. This ensures to keep the ODB persistent when the
> shared memory gets deleted. I vaguely remember I put something in like:
> 
> db_close_database()
> ...
>   destroy_flag = (pheader->num_clients == 0);
> 
>   if (destroy_flag)
>      ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);

I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremebered.

What's more, ss_shm_flush() is done while holding the ODB semaphore, so all other midas programs that try to access
odb at the same time (including the mserver) will stall until write() and close() return. at least we do not fsync(),
and there is no waiting until data is committed to physical media.

$ git annotate 3bb04af4d^ src/odb.c
...
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	875)  destroy_flag = (pheader->num_clients == 0);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	876)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	877)  /* flush shared memory to disk */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	878)  ss_flush_shm(pheader->name, pheader, sizeof(DATABASE_HEADER)+2*pheader->data_size);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	879)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	880)  /* unmap shared memory, delete it if we are the last */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	881)  ss_close_shm(pheader->name, pheader,
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	882)               _database[hDB-1].shm_handle, destroy_flag);
...

K.O.

13 Jun 2023, Stefan Ritt, Suggestion, Maximum ODB size

> I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
> odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremebered.

I confirm. Really strange how your mind can trick you. I'm absolutely sure I had this planned originally (1995?), but it got never implemented.

Well, never too late. So I added the "if" and committed to develop. I did a quick test and things seem to work fine here. Actually programs stop 
a bit faster now. So please everybody give it a try and report back here.

BTW, how do I resize the ODB. I remember we discussed this some time ago, and concluded that odbedit needs a resize flag. Has this even been 
done? If not, what is the "official" way to resize the ODB. We had some documentation about that some time ago, but I can't find it anymore.

Stefan

13 Jun 2023, Marius Koeppel, Suggestion, Maximum ODB size

> BTW, how do I resize the ODB. I remember we discussed this some time ago, and concluded that odbedit needs a resize flag. Has this even been 
> done? If not, what is the "official" way to resize the ODB. We had some documentation about that some time ago, but I can't find it anymore.

I guess this is still not done and the issue is still open: https://bitbucket.org/tmidas/midas/issues/329/need-odbresize
I guess if we touch this maybe the problem with the wrong size should be also fixed: https://bitbucket.org/tmidas/midas/issues/328/odbinit-s-1024mb-creates-odb-with-wrong

Best,
Marius

13 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > BTW, how do I resize the ODB.

ODB cannot be resized "online". Everything has to stop, save content to odb.json, get rid of old ODB.SHM, ensure ODB shared memory is destroyed (SysV or POSIX shared memory), 
create new ODB with new size, load odb.json. Feel free to punch this into chatgpt > odbresize.cxx, commit, test, push.

> I remember we discussed this some time ago, and concluded that odbedit needs a resize flag.

ODB cannot be resized online. ODB API has ODB clients holding ODB handles which are pointers (offsets) into ODB shared memory.

> Has this even been done?
> I guess this is still not done and the issue is still open: https://bitbucket.org/tmidas/midas/issues/329/need-odbresize
> I guess if we touch this maybe the problem with the wrong size should be also fixed: https://bitbucket.org/tmidas/midas/issues/328/odbinit-s-1024mb-creates-odb-with-wrong

please contribute 14 distraction-free days to my patreon. thanks in advance!

K.O.

13 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
> > odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremebered.

small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
begin and end of every data file, you can recover some of the lost data.

for better effect, ODB should be dumped to disk at periodic intervals. but. current implementation writes odb to disk while holding the ODB 
semaphore, which means all ODB access stops for the duration, specifically, there will be gaps in the history because mlogger cannot read history 
data from ODB.

a better implementation could take the ODB lock, make a copy of ODB shared memory, release the ODB lock, complete writing to disk without holding the 
lock. protection is needed against 100 midas programs trying to do this all at the same time. computers with 0.5 GB RAM (many ARM FPGA SoCs) will be 
limited to ~100 Mbyte ODB). plus deal with memory allocation failures when taking a copy of a 2GB ODB.

in theory, the mmap() shared memory (already implemented in midas) does this automatically, but we lose control
over disk writes, we see some OSes write odb to disk "too often" and at wrong times, i.e. while we are in the middle
of creating or deleting something. current sequence of open(), atomic write() and close() ensures ODB.SHM always
contains a valid odb. (minus loss of OS and disk caches to crash or power loss).

K.O.

13 Jun 2023, Stefan Ritt, Suggestion, Maximum ODB size

> small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
> the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
> begin and end of every data file, you can recover some of the lost 

The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost. 
So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
the flush only in the logger after an end of run.

Stefan

13 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> 
> > small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
> > the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
> > begin and end of every data file, you can recover some of the lost 
> 
> The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost. 
> So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
> the flush only in the logger after an end of run.

are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
(I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?

sanity check. dragon experiment, very active, .ODB.SHM timestamp is 1 second old. not-very-active agmini, today is June 13th, timestamp of .ODB.SHM is June 
2nd. inactive TACTIC, timestamp of .ODB.SHM is May 16th.

so yes, not great, but in the new scheme, ODB.SHM timestamps would probably be from 2021 or 2020.

my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.

K.O.

13 Jun 2023, Stefan Ritt, Suggestion, Maximum ODB size

> are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
> (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?

Indeed this is almost never the case, maybe once per months. On the other hand, we have a complete crash of the os maybe once a year. Most of the time the programs 
run continuously (we do not need odbedit), so our timestamp is typically one or two days old, so not good either.

> my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.

My vote is to flush the odb either periodically or after each run.

Stefan

15 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> 
> > are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
> > (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
> 
> Indeed this is almost never the case, maybe once per months. On the other hand, we have a complete crash of the os maybe once a year. Most of the time the programs 
> run continuously (we do not need odbedit), so our timestamp is typically one or two days old, so not good either.
> 
> > my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
> 
> My vote is to flush the odb either periodically or after each run.
> 

So we are in agreement.

RFE filed:
https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically

Dangerous change reverted:
60e4c44ad66346b89ba057391acf7a02890049be

K.O.

bash-3.2$ git diff
diff --git a/src/odb.cxx b/src/odb.cxx
index 0d3b88c2..d104ff28 100644
--- a/src/odb.cxx
+++ b/src/odb.cxx
@@ -2199,7 +2199,14 @@ INT db_close_database(HNDLE hDB)
       destroy_flag = (pheader->num_clients == 0);
 
       /* flush shared memory to disk */
-      if (destroy_flag)
+
+      /* if we save ODB to disk only after last client finishes, we will never save ODB to disk
+         in most experiments - none of them ever completely stop MIDAS in normal operation.
+         as result, all changes to ODB contents will be lost on system crash, power loss
+         or normal reboot. see https://daq00.triumf.ca/elog-midas/Midas/2539
+         K.O. June 2023. */
+
+      if (1 || destroy_flag)
          ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
 
       strlcpy(xname, pheader->name, sizeof(xname));

K.O.

28 Jul 2023, Stefan Ritt, Suggestion, Maximum ODB size

> RFE filed:
> https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically

Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically

Stefan

09 Aug 2023, Konstantin Olchanski, Suggestion, Maximum ODB size

> > RFE filed:
> > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> 
> Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> 
> Stefan

Stefan's comments from the closed bug report:

Ok I implemented some periodic flushing. Here is what I did:

Created

/System/Flush/Flush period : TID_UINT32 /System/Flush/Last flush : TID_UINT32

which control the flushing to disk. The default value for “Flush period” is 60 seconds or one minute.

All clients call db_flush_database() through their cm_yield() function
db_flush_database() checks the “Last flush” and only flushes the ODB when the period has expired. This test is 
done inside the ODB semaphore so that we don’t get a race condigiton
If the period has expired, db_flush_database() calls ss_shm_flush()
ss_shm_flush() tries to allocate a buffer of the shared memory. If the allocation is not successful (out of 
memory), ss_shm_flush() writes directly to the binary file as before.
If the allocation is successful, ss_shm_flush() copies the share memory to a buffer and passes this buffer to a 
dedicated thread which writes the buffer to the binary file. This causes ss_shm_flush() to return immediately and 
not block the calling program during the disk write operation.
Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed for sure before the shared memory 
gets deleted.
This means now that under normal circumstances, exiting programs like odbedit do NOT flush the ODB. This allows to 
call many “odbedit -c” in a row without the flush penalty. Nevertheless, the ODB then gets flushed by other 
clients latest 60 seconds (or whatever the flush period is) after odbedit exits.

Please note that ODB flushing has two purposes:

When all programs exit, we need a persistent storage for the ODB. In most experiments this only happens very 
seldom. Maybe at the end of a beam time period.
If the computer crashes, a recent version of the ODB is kept on disk to simplify recovery after the crash.
Since crashes are not so often (during production periods we have maybe one hardware failure every few years) the 
flushing of the ODB too often does not make sense and just consumes resources. Flushing does also not help from 
corrupted ODBs, since the binary image will also get corrupted. So the only reason for periodic flushes is to ease 
recovery after a total crash. I put the default to 60 seconds, but if people are really paranoid they can decrease 
it to 10 seconds or so. Or increase it to 600 seconds if their system does not crash every week and disks are 
slow.

I made a dedicated branch feature/periodic_odb_flush so people can test the new functionality. If there are no 
complaints within the next few days, I will merge that into develop.

Stefan

06 Mar 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support

Hi,

some minutes ago I published a PR for PostgreSQL support I developed
at INFN-Napoli for Darkside experiment...

I don't know if you receive a notification about this PR and in doubt
I wrote this message...

Thanks in advance,
Gennaro

06 Mar 2023, Konstantin Olchanski, Forum, pull request for PostgreSQL support

> some minutes ago I published a PR for PostgreSQL support I developed
> at INFN-Napoli for Darkside experiment...
> 
> I don't know if you receive a notification about this PR and in doubt
> I wrote this message...

Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
of your pull request and everything looked quite good. But that pull request was 
for an older version of midas and it would not have applied cleanly to the current 
version. I will take a look at your updated pull request. In theory it should only 
add the Postgres class and modify a few other places in history_schema.cxx and have 
no changes to anything else. (if you need those changes, it should be a separate 
pull request).

Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
observed for storing and using midas history data.

K.O.

06 Mar 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support

Hi Konstantin,
thanks for this update |

My main interest for PostgreSQL is usage of TimescaleDB
(https://github.com/timescale/timescaledb) a PostgreSQL extension that
makes possible usage of downsampling functions on time-series...

here at INFN-Napoli we have a large history dataset that we manage
with MIDAS history and MySQL tables. We have a lot of issues 
(wait time, browser hangs, crashes) when we use MIDAS history plot 
pages on large time period because the Javascript web page try to 
download million of records in order to display them on a plot of 
(max) 2000 pixel width...

with native downsampling we can reduce a large dataset keeping the
"shape" of the curve using only the points needed by the plot area;

in TimescaleDB there is "lttb" ( Largest Triangle Three Bucket) a very 
nice and impressive downsampling function that preserve very well the
shape of the series.

If you are interested to see a lttb at work on some data you can open this page:
   https://www.base.is/flot

In next days I will work to add TimescaleDB backend to MIDAS history (it will be
similar to PostgreSQL backend) and we can discuss on how to add these 
downsampling features to history plot web pages, I already developed some
solutions and I will be happy to share them with MIDAS community;

Cheers,
Gennaro

> > some minutes ago I published a PR for PostgreSQL support I developed
> > at INFN-Napoli for Darkside experiment...
> > 
> > I don't know if you receive a notification about this PR and in doubt
> > I wrote this message...
> 
> Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> of your pull request and everything looked quite good. But that pull request was 
> for an older version of midas and it would not have applied cleanly to the current 
> version. I will take a look at your updated pull request. In theory it should only 
> add the Postgres class and modify a few other places in history_schema.cxx and have 
> no changes to anything else. (if you need those changes, it should be a separate 
> pull request).
> 
> Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> observed for storing and using midas history data.
> 
> K.O.

20 Mar 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support

Hi,
I have updated the PR with a new one that includes TimescaleDB support and some
changes to mhistory.js to support downsampling queries...

Cheers,
Gennaro

> > some minutes ago I published a PR for PostgreSQL support I developed
> > at INFN-Napoli for Darkside experiment...
> > 
> > I don't know if you receive a notification about this PR and in doubt
> > I wrote this message...
> 
> Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> of your pull request and everything looked quite good. But that pull request was 
> for an older version of midas and it would not have applied cleanly to the current 
> version. I will take a look at your updated pull request. In theory it should only 
> add the Postgres class and modify a few other places in history_schema.cxx and have 
> no changes to anything else. (if you need those changes, it should be a separate 
> pull request).
> 
> Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> observed for storing and using midas history data.
> 
> K.O.

24 May 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support

Hi,
is there any news regarding this pull request ? 
(https://bitbucket.org/tmidas/midas/pull-requests/30)

If you agree to merge I can resolve conflicts that now 
(after two months) are listed...

Regards,
Gennaro

> 
> Hi,
> I have updated the PR with a new one that includes TimescaleDB support and some
> changes to mhistory.js to support downsampling queries...
> 
> Cheers,
> Gennaro
> 
> > > some minutes ago I published a PR for PostgreSQL support I developed
> > > at INFN-Napoli for Darkside experiment...
> > > 
> > > I don't know if you receive a notification about this PR and in doubt
> > > I wrote this message...
> > 
> > Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> > of your pull request and everything looked quite good. But that pull request was 
> > for an older version of midas and it would not have applied cleanly to the current 
> > version. I will take a look at your updated pull request. In theory it should only 
> > add the Postgres class and modify a few other places in history_schema.cxx and have 
> > no changes to anything else. (if you need those changes, it should be a separate 
> > pull request).
> > 
> > Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> > observed for storing and using midas history data.
> > 
> > K.O.

18 Jul 2023, Konstantin Olchanski, Forum, pull request for PostgreSQL support

> is there any news regarding this pull request ? 
> (https://bitbucket.org/tmidas/midas/pull-requests/30)

apologies for taking a very long time to review the proposed changes.

the main problem with this pull request remains, it tangles together too many changes to the code and I cannot simply 
say "this is okey", merge and commit it.

example of unrelated change is diff of mlogger.cxx, change of function in: "db_get_value(hDB, 0, "/Logger/Multithread 
transitions" ... )". there is also unrelated changes to whitespace sprinkled around.

can you review your diffs again and try to remove as much unrelated and unnecessary changes as you can?

I could do this for you, and merge my version, but next time you merge base midas, you will have a collision.

unrelated change of function is introduction of something called "downsampling", what is the purpose of this? How is it 
different from requesting binned data? Is it just a kludge to reduce the data size? Before we merge it, can you post a 
description/discussion to this forum here? (as a separate topic, separate from discussion of PostgreSQL merge).

the changes to add PostgreSQL so fat look reasonable:
- CMakeLists, is always painful but if you do same a MySQL, should be okey, we always end up rejigging this several 
times before it works everywhere.
- history.h, ok, minus changes for adding the "downsample" feature
- mlogger.cxx, changes are too tangled with "downsample" feature, cannot review
- SetDownsample() API is defective, should have separate Get() and Set() functions
- history_common.cxx, please do not add downsampling code to history providers that do not/will not support it.
- history_odbc.cxx, please do not change it. it does not support downsampling and never will.
- history.cxx, ditto
- mjsonrpc.cxx, history API is changed, we must know: is new JS compatible with old mhttpd? is old JS compatible with 
new mhttpd? (mixed versions are very common in practice). if there is incompatibility, can you recoded it to be 
compatible?
- history_schema.cxx: bitbucket diff is a dog's breakfast, cannot review. I will have to checkout your branch and diff 
by hand.

changes to mhistory.js appear to be extensive and some explanation is needed for what is changed, what bugs/problems 
are fixed, what new features are added.

to move forward, can you generate a pull requests that only adds pgsql to history_schema.cxx, history_common.cxx and 
mlogger.cxx and does not add any other functions, features and does not change any whistespace?

K.O.


> 
> If you agree to merge I can resolve conflicts that now 
> (after two months) are listed...
> 
> Regards,
> Gennaro
> 
> > 
> > Hi,
> > I have updated the PR with a new one that includes TimescaleDB support and some
> > changes to mhistory.js to support downsampling queries...
> > 
> > Cheers,
> > Gennaro
> > 
> > > > some minutes ago I published a PR for PostgreSQL support I developed
> > > > at INFN-Napoli for Darkside experiment...
> > > > 
> > > > I don't know if you receive a notification about this PR and in doubt
> > > > I wrote this message...
> > > 
> > > Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> > > of your pull request and everything looked quite good. But that pull request was 
> > > for an older version of midas and it would not have applied cleanly to the current 
> > > version. I will take a look at your updated pull request. In theory it should only 
> > > add the Postgres class and modify a few other places in history_schema.cxx and have 
> > > no changes to anything else. (if you need those changes, it should be a separate 
> > > pull request).
> > > 
> > > Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> > > observed for storing and using midas history data.
> > > 
> > > K.O.

21 Jul 2023, Konstantin Olchanski, Forum, pull request for PostgreSQL support

> > is there any news regarding this pull request ? 
> > (https://bitbucket.org/tmidas/midas/pull-requests/30)
> 
> apologies for taking a very long time to review the proposed changes.
> 

I merged the PgSql bits by hand - the automatic tools make a dog's breakfast from the history_schema.cxx diffs. Ouch.

history_schema.cxx merged pretty much cleanly, but I have one question about CreateSqlColumn() with sql_strict set to "true". Can you say 
more why this is needed? Should this also be made the default for MySQL? The best I can tell the default values are only needed if we write 
to SQL but forget to provide values that should not be NULL? But our code never does this? Or this is for reading from SQL, where NULL values 
are replaced with the default values? I do not have time to look into this right now, I hope you can clarify it for me?

Also notice the fDownsample is set to zero and cannot be changed. I recommend we set it through the MakeMidasHistoryPgsql() factory method.

Please pull, merge, retest, update the pull request, check that there is no unrelated changes (changes in mlogger.cxx is a direct red flag!) 
and we should be able to merge the rest of your stuff pronto.

K.O.

commit e85bb6d37c85f02fc4895cae340ba71ab36de906 (HEAD -> develop, origin/develop, origin/HEAD)
Author: Konstantin Olchanski <olchansk@triumf.ca>
Date:   Fri Jul 21 09:45:08 2023 -0700

    merge PQSQL history in history_schema.cxx

commit f254ebd60a23c6ee2d4870f3b6b5e8e95a8f1f09
Author: Konstantin Olchanski <olchansk@triumf.ca>
Date:   Fri Jul 21 09:19:07 2023 -0700

    add PGSQL Makefile bits

commit aa5a35ba221c6f87ae7a811236881499e3d8dcf7
Author: Konstantin Olchanski <olchansk@triumf.ca>
Date:   Fri Jul 21 08:51:23 2023 -0700

    merge PGSQL support from https://bitbucket.org/gtortone/midas/branch/feature/timescaledb_support except for history_schema.cxx

21 Jul 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support










Hi Konstantin,

thanks a lot for your work on PostgreSQL and TimescaleDB integration...
and sorry for unrelated changes on source code !

I will return on this task at end of this year (maybe October or November) because
I'm working on different tasks... but I will keep in mind your suggestions in order
to provide good source code.

Thanks,
Gennaro


> 
> I merged the PgSql bits by hand - the automatic tools make a dog's breakfast from the history_schema.cxx diffs. Ouch.
> 
> history_schema.cxx merged pretty much cleanly, but I have one question about CreateSqlColumn() with sql_strict set to "true". Can you say 
> more why this is needed? Should this also be made the default for MySQL? The best I can tell the default values are only needed if we write 
> to SQL but forget to provide values that should not be NULL? But our code never does this? Or this is for reading from SQL, where NULL values 
> are replaced with the default values? I do not have time to look into this right now, I hope you can clarify it for me?
> 
> Also notice the fDownsample is set to zero and cannot be changed. I recommend we set it through the MakeMidasHistoryPgsql() factory method.
> 
> Please pull, merge, retest, update the pull request, check that there is no unrelated changes (changes in mlogger.cxx is a direct red flag!) 
> and we should be able to merge the rest of your stuff pronto.
> 
> K.O.
> 
> commit e85bb6d37c85f02fc4895cae340ba71ab36de906 (HEAD -> develop, origin/develop, origin/HEAD)
> Author: Konstantin Olchanski <olchansk@triumf.ca>
> Date:   Fri Jul 21 09:45:08 2023 -0700
> 
>     merge PQSQL history in history_schema.cxx
> 
> commit f254ebd60a23c6ee2d4870f3b6b5e8e95a8f1f09
> Author: Konstantin Olchanski <olchansk@triumf.ca>
> Date:   Fri Jul 21 09:19:07 2023 -0700
> 
>     add PGSQL Makefile bits
> 
> commit aa5a35ba221c6f87ae7a811236881499e3d8dcf7
> Author: Konstantin Olchanski <olchansk@triumf.ca>
> Date:   Fri Jul 21 08:51:23 2023 -0700
> 
>     merge PGSQL support from https://bitbucket.org/gtortone/midas/branch/feature/timescaledb_support except for history_schema.cxx

28 Jul 2023, Stefan Ritt, Forum, pull request for PostgreSQL support

The compilation of midas was broken by the last modification. The reason is that 

   Pgsql *fPgsql = NULL;

was not protected by 

#ifdef HAVE_PGSQL

So I put all PGSQL code under a big #ifdef and now it compiles again. You might want to double check my modification at

https://bitbucket.org/tmidas/midas/commits/e3c7e73459265e0d7d7a236669d1d1f2d9292a74

Best,
Stefan

09 Aug 2023, Konstantin Olchanski, Forum, pull request for PostgreSQL support

> The compilation of midas was broken by the last modification. The reason is that 
>    Pgsql *fPgsql = NULL;
> was not protected by #ifdef HAVE_PGSQL

confirmed, my mistake, I forgot to test with "make cmake NO_PGSQL". your fix is correct, thanks.

K.O.

02 Aug 2023, Caleb Marshall, Forum, Issues with Universe II Driver

Hello,

At our lab we are currently in the process of migrating more of our systems over to Midas. However, all of our working systems are dependent on SBCs with the Tsi-148 chips of which we only have a handful. In order to have some backups and spares for testing, we have been attempting to get Midas working with some borrowed SBCs (Concurrent Technologies VX 40x/04x) with Universe-II chips. The SBC is running CentOS 7. I have tried to follow the instructions posted here. The universe-II kernel module appears to load correctly, dmesg gives:

[ 32.384826] VME: Board is system controller
[ 32.384875] VME: Driver compiled for SMP system
[ 32.384877] VME: Installed VME Universe module version: 3.6.KO6

However, running vmescan.exe fails with a segfault. Running with gdb shows:

vmic_mmap: Mapped VME AM 0x0d addr 0x00000000 size 0x00ffffff at address 0x80a01000
mvme_open:
Bus handle = 0x7
DMA handle = 0x6045d0
DMA area size = 1048576 bytes
DMA physical address = 0x7ffff7eea000
vmic_mmap: Mapped VME AM 0x2d addr 0x00000000 size 0x0000ffff at address 0x86ff0000

Program received signal SIGSEGV, Segmentation fault.
mvme_read_value (mvme=0x604010, vme_addr=<optimized out>)
at /home/jam/midas/packages/midas/drivers/vme/vmic/vmicvme.c:352
352 dst = *((WORD *)addr);

With the pointer addr originating from a call to vmic_mapcheck within the mvme_read_value functions in the vmicvme.c file. Help with where to go from here would be appreciated.

-Caleb

02 Aug 2023, Konstantin Olchanski, Forum, Issues with Universe II Driver

I maintain the tsi148 and the universe-II drivers. I confirm -KO6 is my latest 
version, last updated for 32-bit Debian-11, and we still use it at TRIUMF.

It is good news that the vme_universe kernel module built, loaded and reported 
correct stuff to dmesg.

It is not clear why mvme_read_value() crashed. We need to know the value of 
vme_addr and addr, can you add printf()s for them using format "%08x" and try 
again?

K.O.

<p>&nbsp;</p>

<table align="center" cellspacing="1" style="border:1px solid #486090; 
width:98%">
	<tbody>
		<tr>
			<td style="background-color:#486090">Caleb Marshall 
wrote:</td>
		</tr>
		<tr>
			<td style="background-color:#FFFFB0">
			<p>Hello,</p>

			<p>At our lab we are currently in the process of 
migrating more of our systems over to Midas. However, all of our working systems 
are dependent on SBCs with the Tsi-148 chips of which we only have a handful. In 
order to have some backups and spares for testing, we have been attempting to 
get Midas working with some borrowed SBCs (Concurrent Technologies VX 40x/04x) 
with Universe-II chips. The SBC is running&nbsp;CentOS 7.&nbsp;I have tried to 
follow the instructions posted <a 
href="https://daq00.triumf.ca/DaqWiki/index.php/VME-
CPU#V7648_and_V7750_BIOS_Settings">here</a>. The universe-II kernel module 
appears to load correctly, dmesg gives:</p>

			<p>[ &nbsp; 32.384826] VME: Board is system 
controller<br />
			[ &nbsp; 32.384875] VME: Driver compiled for SMP 
system<br />
			[ &nbsp; 32.384877] VME: Installed VME Universe module 
version: 3.6.KO6<br />
			&nbsp;</p>

			<p>However, running vmescan.exe fails with a segfault. 
Running with gdb shows:</p>

			<p>vmic_mmap: Mapped VME AM 0x0d addr 0x00000000 size 
0x00ffffff at address 0x80a01000<br />
			mvme_open:<br />
			Bus handle &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
&nbsp;= 0x7<br />
			DMA handle &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
&nbsp;= 0x6045d0<br />
			DMA area size &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; = 
1048576 bytes<br />
			DMA &nbsp; &nbsp;physical address = 0x7ffff7eea000<br />
			vmic_mmap: Mapped VME AM 0x2d addr 0x00000000 size 
0x0000ffff at address 0x86ff0000</p>

			<p>Program received signal SIGSEGV, Segmentation fault.
<br />
			mvme_read_value (mvme=0x604010, vme_addr=&lt;optimized 
out&gt;)<br />
			&nbsp; &nbsp; at 
/home/jam/midas/packages/midas/drivers/vme/vmic/vmicvme.c:352<br />
			352&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;dst &nbsp;= *((WORD 
*)addr);<br />
			&nbsp;</p>

			<p>With the pointer addr originating from a call 
to&nbsp;vmic_mapcheck within the&nbsp;&nbsp;mvme_read_value functions in the 
vmicvme.c file. Help with where to go from here would be appreciated.</p>

			<p>-Caleb&nbsp;</p>

			<p>&nbsp;</p>
			</td>
		</tr>
	</tbody>
</table>

<p>&nbsp;</p>

03 Aug 2023, Caleb Marshall, Forum, Issues with Universe II Driver

Here is the output:

vmic_mmap: Mapped VME AM 0x0d addr 0x00000000 size 0x00ffffff at address 0x80a01000
mvme_open:
Bus handle              = 0x3
DMA handle              = 0x158f5d0
DMA area size           = 1048576 bytes
DMA    physical address = 0x7f91db553000
vmic_mmap: Mapped VME AM 0x2d addr 0x00000000 size 0x0000ffff at address 0x86ff0000
vme addr: 00000000 
addr: db543000

03 Aug 2023, Konstantin Olchanski, Forum, Issues with Universe II Driver

> Here is the output:
> 
> vmic_mmap: Mapped VME AM 0x0d addr 0x00000000 size 0x00ffffff at address 0x80a01000
> mvme_open:
> Bus handle              = 0x3
> DMA handle              = 0x158f5d0
> DMA area size           = 1048576 bytes
> DMA    physical address = 0x7f91db553000
> vmic_mmap: Mapped VME AM 0x2d addr 0x00000000 size 0x0000ffff at address 0x86ff0000
> vme addr: 00000000 
> addr: db543000 

I see the problem. A24 is mapped at 0x80xxxxxx, A16 is mapped at 0x86ffxxxx, but 
mvme_read computed address 0xdb543000, out of range of either mapped vme address. ouch.

One more thing to check, AFAIK, this universe-II codes were never used on 64-bit CPU 
before, we only have 32-bit Pentium-3 and Pentium-4 machines with these chips. The 
tsi148 codes used to work both 32-bit and 64-bit, we used to have both flavours of 
CPUs, but now only have 64-bit.

What is your output for "uname -a"? does it report 32-bit or 64-bit kernel?

If you feel adventurous, you can build 32-bit midas (cd .../midas; make linux32), 
compile vmescan.o with "-m32" and link vmescan.exe against .../midas/linux-m32/lib, and 
see if that works. Meanwhile, I can check if vmicvme.c is 64-bit clean. Checking if 
kernel module is 64-bit clean would be more difficult...

K.O.

03 Aug 2023, Caleb Marshall, Forum, Issues with Universe II Driver

I am looking into compiling the 32 bit midas.

In the meantime, here is the kernel info:

3.10.0-1160.11.1.el7.x86_64 

Thank you for the help.
-Caleb

04 Aug 2023, Caleb Marshall, Forum, Issues with Universe II Driver

I can compile 32 bit midas. Unless I am interpreting the linking error, I don't 
think I can use the driver as built. 

While trying to compile vme_scan, most of the programs fail with:

/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-
linux/4.8.5/../../../../lib/libvme.so when searching for -lvme
/usr/bin/ld: skipping incompatible /lib/../lib/libvme.so when searching for -lvme
/usr/bin/ld: skipping incompatible /usr/lib/../lib/libvme.so when searching for -
lvme
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-
linux/4.8.5/../../../libvme.so when searching for -lvme
/usr/bin/ld: skipping incompatible //lib/libvme.so when searching for -lvme
/usr/bin/ld: skipping incompatible //usr/lib/libvme.so when searching for -lvme

with libvme.so being built by the universe-II driver. Not sure if I can get around 
this without messing with the driver? Is it possible to build a 32 bit version of 
that shared library without having to touch the actual kernel module? 

-Caleb

04 Aug 2023, Konstantin Olchanski, Forum, Issues with Universe II Driver

> I can compile 32 bit midas. Unless I am interpreting the linking error, I don't 
> think I can use the driver as built.

I think you are right, Makefile from the Universe package does not build a -m32 version 
of libvme.so. I think I can fix that...

K.O.

12 May 2023, Stefan Ritt, Info, New environment variable MIDAS_EXPNAME

A new environment variable MIDAS_EXPNAME has been introduced. This must be
used for cases where people use MIDAS_DIR, and is then equivalent for the
experiment name and directory usually used in the exptab file. This fixes
and issue with creating and deleting shared memory in midas as described in

https://bitbucket.org/tmidas/midas/issues/363/deletion-of-shared-memory-fails

The documentation has been updated at

https://daq00.triumf.ca/MidasWiki/index.php/MIDAS_environment_variables#MIDAS_EXPN
AME

/Stefan

12 May 2023, Konstantin Olchanski, Info, New environment variable MIDAS_EXPNAME

> A new environment variable MIDAS_EXPNAME has been introduced [to be used together with 
MIDAS_DIR]

This is fixes an important buglet. If experiment uses MIDAS_DIR instead of exptab, at the time 
of connecting to ODB, we do not know the experiment name and use name "Default" to create ODB 
shared memory, instead of actual experiment name.

This creates an inconsistency, if some MIDAS programs in the same experiment use MIDAS_DIR while 
others use exptab (this would be unusual, but not impossible) they would connect to two 
different ODB shared memories, former using name "Default", latter using actual experiment name.

As an indication that something is not right, when stopping MIDAS programs, there is an error 
message about failure to delete shared ODB shared memory (because it was created using name 
"Default" and delete using the correct experiment name fails).

Also it can cause co-mingling between two different experiments, depending on the type of shared 
memory used by MIDAS (see $MIDAS_DIR/.SHM_TYPE.TXT):
POSIX - (usually not used) not affected (experiment name is not used)
POSIXv2 - (usually not used) affected (shm name is "$EXP_NAME_ODB")
POSIXv3 - used on MacOS - affected (shm name is "$UID_$EXP_NAME_ODB" so "$UID_Default_ODB" will 
collide)
POSIXv4 - used on Linux - not affected (shm name includes $MIDAS_DIR which is different for 
different experiments)

K.O.

20 Jun 2023, Stefan Ritt, Info, New environment variable MIDAS_EXPNAME

I just realized that we had already MIDAS_EXPT_NAME, and now people get confused with

MIDAS_EXPT_NAME
and
MIDAS_EXPNAME

In trying to fix this confusion, I changed the name of the second variable to MIDAS_EXPT_NAME as well, 
so we only have one variable now. If this causes any problems please report here.

Stefan

28 Jul 2023, Stefan Ritt, Info, New environment variable MIDAS_EXPNAME

Concerning naming of shared memories I went one step further due to some requirement of a local experiment. 
The experiment needs to change the experiment name shown on the web status page depending on the exact 
configuration, but we do not want to change the whole midas experiment each time.

So I simple removed the check that the experiment name coming from the environment and used for the shared 
memory gets checked against the experiment name in the ODB. The only connection there is that the ODB name 
gets set to the environment name is it does not exist or is empty, just to have some default value. So for 
most people nothing should change. If one changes however the name in the ODB (under /Experiment/Name), 
nothing will change internally, just the web display via mhttpd changes its title.

I hope this has no bad side-effects, so please have a look if you see any issue in your experiment.

Stefan

24 Jul 2023, Nick Hastings, Bug Report, Incompatible data types with mysql odbc interface

Hello,

I have recently set up a midas-2022-05-c instance and have been trying to configure
it to use the mysql odbc interface. Tables are being created for it but
the logger is issuing errors that some of the column types are incorrect. For example
in the log I see:

14:22:12.689 2023/07/25 [Logger,ERROR] [history_odbc.cxx:1531:hs_define_event,ERROR] Error: History event 'Run transitions': Incompatible data type for tag 'State' type 'UINT32', SQL column 'state' type 'INT UNSIGNED'
14:22:12.689 2023/07/25 [Logger,ERROR] [history_odbc.cxx:1531:hs_define_event,ERROR] Error: History event 'Run transitions': Incompatible data type for tag 'Run number' type 'UINT32', SQL column 'run_number' type 'INT UNSIGNED'

Checking the table in the database I see:

MariaDB [t2kgscND280]> describe run_transitions;
+------------+------------------+------+-----+---------------------+-------------------------------+
| Field      | Type             | Null | Key | Default             | Extra                         |
+------------+------------------+------+-----+---------------------+-------------------------------+
| _t_time    | timestamp        | NO   | MUL | current_timestamp() | on update current_timestamp() |
| _i_time    | int(11)          | NO   | MUL | NULL                |                               |
| state      | int(10) unsigned | YES  |     | NULL                |                               |
| run_number | int(10) unsigned | YES  |     | NULL                |                               |
+------------+------------------+------+-----+---------------------+-------------------------------+
4 rows in set (0.000 sec)

Please note that this is not the only history variable that has this problem. There are multiple variables
for which:

Incompatible data type for tag 'Foo Bar' type 'UINT32', SQL column 'foo_bar' type 'INT UNSIGNED'

Checking history_odbc.cxx, I see:

static const char *sql_type_mysql[] = {
   "xxxINVALIDxxxNULL", // TID_NULL
   "tinyint unsigned",  // TID_UINT8
   "tinyint",           // TID_INT8
   "char",              // TID_CHAR
   "smallint unsigned", // TID_UINT16
   "smallint",          // TID_INT16
   "integer unsigned",  // TID_UINT32
   "integer",           // TID_INT32
   "tinyint",           // TID_BOOL
   "float",             // TID_FLOAT
   "double",            // TID_DOUBLE
   "tinyint unsigned",  // TID_BITFIELD
   "VARCHAR",           // TID_STRING
   "xxxINVALIDxxxARRAY",
   "xxxINVALIDxxxSTRUCT",
   "xxxINVALIDxxxKEY",
   "xxxINVALIDxxxLINK"
};

So it seems that unsigned int should map to UINT32.

The database is:
Server version: 10.5.16-MariaDB MariaDB Server

Please let me know if more information is needed.

Note that the choice of using the odbc interface is because we
plan to import an old database that was created using the odbc interface
with a previous version of midas (yes this is your old friend T2K/ND280).

Regards,

Nick.

25 Jul 2023, Nick Hastings, Bug Report, Incompatible data types with mysql odbc interface

Hello,

wanted add few things:

1. I see the same problem for INT32
2. For now I've worked around these problems with https://bitbucket.org/nickhastings/midas/commits/e4776f7511de0647077c8c80d43c17bbfe2184fd
3. I'm using mariadb-connector-odbc-3.1.12-3.el9.x86_64 (System is AlmaLinux 9)

Regards,

Nick.

18 Jul 2023, Gennaro Tortone, Bug Report, access to filesystem through mhttpd

Hi,

after some networks security scans I received some warnings because mhttpd expose
server filesystem through HTTP(S)...

in details a MIDAS user can access to /etc/passwd or download other files from
filesystem using a web browser:

(e.g. http://midas.host:8080/etc/passwd)

I know that /etc/passwd does not contain users password and mhttpd runs as an
unprivileged user but in principle this should be avoided in order to minimize 
security risks: if I authorize a user to use MIDAS interface in order to handle
acquisition tasks this should not authorize the user to access the server filesystem...
but this access should be restricted to MIDAS web pages, custom pages etc.

What do you think about this ?

Cheers,
Gennaro

18 Jul 2023, Konstantin Olchanski, Bug Report, access to filesystem through mhttpd

> (e.g. http://midas.host:8080/etc/passwd)

not again! I complained about this before, and I added a fix, but it must be broken again.

getting a copy of /etc/passwd is reasonably benign, but getting a copy of 
/home/$USER/.ssh/id_rsa, id_rsa.pub, knownhosts and authorized_keys is a disaster.

(running mhttpd behind a web proxy does not solve the problem, number of attackers is 
reduced to only the people who know the proxy password and to local users).

K.O.

19 Jul 2023, Zaher Salman, Bug Report, access to filesystem through mhttpd

Have you actually been able to read /etc/passwd this way? I tested this on a few of our servers and it does not work. As far as I know, there is access to files in resources, custom pages etc.

Other possible ways to access the file system is via mjsonrpc calls, but again these are restricted to certain folders.

Can you please give us more details about this.

Zaher

> > (e.g. http://midas.host:8080/etc/passwd)
> 
> not again! I complained about this before, and I added a fix, but it must be broken again.
> 
> getting a copy of /etc/passwd is reasonably benign, but getting a copy of 
> /home/$USER/.ssh/id_rsa, id_rsa.pub, knownhosts and authorized_keys is a disaster.
> 
> (running mhttpd behind a web proxy does not solve the problem, number of attackers is 
> reduced to only the people who know the proxy password and to local users).
> 
> K.O.

11 Jul 2023, Anubhav Prakash, Forum, Possible ODB corruption! Webpages https://midptf01.triumf.ca/?cmd=Programs not loading!

The ODB server seems to have crashed/corrupted. I tried reloading the previous
working version of ODB(using the commands in folliwng image) but it didn't work.

I have also attached the screenshot of the site https://midptf01.triumf.ca/?cmd=Programs. Any help to resolve this would be appreciated! Normally Prof. Thomas Lindner would solve such issues, but he is busy working at CERN till 17th of July, and we cannot afford to wait until then.

The following is the error: when I run bash /home/midptf/online/bin/start_daq.sh

[ODBEdit1,INFO] Fixing ODB "/Programs/ODBEdit" struct size mismatch (expected
316, odb size 92)
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11293:db_get_record,ERROR] struct size mismatch for
"/Programs/ODBEdit" (expected size: 316, size in ODB: 92)
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11381:db_get_record1,ERROR] after db_check_record()
still struct size mismatch (expected 316, odb size 92) of "/Programs/ODBEdit",
calling db_create_record()
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11387:db_get_record1,ERROR] repaired struct size
mismatch of "/Programs/ODBEdit"
[ODBEdit1,ERROR] [odb.cxx:11293:db_get_record,ERROR] struct size mismatch for
"/Programs/ODBEdit" (expected size: 316, size in ODB: 92)
[ODBEdit1,ERROR] [alarm.cxx:702:al_check,ERROR] Cannot get program info record
for program "ODBEdit", db_get_record1() status 319
[ODBEdit1,INFO] Fixing ODB "/Programs/mhttpd" struct size mismatch (expected
316, odb size 60)
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11293:db_get_record,ERROR] struct size mismatch for
"/Programs/mhttpd" (expected size: 316, size in ODB: 92)
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11381:db_get_record1,ERROR] after db_check_record()
still struct size mismatch (expected 316, odb size 92) of "/Programs/mhttpd",
calling db_create_record()
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11387:db_get_record1,ERROR] repaired struct size
mismatch of "/Programs/mhttpd"
[ODBEdit1,ERROR] [odb.cxx:11293:db_get_record,ERROR] struct size mismatch for
"/Programs/mhttpd" (expected size: 316, size in ODB: 92)
[ODBEdit1,ERROR] [alarm.cxx:702:al_check,ERROR] Cannot get program info record
for program "mhttpd", db_get_record1() status 319
[ODBEdit1,INFO] Fixing ODB "/Programs/Logger" struct size mismatch (expected
316, odb size 60)
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11293:db_get_record,ERROR] struct size mismatch for
"/Programs/Logger" (expected size: 316, size in ODB: 92)
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11381:db_get_record1,ERROR] after db_check_record()
still struct size mismatch (expected 316, odb size 92) of "/Programs/Logger",
calling db_create_record()
[ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot malloc_data(256),
called from db_set_link_data
[ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot reallocate
"/System/Tmp/140305391605888I/Start command" with new size 256 bytes, online
database full
[ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of zero, set
to 32, odb path "Start command"
[ODBEdit1,ERROR] [odb.cxx:11387:db_get_record1,ERROR] repaired struct size
mismatch of "/Programs/Logger"
[ODBEdit1,ERROR] [odb.cxx:11293:db_get_record,ERROR] struct size mismatch for
"/Programs/Logger" (expected size: 316, size in ODB: 92)
[ODBEdit1,ERROR] [alarm.cxx:702:al_check,ERROR] Cannot get program info record
for program "Logger", db_get_record1() status 319

14:54:29 [ODBEdit,ERROR] [odb.cxx:1763:db_validate_db,ERROR] Warning: database
data area is 100% full

14:54:29 [ODBEdit,ERROR] [odb.cxx:1283:db_validate_key,ERROR] hkey 643368, path
"/Alarms/Classes/<NULL>/Display BGColor", string value is not valid UTF-8

14:54:29 [ODBEdit1,ERROR] [odb.cxx:556:realloc_data,ERROR] cannot
malloc_data(256), called from db_set_link_data

14:54:29 [ODBEdit1,ERROR] [odb.cxx:6923:db_set_link_data,ERROR] Cannot
reallocate "/System/Tmp/140305391605888I/Start command" with new size 256 bytes,
online database full

14:54:29 [ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of
zero, set to 32, odb path "Start command"

14:54:29 [ODBEdit1,ERROR] [odb.cxx:8531:db_paste,ERROR] found string length of
zero, set to 32, odb path "Start command"

11 Jul 2023, Thomas Lindner, Forum, Possible ODB corruption! Webpages https://midptf01.triumf.ca/?cmd=Programs not loading!

Hi Anubhav,

I have fixed the ODB corruption problem.

Cheers,
Thomas

Anubhav Prakash wrote:

The ODB server seems to have crashed/corrupted. I tried reloading the previous
working version of ODB(using the commands in folliwng image) but it didn't work.

21 Jun 2023, Gennaro Tortone, Bug Report, mserver and script execution

Hi,
I have the following setup:

- MIDAS release: release/midas-2022-05-c
- host with MIDAS frontend (mclient)
- host with MIDAS server (mhttpd / mserver)

On mclient I run a frontend with:

./feodt5751 -h mserver -e develop -i 0

On mserver I see frontend ready and ODB variables in place;

I noticed a strange behavior with "/Programs/Execute on start run" and 
"/Programs/Execute on stop run". In details the script to execute at start of run
is executed on "mserver" host but the script to execute at stop of run is executed on
"mclient" host (!)

Is this a bug or I'm missing some documentation links ?

Thanks in advance,
Gennaro

26 Jun 2023, Stefan Ritt, Bug Report, mserver and script execution

Indeed that could well be (and is certainly not intended like that). I checked the code
and found that "execute on start run" and "execute on stop run" are called inside
cm_transition(). That means they are executed on the computer which calls cm_transition().
If you use mhttpd and start a run through the web interface, then mhttpd runs on your
server and "execute on start run" gets executed on your server. If you stop the run
by your frontend running on the client machine (like if a certain number of events 
is reached), then "execute on stop run" gets executed on your client.

An easy way around would not to use "/Equipment/Trigger/Common/Event limit" which
gets check by your frontend and therefore on the client computer, but use 
"/Logger/Channels/0/Settings/Event limit" which gets checked by the logger and
therefore executed on the server computer.

Getting a consistent behaviour (like always executing scripts on the server) would
require a major rework of the run transition framework with probably many undesired
side-effects, so lots of debugging work.

Stefan

27 Jun 2023, Gennaro Tortone, Bug Report, mserver and script execution

Hi Stefan,

> Indeed that could well be (and is certainly not intended like that). I checked the code
> and found that "execute on start run" and "execute on stop run" are called inside
> cm_transition(). That means they are executed on the computer which calls cm_transition().
> If you use mhttpd and start a run through the web interface, then mhttpd runs on your
> server and "execute on start run" gets executed on your server. If you stop the run
> by your frontend running on the client machine (like if a certain number of events 
> is reached), then "execute on stop run" gets executed on your client.

ok, this is clear to me...

> An easy way around would not to use "/Equipment/Trigger/Common/Event limit" which
> gets check by your frontend and therefore on the client computer, but use 
> "/Logger/Channels/0/Settings/Event limit" which gets checked by the logger and
> therefore executed on the server computer.

we never used "/Equipment/Trigger/Common/Event limit" but we always used
"/Logger/Channels/0/Settings/Event limit"...

btw I did some tests and I understand that this issue is related to 'deferred transition'
on frontend. Indeed I disabled deferred transition on frontend side and now script
execution is carried out always on MIDAS server;

Cheers,
Gennaro

27 Jun 2023, Stefan Ritt, Bug Report, mserver and script execution

> btw I did some tests and I understand that this issue is related to 'deferred transition'
> on frontend. Indeed I disabled deferred transition on frontend side and now script
> execution is carried out always on MIDAS server;

Ah, that's clear now. In a deferred transition, the frontend finally stops the run (after the 
condition is given to finish). Since the client calls cm_transition(), the script gets executed on 
the client. Changing that would be a rather large rework of the code. So maybe better call a 
script which executes another script via ssh on the server.

Stefan

23 Jun 2023, Gennaro Tortone, Bug Report, deferred stop transition

Hi,

I'm facing some issues with 'stop' deferred transition and I suspect of
a MIDAS bug regarding this...

to reproduce the issue I use the 'deferredfe' MIDAS example (develop branch), 
changing only the equipment name from 'Deferred' to 'Deferred%02d' in order
to be able to run multiple 'deferredfe' instances;

I run *three* 'deferredfe' frontends using:

./deferredfe -i 0
./deferredfe -i 2
./deferredfe -i 3

Everything goes fine on MIDAS web page and 'deferredfe' frontends are initialized
and ready to run; issues occour after 'start' when I stop the frontends: sometimes
at first shot and sometimes at next 'start'/'stop' the deferred 'stop' transition
seems to be handled in wrong way... and often one frontend goes in 'segmentation fault'

The odd thing is when I run *two* instances: in this case no issues are reported...

Thanks in advance,
Gennaro

23 Jun 2023, Stefan Ritt, Bug Report, deferred stop transition

Deferred transitions were only implemented with a single instance of a program deferring the 
transition. To have several instances, MIDAS probably needs to be extended. Certainly this 
was never tested, so it's not a surprise that we get a segmentation fault.

Stefan

23 Jun 2023, Gennaro Tortone, Bug Report, deferred stop transition

Hi Stefan,

so if I have two different frontends (feov1725 and feodt5751) connected on the same 'mserver'
I'm in the same situation ?

Cheers,
Gennaro

> Deferred transitions were only implemented with a single instance of a program deferring the 
> transition. To have several instances, MIDAS probably needs to be extended. Certainly this 
> was never tested, so it's not a surprise that we get a segmentation fault.
> 
> Stefan

23 Jun 2023, Stefan Ritt, Bug Report, deferred stop transition

> I'm in the same situation ?

Yepp.

26 Jun 2023, Gennaro Tortone, Bug Report, deferred stop transition

> Deferred transitions were only implemented with a single instance of a program deferring the 
> transition. To have several instances, MIDAS probably needs to be extended. Certainly this 
> was never tested, so it's not a surprise that we get a segmentation fault.
> 
> Stefan

Hi Stefan,

I copied deferredfe.cxx to mydeferredfe.cxx and I changed mydeferred.cxx to be a different frontend:

const char *frontend_name = "mydeferredfe";

If I start two "different" frontends:

./deferredfe
./mydeferredfe 

and try to start/stop a run... the result is the same: frontend status messing up on next 'start':

---

deferredfe:

Started run 332
Event ID:4 - Event#: 0
Event ID:4 - Event#: 1
Event ID:4 - Event#: 2
Event ID:4 - Event#: 3
Event ID:4 - Event#: 4
Event ID:4 - Event#: 5
Event ID:4 - Event#: 6

mydeferredfe:

Started run 332
Transition ignored, Event ID:2 - Event#: 0
Transition ignored, Event ID:2 - Event#: 1
Transition ignored, Event ID:2 - Event#: 2
Transition ignored, Event ID:2 - Event#: 3
Transition ignored, Event ID:2 - Event#: 4
End of cycle... perform transition
Event ID:2 - Event#: 5
End of cycle... perform transition
Event ID:2 - Event#: 6

---

so, it seems that the issue is not related to different 'instances' of same frontend but
that *at most* one frontend on whole MIDAS server can handle deferred transitions...

is this the case ?

Cheers,
Gennaro

26 Jun 2023, Stefan Ritt, Bug Report, deferred stop transition

> so, it seems that the issue is not related to different 'instances' of same frontend but
> that *at most* one frontend on whole MIDAS server can handle deferred transitions...
> 
> is this the case ?


Correct.

- Stefan

13 Jun 2023, Thomas Senger, Forum, Include subroutine through relative path in sequencer

Hi, I would like to restructure our sequencer scripts and the paths. Until now many things are not generic at all. I would like to ask if it is possible to include files through a relative path for example something like 
INCLUDE ../chip/global_basic_functions
Maybe I just did not found how to do it.

13 Jun 2023, Stefan Ritt, Forum, Include subroutine through relative path in sequencer

> Hi, I would like to restructure our sequencer scripts and the paths. Until now many things are not generic at all. I would like to ask if it is possible to include files through a relative path for example something like 
> INCLUDE ../chip/global_basic_functions
> Maybe I just did not found how to do it.

It was not there. I implemented it in the last commit.

Stefan

13 Jun 2023, Marco Francesconi, Forum, Include subroutine through relative path in sequencer

> > Hi, I would like to restructure our sequencer scripts and the paths. Until now many things are not generic at all. I would like to ask if it is possible to include files through a relative path for example something like 

> > INCLUDE ../chip/global_basic_functions

> > Maybe I just did not found how to do it.

> 

> It was not there. I implemented it in the last commit.

> 

> Stefan



Hi Stefan,

when I did this job for MEG II we decided not to include relative paths and the ".." folder to avoid an exploit called "XML Entity Injection".

In short is to avoid leaking files outside the sequencer folders like  /etc/password or private SSH keys.

I do not remember in this moment why we pushed for absolute paths instead but let's keep this in mind.



Marco

13 Jun 2023, Stefan Ritt, Forum, Include subroutine through relative path in sequencer

> when I did this job for MEG II we decided not to include relative paths and the ".." folder to avoid an exploit called "XML Entity Injection".
> In short is to avoid leaking files outside the sequencer folders like  /etc/password or private SSH keys.
> I do not remember in this moment why we pushed for absolute paths instead but let's keep this in mind.

I thought about that. But before we had absolute paths in the sequencer INCLUDE statement. So having "../../../etc/passwd" is as bad as the
absolute path "/etc/passwd". So nothing really changed. What we really should prevent is to LOAD files into the sequencer from outside the
sequence subdirectory. And this is prevented by the file loader. Actually we will soon replace the file loaded with a modern JS dialog, and
the code restricts all operations to within the experiment directory and below.

Stefan

09 Jun 2023, Konstantin Olchanski, Info, added IPv6 support for mserver and MIDAS RPC

as of commit 71fb5e82f3e9a1501b4dc2430f9193ee5d176c82, MIDAS RPC and the mserver 
listen for connections both on IPv4 and IPv6. mserver clients and MIDAS RPC 
clients can connect to MIDAS using both IPv4 and IPv6. In the default 
configuration ("/Expt/Security/Enable non-localhost RPC" set to "n"), IPv4 
localhost is used, as before. Support for IPv6 is a by product from switching 
from obsolete non-thread-safe gethostbyname() and getaddrbyname() to modern 
getaddrinfo() and getnameinfo(). This fixes bug 357, observed crash of mhttpd 
inside gethostbyname(). K.O.

23 May 2023, Kou Oishi, Bug Report, Event builder fails at every 10 runs

Dear MIDAS experts,

Greetings! 
I am currently utilizing MIDAS for our experiment and I have encountered an issue with our event builder, which was developed based on the example code 'eventbuilder/mevb.cxx'. I'm uncertain whether this is a genuine bug or an inherent feature of MIDAS.

The event builder fails to initiate the 10th run since its startup, requiring us to relaunch it. Upon investigating the code, I have identified that this issue stems from line 8404 of mfe.cxx (the version's hash is db94df6fa79772c49888da9374e143067a1fff3a). According to the code, the 10-run limit is imposed by the variable MAX_EVENT_REQUESTS in midas.h. While I can increase this value as the code suggests, it does not provide a complete solution, as the same problem will inevitably resurface. This complication unnecessarily hampers our data collection during long observation periods.

Despite the code indicating 'BM_NO_MEMORY: too many requests,' this explanation does not seem logical to me. In fact, other standard frontends do not encounter this problem and can start new runs as required without requiring a frontend relaunch.

I apologize for not yet fully grasping the intricate implementation of midas.cxx and mfe.cxx. However, I would greatly appreciate any suggestions or insights you can offer to help resolve this issue.

Thank you in advance for your kind assistance.

31 May 2023, Ben Smith, Bug Report, Event builder fails at every 10 runs

> The event builder fails to initiate the 10th run since its startup, 
> 'BM_NO_MEMORY: too many requests,'

Hi Kou,

It sounds like you might be calling bm_request_event() when starting a run, but not calling bm_delete_request() when the run stops. So you end up "leaking" event requests and eventually reach the limit of 10 open requests.

In examples/eventbuilder/mevb.c the request deletion happens in source_unbooking(), which is called as part of the "run stopping" logic. I've just updated the midas repository so the example compiles correctly, and was able to start/stop 15 runs without crashing.

Can you check the end-of-run logic in your version to ensure you're calling bm_delete_request()?

02 Jun 2023, Kou Oishi, Bug Report, Event builder fails at every 10 runs

Dear Ben,

Hello. Thank you for your attention to this problem!

> It sounds like you might be calling bm_request_event() when starting a run, but not calling bm_delete_request() when the run stops. So you end up "leaking" event requests and eventually reach the limit of 10 open requests.

I understand. Thanks for the description.

> In examples/eventbuilder/mevb.c the request deletion happens in source_unbooking(), which is called as part of the "run stopping" logic. I've just updated the midas repository so the example compiles correctly, and was able to start/stop 15 runs without crashing.
> 
> Can you check the end-of-run logic in your version to ensure you're calling bm_delete_request()?

I really appreciate your update.
Although I am away at the moment from the DAQ development, I will test it and report the result here as soon as possible.

Best regards,
Kou

10 May 2023, Lukas Gerritzen, Suggestion, Desktop notifications for messages

It would be nice to have MIDAS notifications pop up outside of the browser window.

To get enable this myself, I hijacked the speech synthesis and I added the following to mhttpd_speak_now(text) inside mhttpd.js:

let notification = new Notification('MIDAS Message', {
    body: text,
});

I couldn't ask for the permission for notifications here, as Firefox threw the error "The Notification permission may only be requested from inside a short running user-generated event handler". Therefore, I added a button to config.html:

<button class="mbutton" onclick="Notification.requestPermission()">Request notification permission</button>

There might be a more elegant solution to request the permission.

10 May 2023, Stefan Ritt, Suggestion, Desktop notifications for messages

Lukas Gerritzen wrote:

It would be nice to have MIDAS notifications pop up outside of the browser window.

There are certainly dozens of people who do "I don't like pop-up windows all the time". So this has to come with a switch in the config page to turn it off. If there is a switch "allow pop-up windows", then we have the other fraction of people using Edge/Chrome/Safari/Opera saying "it's not working on my specific browser on version x.y.z". So I'm only willing to add that feature if we are sure it's a standard things working in most environments.

Best,
Stefan

10 May 2023, Lukas Gerritzen, Suggestion, Desktop notifications for messages

Stefan Ritt wrote:

people using Edge/Chrome/Safari/Opera saying "it's not working on my specific browser on version x.y.z". So I'm only willing to add that feature if we are sure it's a standard things working in most environments.

[The API looks pretty standard to me. Firefox, Chrome, Opera have been supporting it for about 9 years, Safari for almost 6. I didn't find out when Edge 14 was released, but they're at version 112 now.

Since browsers don't want to annoy their users, many don't allow websites to ask for permissions without user interaction. So the workflow would be something like: The user has to press a button "please ask for permission", then the browser opens a dialog "do you want to grant this website permission to show notifications?" and only then it works. So I don't think it's an annoying popup-mess, especially since system notifications don't capture the focus and typically vanish after a few seconds. If that feature is hidden behind a button on the config page, it shouldn't lead to surprises. Especially since users can always revoke that permission.

11 May 2023, Stefan Ritt, Suggestion, Desktop notifications for messages

Ok, I implemented desktop notifications. In the MIDAS config page, you can now enable browser notifications for the different types of messages. Not sure this works perfectly, but a staring point. So please let me know if there is any issue.

Stefan

10 May 2023, Lukas Gerritzen, Suggestion, Make sequencer more compatible with mobile devices

When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:

if (script.substring(0, 1) === "[") {
   // refuse to load script if the selected a subdirectory
   return;
}

and the fact that the <option> elements are listening for double click events, which seem to be impossible on a mobile device.

The following modification allows browsing the directories without changing the double click behaviour on a desktop:

diff --git a/resources/load_script.html b/resources/load_script.html
index 41bfdccd..36caa57f 100644
--- a/resources/load_script.html
+++ b/resources/load_script.html
@@ -59,6 +59,28 @@
 </div>

 <script>
+   document.getElementById("msg_sel").onchange = function() {
+      script = this.value;
+      button = document.getElementById("load_button");
+      if (script.substring(0, 4) === "[..]") {
+         // Change button to go back
+         enable_button_by_id("load_button");
+         button.innerHTML = "Back";
+         button.onclick = up_subdir;
+      } else if (script.substring(0, 1) === "[") {
+         // Change button to load subdirectory
+         enable_button_by_id("load_button");
+         button.innerHTML = "Enter subdirectory";
+         button.onclick = load_subdir;
+      } else {
+         // Change button to load script
+         enable_button_by_id("load_button");
+         button = document.getElementById("load_button");
+         button.innerHTML = "Load script";
+         button.onclick = load_script;
+      }
+   }
+
 function set_if_changed(id, value)
 {
    var e = document.getElementById(id);

This makes the code quoted above redundant, so the check can actually be omitted.

10 May 2023, Stefan Ritt, Suggestion, Make sequencer more compatible with mobile devices

Lukas Gerritzen wrote:

When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:

We are working right now on a general file picker, which will replace also the file picker for the sequencer. So please wait until the new thing is out and then test it there.

Stefan

08 May 2023, Alexey Kalinin, Forum, Scrript in sequencer

Hello,

I tried different ways to pass parameters to bash script, but there are seems to 
be empty, what could be the problem?

We have seuqencer like

ODBGET "/Runinfo/runnumber", firstrun
LOOP n,10
#changing HV
TRANSITION start
WAIT seconds,300
TRANSITION stop
ENDLOOP
ODBGET "/Runinfo/runnumber", lastrun
SCRIPT /.../script.sh ,$firstrun ,$lastrun

and script.sh like
firstrun=$1
lastrun=$2

Thanks. Alexey.

08 May 2023, Stefan Ritt, Forum, Scrript in sequencer

> I tried different ways to pass parameters to bash script, but there are seems to 
> be empty, what could be the problem?

Indeed there was a bug in the sequencer with parameter passing to scripts. I fixed it
and committed the changes to the develop branch.

Stefan

09 May 2023, Alexey Kalinin, Forum, Scrript in sequencer

Thanks. It works perfect.
Another question is:
Is it possible to run .msl seqscript from bash cmd?
Maybe it's easier then
1 odbedit -c 'set "/sequencer/load filename" filename.msl'
2 odbedit -c 'set "/sequencer/load new file" TRUE'
3 odbedit -c 'set "/sequencer/start script" TRUE'

What is the best way to have a button starting sequencer
from  /script (or /alias )?

Alexey.

> > I tried different ways to pass parameters to bash script, but there are seems to 
> > be empty, what could be the problem?
> 
> Indeed there was a bug in the sequencer with parameter passing to scripts. I fixed it
> and committed the changes to the develop branch.
> 
> Stefan

10 May 2023, Stefan Ritt, Forum, Scrript in sequencer

> Thanks. It works perfect.
> Another question is:
> Is it possible to run .msl seqscript from bash cmd?
> Maybe it's easier then
> 1 odbedit -c 'set "/sequencer/load filename" filename.msl'
> 2 odbedit -c 'set "/sequencer/load new file" TRUE'
> 3 odbedit -c 'set "/sequencer/start script" TRUE'

That will work.

> What is the best way to have a button starting sequencer
> from  /script (or /alias )?

Have a look at 

https://daq00.triumf.ca/MidasWiki/index.php/Sequencer#Controlling_the_sequencer_from_custom_pages

where I put the necessary information.

Stefan

26 Apr 2023, Martin Mueller, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

Hi

We have a problem with running midas frontends when they should connect to a experiment on a different machine using the -h option. Starting them locally works fine. Firewall is off on both systems, Enable non-localhost RPC and Disable RPC hosts check are set to 'yes', mserver is running on the machine that we want to connect to. 
Error message looks like this: 

...
Connect to experiment Mu3e on host 10.32.113.210...
OK
Init hardware...
terminate called after throwing an instance of 'mexception'
  what():  
/home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
Stack trace:
1   0x00000000000042D828 (null) + 4380712
2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
8   0x0000000000004385EF setup_odb() + 8351
9   0x00000000000043B2E6 frontend_init() + 22
10  0x000000000000433304 main + 1540
11  0x0000007F8C6FE3724D __libc_start_main + 239
12  0x000000000000433F7A _start + 42

Aborted (core dumped)


We have the same problem for all our frontends. When we want to start them locally they work. Starting them locally with ./frontend -h localhost also reproduces the error above.

The error can also be reproduced with the odbxx_test.cxx example in the midas repo by replacing line 22 in midas/examples/odbxx/odbxx_test.cxx (cm_connect_experiment(NULL, NULL, "test", NULL);) with cm_connect_experiment("localhost", "Mu3e", "test", NULL); (Put the name of the experiment instead of "Mu3e")

running odbxx_test locally gives us then the same error as our other frontend.

Thanks in advance,
Martin

27 Apr 2023, Konstantin Olchanski, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

Looks like your MIDAS is built without debug information (-O2 -g), the stack trace does not have file names and line numbers. Please rebuild with debug information and report the stack trace. Thanks. K.O.

> Connect to experiment Mu3e on host 10.32.113.210...
> OK
> Init hardware...
> terminate called after throwing an instance of 'mexception'
>   what():  
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1   0x00000000000042D828 (null) + 4380712
> 2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8   0x0000000000004385EF setup_odb() + 8351
> 9   0x00000000000043B2E6 frontend_init() + 22
> 10  0x000000000000433304 main + 1540
> 11  0x0000007F8C6FE3724D __libc_start_main + 239
> 12  0x000000000000433F7A _start + 42
> 
> Aborted (core dumped)

K.O.

28 Apr 2023, Martin Mueller, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

> Looks like your MIDAS is built without debug information (-O2 -g), the stack trace does not have file names and line numbers. Please rebuild with debug information and report the stack trace. Thanks. K.O.
> 
> > Connect to experiment Mu3e on host 10.32.113.210...
> > OK
> > Init hardware...
> > terminate called after throwing an instance of 'mexception'
> >   what():  
> > /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> > Stack trace:
> > 1   0x00000000000042D828 (null) + 4380712
> > 2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> > 3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> > 4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> > 5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> > 6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> > 7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> > 8   0x0000000000004385EF setup_odb() + 8351
> > 9   0x00000000000043B2E6 frontend_init() + 22
> > 10  0x000000000000433304 main + 1540
> > 11  0x0000007F8C6FE3724D __libc_start_main + 239
> > 12  0x000000000000433F7A _start + 42
> > 
> > Aborted (core dumped)
> 
> K.O.

As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp  (with cm_connect_experiment changed to "localhost")
Stack trace of odbxx_test with line numbers:

Set ODB key "/Test/Settings/String Array 10[0...9]" = ["","","","","","","","","",""]
Created ODB key "/Test/Settings/Large String Array 10"
Set ODB key "/Test/Settings/Large String Array 10[0...9]" = ["","","","","","","","","",""]
[test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
[test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
[test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort

Program received signal SIGABRT, Aborted.
0x00007ffff6665cdb in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-11.3.0+git1637-150000.1.9.1.x86_64 libstdc++6-debuginfo-11.2.1+git610-1.3.9.x86_64 libz1-debuginfo-1.2.11-3.24.1.x86_64
(gdb) bt
#0  0x00007ffff6665cdb in raise () from /lib64/libc.so.6
#1  0x00007ffff6667375 in abort () from /lib64/libc.so.6
#2  0x0000000000431bba in rpc_call (routine_id=11249) at /home/labor/midas/src/midas.cxx:13904
#3  0x0000000000460c4e in db_copy_xml (hDB=1, hKey=1009608, buffer=0x7ffff7e9c010 "", buffer_size=0x7fffffffadbc, header=false) at /home/labor/midas/src/odb.cxx:8994
#4  0x000000000046fc4c in midas::odb::odb_from_xml (this=0x7fffffffb3f0, str=...) at /home/labor/midas/src/odbxx.cxx:133
#5  0x000000000040b3d9 in midas::odb::odb (this=0x7fffffffb3f0, str=...) at /home/labor/midas/include/odbxx.h:605
#6  0x000000000040b655 in midas::odb::odb (this=0x7fffffffb3f0, s=0x4a465a "/Test/Settings") at /home/labor/midas/include/odbxx.h:629
#7  0x0000000000407bba in main () at /home/labor/midas/examples/odbxx/odbxx_test.cxx:56
(gdb)

28 Apr 2023, Konstantin Olchanski, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

> As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp  (with cm_connect_experiment changed to "localhost")
> [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort

ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
the correct stack trace should be rooted in the handler for db_copy_xml.

but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.

what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).

K.O.

28 Apr 2023, Martin Mueller, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

> > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp  (with cm_connect_experiment changed to "localhost")
> > [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> > [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> > [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
> 
> ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
> the correct stack trace should be rooted in the handler for db_copy_xml.
> 
> but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.
> 
> what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).
> 
> K.O.

Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines. 
It does not matter how much data odbxx is asking for.

Something as simple as this reproduces the error, asking for a single integer:

int main() {
   cm_connect_experiment("localhost", "Mu3e", "test", NULL);
   midas::odb o = {
      {"Int32 Key", 42}
   };
   o.connect("/Test/Settings");
   cm_disconnect_experiment();
   return 1;
}

at the same time this runs fine:

int main() {
   cm_connect_experiment(NULL, NULL, "test", NULL);
   midas::odb o = {
      {"Int32 Key", 42}
   };
   o.connect("/Test/Settings");
   cm_disconnect_experiment();
   return 1;
}

in both cases mserver does not crash. I do not have a stack trace. There is also no error produced by mserver.

Last year we did not have these problems with the same midas frontends (For example in midas commit 9d2ef471 the code from above runs 
fine). I am trying to pinpoint the exact commit where this stopped working now.

28 Apr 2023, Konstantin Olchanski, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

> > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp
> > ok, cool. looks like we crashed the mserver.
> Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines. 
> It does not matter how much data odbxx is asking for.
> midas commit 9d2ef471 the code from above runs fine

so, a regression. ouch.

if core dumps are turned off, you will not "see" the mserver crash, because the main mserver is still running. it's the mserver forked to 
serve your RPC connection that crashes.

> int main() {
>    cm_connect_experiment("localhost", "Mu3e", "test", NULL);
>    midas::odb o = {
>       {"Int32 Key", 42}
>    };
>    o.connect("/Test/Settings");
>    cm_disconnect_experiment();
>    return 1;
> }

to debug this, after cm_connect_experiment() one has to put ::sleep(1000000000); (not that big, obviously),
then while it is sleeping do "ps -efw | grep mserver", this will show the mserver for the test program,
connect to it with gdb, wait for ::sleep() to finish and o.connect() to crash, with luck gdb will show
the crash stack trace in the mserver.

so easy to debug? this is why back in the 1970-ies clever people invented core dumps, only to have
even more clever people in the 2020-ies turn them off and generally make debugging more difficult (attaching
gdb to a running program is also disabled-by-default in some recent linuxes).

rant-off.

to check if core dumps work, to "killall -7 mserver". to enable core dumps on ubuntu, see here:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu

last known-working point is:

commit 9d2ef471c4e4a5a325413e972862424549fa1ed5
Author: Ben Smith <bsmith@triumf.ca>
Date:   Wed Jul 13 14:45:28 2022 -0700

    Allow odbxx to handle connecting to "/" (avoid trying to read subkeys as "//Equipment" etc.

K.O.

02 May 2023, Niklaus Berger, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

Thanks for all the helpful hints. When finally managing to evade all timeouts and attach the debugger in just the right moment, we find that we get a segfault in mserver at L827:
   case RPC_DB_COPY_XML:
      status = db_copy_xml(CHNDLE(0), CHNDLE(1), CSTRING(2), CPINT(3), CBOOL(4));
Some printf debugging then pointed us to the fact that the culprit is the pointer de-referencing in CBOOL(4). This in turn can be traced back to mrpc.cxx L282 ff, where the line with the arrow was missing:
  {RPC_DB_COPY_XML, "db_copy_xml",
    {{TID_INT32, RPC_IN},
     {TID_INT32, RPC_IN},
     {TID_ARRAY, RPC_OUT | RPC_VARARRAY},
     {TID_INT32, RPC_IN | RPC_OUT},
 ->  {TID_BOOL, RPC_IN},
     {0}}},

If we put that in, the mserver process completes peacfully and we get a segfault in the client ("Wrong key type in XML file") which we will attempt to debug next. Shall I create a pull request for the additional RPC argument or will you just fix this on the fly?

02 May 2023, Niklaus Berger, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option

And now we also fixed the client segfault, odb.cxx L8992 also needs to know about the header:
if (rpc_is_remote())
      return rpc_call(RPC_DB_COPY_XML, hDB, hKey, buffer, buffer_size, header);

(last argument was missing before).

02 May 2023, Stefan Ritt, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option


>  Shall I create a pull request for the additional RPC argument or will you just fix this on the fly?



Just fix it in the fly yourself. It’s an obvious bug, so please commit to develop.



Stefan

01 May 2023, Giovanni Mazzitelli, Bug Report, python issue with mathplot lib vs odb query

Ciao, 
we have a very strange issue with python lib with client.odb_get("/") function 
when running as midas process and matplotlib is used.


we are developing a remote console by means of sending via kafka producer the odb, 
camera image and pmt waveforms, in the INFN cloud where grafana make available 
data for non expert shifters, as well as sending midas events for online 
reconstruction to the htcondr queue on cloud. The process work perfectly and allow 
use to parallelise to standard midas pipeline for file production, ecc the online 
monitoring and data processing where we have computing resources (our DAQ is 
underground at LNGS). Part of the work will be presented next weak at CHEP
the full code is available at https://github.com/CYGNUS-
RD/middleware/blob/master/dev/event_producer_s3.py
but to get the strange behaviour I report here a test script:

----
def main(verbose=False):
    from matplotlib import pyplot as plt

    import time

    import midas
    import midas.client

    
    client = midas.client.MidasClient("middleware")
    buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
    request_id = client.register_event_request(buffer_handle, sampling_type = 2) 
    
    fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
    
    
    while True:
        # 
        odb = client.odb_get("/")
        if verbose:
            print(odb)
        start1 = time.time()
              
        client.communicate(10)
        time.sleep(1)
        

    client.deregister_event_request(buffer_handle, request_id)

    client.disconnect()
----
if I run it as cli interactivity including or not matplotlib the everything si ok. 
As I run it as midas "program" I get: 
-----
Traceback (most recent call last):
  File "/home/standard/daq/middleware/dev/test_midas_error.py", line 48, in 
<module>
    main(verbose=options.verbose)
  File "/home/standard/daq/middleware/dev/test_midas_error.py", line 29, in main
    odb = client.odb_get("/")
  File "/home/standard/packages/midas/python/midas/client.py", line 354, in 
odb_get
    retval = midas.safe_to_json(buf.value, use_ordered_dict=True)
  File "/home/standard/packages/midas/python/midas/__init__.py", line 552, in 
safe_to_json
    return json.loads(decoded, strict=False, 
object_pairs_hook=collections.OrderedDict)
  File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: 
line 300 column 26 (char 17535)
----
if I comment out the import of matplotlib every think works perfectly again also 
as midas program. 

it seams that there is a difference between the to way of use the code, and that 
is sufficient the call to matplotlib to corrupt in some way the odb. any ideas?

01 May 2023, Ben Smith, Bug Report, python issue with mathplot lib vs odb query

> it seams that there is a difference between the to way of use the code, and that 
> is sufficient the call to matplotlib to corrupt in some way the odb. any ideas?

I can't reproduce this on my machines, so this is going to be fun to debug!

Can you try running the program below please? It takes the important bits from odb_get() but prints out the string before we try to parse it as JSON. Feel free to send me the output via email (bsmith@triumf.ca) if you don't want to post your entire ODB dump in the elog.




import sys
import os
import time
import midas
import midas.client
import ctypes

def debug_get(client):
    c_path = ctypes.create_string_buffer(b"/")
    hKey = ctypes.c_int()
    client.lib.c_db_find_key(client.hDB, 0, c_path, ctypes.byref(hKey))

    buf = ctypes.c_char_p()
    bufsize = ctypes.c_int()
    bufend = ctypes.c_int()

    client.lib.c_db_copy_json_save(client.hDB, hKey, ctypes.byref(buf), ctypes.byref(bufsize), ctypes.byref(bufend))

    print("-" * 80)
    print("FULL DUMP")
    print("-" * 80)
    print(buf.value)
    print("-" * 80)
    print("Chars 17000-18000")
    print("-" * 80)
    print(buf.value[17000:18000])
    print("-" * 80)

    as_dict = midas.safe_to_json(buf.value, use_ordered_dict=True)

    client.lib.c_free(buf)

    return as_dict

def main(verbose=False):
    client = midas.client.MidasClient("middleware")
    buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
    request_id = client.register_event_request(buffer_handle, sampling_type = 2)

    fpath = os.path.dirname(os.path.realpath(sys.argv[0]))

    while True:
        # odb = client.odb_get("/")
        odb = debug_get(client)

        if verbose:
            print(odb)
        start1 = time.time()

        client.communicate(10)
        time.sleep(1)


    client.deregister_event_request(buffer_handle, request_id)

    client.disconnect()

if __name__ == "__main__":
    main()

01 May 2023, Giovanni Mazzitelli, Bug Report, python issue with mathplot lib vs odb query

> > it seams that there is a difference between the to way of use the code, and that 
> > is sufficient the call to matplotlib to corrupt in some way the odb. any ideas?
> 
> I can't reproduce this on my machines, so this is going to be fun to debug!
> 
> Can you try running the program below please? It takes the important bits from odb_get() but prints out the string before we try to parse it as JSON. Feel free to send me the output via email (bsmith@triumf.ca) if you don't want to post your entire ODB dump in the elog.

Thank you!
if I added the matplotlib as follow:

#!/usr/bin/env python3

import sys
import os
import time
import midas
import midas.client
import ctypes
from matplotlib import pyplot as plt

def debug_get(client):
    c_path = ctypes.create_string_buffer(b"/")
    hKey = ctypes.c_int()
    client.lib.c_db_find_key(client.hDB, 0, c_path, ctypes.byref(hKey))

    buf = ctypes.c_char_p()
    bufsize = ctypes.c_int()
    bufend = ctypes.c_int()

    client.lib.c_db_copy_json_save(client.hDB, hKey, ctypes.byref(buf), ctypes.byref(bufsize), ctypes.byref(bufend))

    print("-" * 80)
    print("FULL DUMP")
    print("-" * 80)
    print(buf.value)
    print("-" * 80)
    print("Chars 17000-18000")
    print("-" * 80)
    print(buf.value[17000:18000])
    print("-" * 80)

    as_dict = midas.safe_to_json(buf.value, use_ordered_dict=True)

    client.lib.c_free(buf)

    return as_dict

def main(verbose=False):
    client = midas.client.MidasClient("middleware")
    buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
    request_id = client.register_event_request(buffer_handle, sampling_type = 2)

    fpath = os.path.dirname(os.path.realpath(sys.argv[0]))

    while True:
        # odb = client.odb_get("/")
        odb = debug_get(client)

        if verbose:
            print(odb)
        start1 = time.time()

        client.communicate(10)
        time.sleep(1)


    client.deregister_event_request(buffer_handle, request_id)

    client.disconnect()

        
if __name__ == "__main__":
    from optparse import OptionParser
    parser = OptionParser(usage='usage: %prog\t ')
    parser.add_option('-v','--verbose', dest='verbose', action="store_true", default=False, help='verbose output;');
    (options, args) = parser.parse_args()
    main(verbose=options.verbose)


then tested the code in interactive mode without any error. as soon as I submit as midas "Program" I get the attached output.
thank you again, Giovanni

01 May 2023, Ben Smith, Bug Report, python issue with mathplot lib vs odb query

Looks like a localisation issue. Your floats are formatted as "6,6584e+01", whereas the JSON decoder expects "6.6584e+01".

Can you run the following few lines please? Then I'll be able to write a test using the same setup as you:


import locale
print(locale.getlocale())
from matplotlib import pyplot as plt
print(locale.getlocale())

01 May 2023, Ben Smith, Bug Report, python issue with mathplot lib vs odb query

> Looks like a localisation issue. Your floats are formatted as "6,6584e+01", whereas the JSON decoder expects "6.6584e+01".

This should be fixed in the latest commit to the midas develop branch. The JSON specification requires a dot for the decimal separator, so we must ignore the user's locale when formatting floats/doubles for JSON.

I've tested the fix on my machine by manually changing the locale, and also added an automated test in the python directory.

01 May 2023, Giovanni Mazzitelli, Bug Report, python issue with mathplot lib vs odb query

> > Looks like a localisation issue. Your floats are formatted as "6,6584e+01", whereas the JSON decoder expects "6.6584e+01".
> 
> This should be fixed in the latest commit to the midas develop branch. The JSON specification requires a dot for the decimal separator, so we must ignore the user's locale when formatting floats/doubles for JSON.
> 
> I've tested the fix on my machine by manually changing the locale, and also added an automated test in the python directory.

Thanks very macth Ben,
so if I understand correctly we have to update MIDAS to latest develop branch available? can you sand me the link to be sure of install the right update. 
can you also tell me how you fix manually? we are restarting and then well be difficult install and makes updete.
thank you again, regards, Giovanni

21 Apr 2023, Grzegorz Nieradka, Forum, Setup Midas with Caen vx2740 - ask for help

I'm trying to setup Midas with the Caen vx2740 VME digitizer board.
As the backend driver I used the software from Darkside located here:

https://bitbucket.org/ttriumfdaq/dsproto_vx2740/src/develop/

They implemented some helpers program and one from them should diagnose correct running of digitizer. But when I'm trying to run example program "vx2740_readout_test" I have segmentation fault:

Thread 1 "vx2740_readout_" received signal SIGSEGV, Segmentation fault.
0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>) at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947

During the calling this program I have running mhttpd, mlogger and the backend for vx2740 from the repository.

I'm not able to find documentation what is purpose of the RPC? Could someone give any indicators how I can start debug this behavior? Or there is some documentation about the RPC?


I'm freshman in the Midas world, so at this moment everything seems for me very complicated - and I'm learning by doing.

Regards,
Grzegorz

The backtrace from gdb which indicates the function in Midas package:

#0  0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
    at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947
#1  0x00005555555c2f12 in cm_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
    at /home/astrocent/workspace/packages/midas/src/midas.cxx:5840
#2  0x00005555555a26f6 in VX2740GroupFrontend::init (this=this@entry=0x7fffffffcba0, group_idx=group_idx@entry=-1, hDB=hDB@entry=0, enable_jrpc=enable_jrpc@entry=true)
    at /home/astrocent/workspace/packages/ttriumfdaq-dsproto_vx2740-8122058cacd1/vx2740_fe_class.cxx:134
#3  0x000055555557e492 in do_fe (board_name=..., is_scope=<optimized out>) at /home/astrocent/workspace/packages/ttriumfdaq-dsproto_vx2740-8122058cacd1/vx2740_readout_test.cxx:185
#4  0x000055555557adc9 in main (argc=<optimized out>, argv=0x7fffffffd1f8) at /home/astrocent/workspace/packages/ttriumfdaq-dsproto_vx2740-8122058cacd1/vx2740_readout_test.cxx:253

21 Apr 2023, Ben Smith, Forum, Setup Midas with Caen vx2740 - ask for help

> I'm not able to find documentation what is purpose of the RPC? Could someone give any indicators how I can start debug this behavior? Or there is some documentation about the RPC?

The RPC system allows midas clients to issue commands to each other. In the case of the VX2740 code we use it so the midas webserver (mhttpd) can tell the frontend to perform some actions when a user clicks a button on a webpage.

I've been writing most of the code for the VX2740 for Darkside, so will contact you directly to help debug the issue.

21 Apr 2023, Konstantin Olchanski, Forum, Setup Midas with Caen vx2740 - ask for help

> I'm trying to setup Midas with the Caen vx2740 VME digitizer board.

welcome to the world of daq and midas! Ben already answered and he will help you with this specific hardware. (we work together)

> #0  0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
>     at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947

I look at this line in midas and I do not see any problems other than all functions that touch rpc_list are not thread safe,
and calling them at the same time as rpc calls are active will cause memory corruption and crash. This is not a problem
in most programs because rpc_register_function() is usually called once at the beginning of everything, before any RPCs
are received, sent or processed. I filed a bug against this problem. https://bitbucket.org/tmidas/midas/issues/362/rpc_list-is-not-thread-safe

> with is "jrpc"?

I implemented it years ago to allow web pages to call mhttpd (XHR/HTTP) to call user frontends (MIDAS RPC) to perform real-time actions,
i.e. to turn power supplied on or off. "j" stands for "json", but most experiments send very simple commands
and do not use json encoding.

K.O.

17 Mar 2023, Konstantin Olchanski, Info, T2K/ND280 - Many warning from ten year old variables in ODB

Forwarded from the T2K/ND280 elog:

Author              : Nick Hastings
Subject             : Many warning from ten year old variables in ODB
Logbook URL         : http://elog.nd280.org/elog/FGD/2553

Midas does period checks that the variables in the ODB are ok.
One of these is a check to see if each variable was set with +/- 10
years. Since this experiment has been running for longer than 10
years there are *many* variables that fail this check.

As a result the midas.log and messages in mhttpd are spammed
with many warnings. Eg

Mon Feb 13 14:49:18 2023 [ODBEdit,ERROR] [odb.c:548:db_validate_key,ERROR] Warning: invalid access time, key 
"/System/Prompt", time 1288763123

These can be removed by simply setting the variable again with its current value.

So I wonder if it would be best to just do a full odbdump and then load all the values
back in. Comments from MIDAS experts would be appreciated. Eg:

odbedit -c 'save fgddaq.odb'
odbedit -c 'load fgddaq.odb'

Note this problem is currently seen on both the FGD DAQ and the global slow control MIDAS instances.
It may also be a problem on the INGRID GSC and the DAQs of other ND280 systems but I did not check.

Goto page Previous 1, 2, 3 ... 7, 8, 9 ... 47, 48, 49 Next

ELOG V3.1.4-2e1708b5