ID |
Date |
Author |
Topic |
Subject |
2545
|
23 Jun 2023 |
Stefan Ritt | Bug Report | deferred stop transition | Deferred transitions were only implemented with a single instance of a program deferring the
transition. To have several instances, MIDAS probably needs to be extended. Certainly this
was never tested, so it's not a surprise that we get a segmentation fault.
Stefan |
2544
|
23 Jun 2023 |
Gennaro Tortone | Bug Report | deferred stop transition | Hi,
I'm facing some issues with the deferred 'stop' transition, and I suspect
a MIDAS bug here.
to reproduce the issue I use the 'deferredfe' MIDAS example (develop branch),
changing only the equipment name from 'Deferred' to 'Deferred%02d' in order
to be able to run multiple 'deferredfe' instances;
I run *three* 'deferredfe' frontends using:
./deferredfe -i 0
./deferredfe -i 2
./deferredfe -i 3
Everything goes fine on the MIDAS web page, and the 'deferredfe' frontends are initialized
and ready to run; issues occur after 'start' when I stop the frontends: sometimes
on the first attempt and sometimes on a later 'start'/'stop' cycle, the deferred 'stop' transition
seems to be handled the wrong way, and often one frontend dies with a segmentation fault.
The odd thing is that when I run *two* instances, no issues are reported.
Thanks in advance,
Gennaro |
2543
|
21 Jun 2023 |
Gennaro Tortone | Bug Report | mserver and script execution |
Hi,
I have the following setup:
- MIDAS release: release/midas-2022-05-c
- host with MIDAS frontend (mclient)
- host with MIDAS server (mhttpd / mserver)
On mclient I run a frontend with:
./feodt5751 -h mserver -e develop -i 0
On mserver I see frontend ready and ODB variables in place;
I noticed a strange behavior with "/Programs/Execute on start run" and
"/Programs/Execute on stop run". In detail, the script to execute at start of run
is executed on the "mserver" host, but the script to execute at stop of run is executed on
the "mclient" host (!)
Is this a bug, or am I missing some documentation?
Thanks in advance,
Gennaro |
2542
|
20 Jun 2023 |
Stefan Ritt | Info | New environment variable MIDAS_EXPNAME | I just realized that we had already MIDAS_EXPT_NAME, and now people get confused with
MIDAS_EXPT_NAME
and
MIDAS_EXPNAME
In trying to fix this confusion, I changed the name of the second variable to MIDAS_EXPT_NAME as well,
so we only have one variable now. If this causes any problems please report here.
Stefan |
2541
|
15 Jun 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | >
> > are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now"
> > (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
>
> Indeed this is almost never the case, maybe once per month. On the other hand, we have a complete crash of the os maybe once a year. Most of the time the programs
> run continuously (we do not need odbedit), so our timestamp is typically one or two days old, so not good either.
>
> > my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
>
> My vote is to flush the odb either periodically or after each run.
>
So we are in agreement.
RFE filed:
https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
Dangerous change reverted:
60e4c44ad66346b89ba057391acf7a02890049be
K.O.
bash-3.2$ git diff
diff --git a/src/odb.cxx b/src/odb.cxx
index 0d3b88c2..d104ff28 100644
--- a/src/odb.cxx
+++ b/src/odb.cxx
@@ -2199,7 +2199,14 @@ INT db_close_database(HNDLE hDB)
destroy_flag = (pheader->num_clients == 0);
/* flush shared memory to disk */
- if (destroy_flag)
+
+ /* if we save ODB to disk only after last client finishes, we will never save ODB to disk
+ in most experiments - none of them ever completely stop MIDAS in normal operation.
+ as result, all changes to ODB contents will be lost on system crash, power loss
+ or normal reboot. see https://daq00.triumf.ca/elog-midas/Midas/2539
+ K.O. June 2023. */
+
+ if (1 || destroy_flag)
ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
strlcpy(xname, pheader->name, sizeof(xname));
K.O. |
2540
|
13 Jun 2023 |
Stefan Ritt | Forum | Include subroutine through relative path in sequencer | > when I did this job for MEG II we decided not to include relative paths and the ".." folder to avoid an exploit called "XML Entity Injection".
> In short, the aim is to avoid leaking files outside the sequencer folders, like /etc/passwd or private SSH keys.
> I do not remember at the moment why we pushed for absolute paths instead, but let's keep this in mind.
I thought about that. But we already had absolute paths in the sequencer INCLUDE statement before, so "../../../etc/passwd" is just as bad as the
absolute path "/etc/passwd", and nothing really changed. What we really should prevent is LOADing files into the sequencer from outside the
sequence subdirectory, and this is prevented by the file loader. Actually, we will soon replace the file loader with a modern JS dialog, and
the code restricts all operations to the experiment directory and below.
Stefan |
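The restriction discussed in this thread (relative INCLUDE paths are fine as long as the resolved file stays below an allowed directory) can be illustrated with a minimal sketch. This is not the sequencer's actual code: the function name `stays_inside`, its parameters, and the example directories are hypothetical. The check is purely lexical, assumes POSIX-style absolute paths, and does not resolve symlinks.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve `requested` relative to `script_dir` and report
// whether the result stays inside `root`. Lexical only, symlinks are ignored.
bool stays_inside(const fs::path& root, const fs::path& script_dir, const std::string& requested)
{
   assert(root.is_absolute() && script_dir.is_absolute());

   fs::path base = root.lexically_normal();
   if (!base.has_filename())
      base = base.parent_path();   // drop a trailing separator, if any

   // An absolute `requested` replaces script_dir; ".." components are folded here.
   const fs::path full = (script_dir / requested).lexically_normal();
   const fs::path rel = full.lexically_relative(base);

   // An empty result or a leading ".." means the target lies outside `base`.
   return !rel.empty() && *rel.begin() != fs::path("..");
}
```

With a root of `/exp/sequencer` and a script in `/exp/sequencer/tests`, the path `../chip/global_basic_functions` is accepted, while `../../../etc/passwd` and `/etc/passwd` are both rejected, which matches the point that the relative and the absolute form are equally bad.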
2539
|
13 Jun 2023 |
Stefan Ritt | Suggestion | Maximum ODB size |
> are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now"
> (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
Indeed this is almost never the case, maybe once per month. On the other hand, we have a complete crash of the os maybe once a year. Most of the time the programs
run continuously (we do not need odbedit), so our timestamp is typically one or two days old, so not good either.
> my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
My vote is to flush the odb either periodically or after each run.
Stefan |
2538
|
13 Jun 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | >
> > small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash
> > the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at
> > begin and end of every data file, you can recover some of the lost data.
>
> The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost.
> So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
> the flush only in the logger after an end of run.
are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now"
(I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
sanity check. dragon experiment, very active, .ODB.SHM timestamp is 1 second old. not-very-active agmini, today is June 13th, timestamp of .ODB.SHM is June
2nd. inactive TACTIC, timestamp of .ODB.SHM is May 16th.
so yes, not great, but in the new scheme, ODB.SHM timestamps would probably be from 2021 or 2020.
my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
K.O. |
2537
|
13 Jun 2023 |
Stefan Ritt | Suggestion | Maximum ODB size |
> small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash
> the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at
> begin and end of every data file, you can recover some of the lost data.
The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost.
So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
the flush only in the logger after an end of run.
Stefan |
2536
|
13 Jun 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | > > BTW, how do I resize the ODB.
ODB cannot be resized "online". Everything has to stop, save content to odb.json, get rid of old ODB.SHM, ensure ODB shared memory is destroyed (SysV or POSIX shared memory),
create new ODB with new size, load odb.json. Feel free to punch this into chatgpt > odbresize.cxx, commit, test, push.
> I remember we discussed this some time ago, and concluded that odbedit needs a resize flag.
ODB cannot be resized online. ODB API has ODB clients holding ODB handles which are pointers (offsets) into ODB shared memory.
> Has this even been done?
> I guess this is still not done and the issue is still open: https://bitbucket.org/tmidas/midas/issues/329/need-odbresize
> I guess if we touch this maybe the problem with the wrong size should be also fixed: https://bitbucket.org/tmidas/midas/issues/328/odbinit-s-1024mb-creates-odb-with-wrong
please contribute 14 distraction-free days to my patreon. thanks in advance!
K.O. |
2535
|
13 Jun 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | > > I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
> > odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremembered.
small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash
the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at
begin and end of every data file, you can recover some of the lost data.
for better effect, ODB should be dumped to disk at periodic intervals. but. current implementation writes odb to disk while holding the ODB
semaphore, which means all ODB access stops for the duration, specifically, there will be gaps in the history because mlogger cannot read history
data from ODB.
a better implementation could take the ODB lock, make a copy of ODB shared memory, release the ODB lock, complete writing to disk without holding the
lock. protection is needed against 100 midas programs trying to do this all at the same time. computers with 0.5 GB RAM (many ARM FPGA SoCs) will be
limited to ~100 Mbyte ODB. plus, we need to deal with memory allocation failures when taking a copy of a 2GB ODB.
in theory, the mmap() shared memory (already implemented in midas) does this automatically, but we lose control
over disk writes, we see some OSes write odb to disk "too often" and at wrong times, i.e. while we are in the middle
of creating or deleting something. current sequence of open(), atomic write() and close() ensures ODB.SHM always
contains a valid odb. (minus loss of OS and disk caches to crash or power loss).
K.O. |
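The "copy under lock, write without the lock" idea from this post can be sketched as follows. This is only an illustration, not MIDAS code: `SharedRegion` and `flush_snapshot` are hypothetical names, a `std::mutex` stands in for the ODB semaphore, and a `std::vector<char>` stands in for the shared memory. The temporary-file-plus-rename step is an added assumption; it relies on POSIX `rename()` replacing the target atomically.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the ODB shared memory segment and its lock.
struct SharedRegion {
   std::mutex lock;          // plays the role of the ODB semaphore
   std::vector<char> data;   // plays the role of the shared memory contents
};

// Copy the region while holding the lock, then write the copy to disk after
// the lock has been released. Returns false on I/O failure.
bool flush_snapshot(SharedRegion& region, const std::string& path)
{
   assert(!path.empty());

   std::vector<char> copy;
   {
      std::lock_guard<std::mutex> guard(region.lock);
      copy = region.data;   // may throw std::bad_alloc for a very large region
   }                        // lock released here, other clients can proceed

   // Write to a temporary file first, then rename it over the target, so the
   // target never holds a half-written image.
   const std::string tmp = path + ".tmp";
   std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
   if (!out)
      return false;
   out.write(copy.data(), static_cast<std::streamsize>(copy.size()));
   out.close();
   if (!out) {
      std::remove(tmp.c_str());
      return false;
   }
   return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

The lock is held only for the duration of the memory copy, so the slow disk write no longer stalls other clients. The cost is the extra memory for the copy, which is exactly the concern raised above for small-RAM systems and very large ODBs.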
2534
|
13 Jun 2023 |
Thomas Lindner | Info | MIDAS Workshop 2023 - Sept 13 | Hi All,
Thanks to everyone who filled out the doodle poll.
Based on the results we will plan to have this workshop on September 13, at 9AM-1PM (Vancouver) / 6PM-10PM (Geneva). Apologies to
those for whom this is a bad time/day; in particular for MIDAS users in Asia.
If you would like to present a report at the workshop on your experiment's MIDAS experience, then please email me (lindner@triumf.ca).
It would be great to know this in advance so that we can start preparing an agenda. Feel free to also email me if there are topics
that you would like addressed at the workshop.
Thanks,
Thomas
> Dear MIDAS users,
>
> We would like to arrange another MIDAS workshop, following on from previous successful workshops in 2015, 2017 and 2019. The
> goals of the workshop would include:
>
> - Getting updates from MIDAS developers on new features and other changes.
> - Getting reports from MIDAS users on how they are using MIDAS, what is working and what is not
> - Making plans for future MIDAS changes and improvements
>
> This would be a one-day virtual workshop, planned for about 4 hours length. The workshop will probably be after another of
> Stefan's visits to TRIUMF.
>
> If you would be interested in participating in such a workshop, please help us choose the date by filling out this doodle poll:
>
> https://doodle.com/meeting/organize/id/dBPVMQJa
>
> Please fill in the poll by June 9, if you are interested. We will announce the date soon after that.
>
> Thanks,
> Thomas |
2533
|
13 Jun 2023 |
Marco Francesconi | Forum | Include subroutine through relative path in sequencer | > > Hi, I would like to restructure our sequencer scripts and the paths. Until now many things are not generic at all. I would like to ask if it is possible to include files through a relative path for example something like
> > INCLUDE ../chip/global_basic_functions
> > Maybe I just did not find how to do it.
>
> It was not there. I implemented it in the last commit.
>
> Stefan
Hi Stefan,
when I did this job for MEG II we decided not to include relative paths and the ".." folder to avoid an exploit called "XML Entity Injection".
In short, the aim is to avoid leaking files outside the sequencer folders, like /etc/passwd or private SSH keys.
I do not remember at the moment why we pushed for absolute paths instead, but let's keep this in mind.
Marco |
2532
|
13 Jun 2023 |
Stefan Ritt | Forum | Include subroutine through relative path in sequencer | > Hi, I would like to restructure our sequencer scripts and the paths. Until now many things are not generic at all. I would like to ask if it is possible to include files through a relative path for example something like
> INCLUDE ../chip/global_basic_functions
> Maybe I just did not find how to do it.
It was not there. I implemented it in the last commit.
Stefan |
2531
|
13 Jun 2023 |
Thomas Senger | Forum | Include subroutine through relative path in sequencer | Hi, I would like to restructure our sequencer scripts and the paths. Until now many things are not generic at all. I would like to ask if it is possible to include files through a relative path for example something like
INCLUDE ../chip/global_basic_functions
Maybe I just did not find how to do it. |
2530
|
13 Jun 2023 |
Marius Koeppel | Suggestion | Maximum ODB size |
> BTW, how do I resize the ODB. I remember we discussed this some time ago, and concluded that odbedit needs a resize flag. Has this even been
> done? If not, what is the "official" way to resize the ODB. We had some documentation about that some time ago, but I can't find it anymore.
I guess this is still not done and the issue is still open: https://bitbucket.org/tmidas/midas/issues/329/need-odbresize
I guess if we touch this maybe the problem with the wrong size should be also fixed: https://bitbucket.org/tmidas/midas/issues/328/odbinit-s-1024mb-creates-odb-with-wrong
Best,
Marius |
2529
|
13 Jun 2023 |
Stefan Ritt | Suggestion | Maximum ODB size | > I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
> odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremembered.
I confirm. Really strange how your mind can trick you. I'm absolutely sure I had this planned originally (1995?), but it never got implemented.
Well, never too late. So I added the "if" and committed to develop. I did a quick test and things seem to work fine here. Actually programs stop
a bit faster now. So please everybody give it a try and report back here.
BTW, how do I resize the ODB. I remember we discussed this some time ago, and concluded that odbedit needs a resize flag. Has this even been
done? If not, what is the "official" way to resize the ODB. We had some documentation about that some time ago, but I can't find it anymore.
Stefan |
2528
|
12 Jun 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | > > correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().
>
> The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
> only saved into the .ODB.SHM when the *last* client exits. This ensures to keep the ODB persistent when the
> shared memory gets deleted. I vaguely remember I put something in like:
>
> db_close_database()
> ...
> destroy_flag = (pheader->num_clients == 0);
>
> if (destroy_flag)
> ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremembered.
What's more, ss_shm_flush() is done while holding the ODB semaphore, so all other midas programs that try to access
odb at the same time (including the mserver) will stall until write() and close() return. at least we do not fsync(),
and there is no waiting until data is committed to physical media.
$ git annotate 3bb04af4d^ src/odb.c
...
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 875) destroy_flag = (pheader->num_clients == 0);
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 876)
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 877) /* flush shared memory to disk */
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 878) ss_flush_shm(pheader->name, pheader, sizeof(DATABASE_HEADER)+2*pheader->data_size);
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 879)
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 880) /* unmap shared memory, delete it if we are the last */
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 881) ss_close_shm(pheader->name, pheader,
ef8320177 (Stefan Ritt 1998-10-08 13:46:02 +0000 882) _database[hDB-1].shm_handle, destroy_flag);
...
K.O. |
2527
|
12 Jun 2023 |
Stefan Ritt | Suggestion | Maximum ODB size | > correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().
The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
only saved into the .ODB.SHM when the *last* client exits. This ensures to keep the ODB persistent when the
shared memory gets deleted. I vaguely remember I put something in like:
db_close_database()
...
destroy_flag = (pheader->num_clients == 0);
if (destroy_flag)
ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
...
Now I see that the "if (destroy_flag)" is missing. Not sure if it was removed at some point, or if it actually never
was there. But I see no point in flushing the ODB when a client ends. We need the flushing only before the
shared memory gets deleted. If we have to ensure that the shared memory and the binary dump file stay in sync
(e.g. if all midas clients die at the same time), we could add some code to flush the ODB about once per minute,
but not attach it to db_close_database(). I know several experiments using "odbedit -c xxx" in vast quantities,
so all these experiments would then benefit.
Note: Mu3e at PSI also uses 100 MB ODB, and they really need it.
Thoughts and opinions?
Best,
Stefan |
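The "flush about once per minute, independent of db_close_database()" suggestion amounts to a simple rate limiter. A minimal sketch, assuming a hypothetical `FlushTimer` helper that a client's main loop would poll; nothing here is existing MIDAS API, and coordinating many clients so they do not all flush at once is left out.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: tells the caller when a periodic flush is due.
class FlushTimer {
public:
   using Clock = std::chrono::steady_clock;

   explicit FlushTimer(std::chrono::seconds interval,
                       Clock::time_point start = Clock::now())
      : interval_(interval), last_(start)
   {
      assert(interval.count() > 0);
   }

   // Returns true at most once per interval; the caller flushes when it does.
   bool due(Clock::time_point now = Clock::now())
   {
      if (now - last_ < interval_)
         return false;
      last_ = now;   // restart the interval from this flush
      return true;
   }

private:
   std::chrono::seconds interval_;
   Clock::time_point last_;
};
```

A client would construct `FlushTimer timer(std::chrono::seconds(60));` once and call `timer.due()` from its periodic loop, flushing only when it returns true. Short-lived programs such as "odbedit -c xxx" would then exit without ever triggering a flush.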
2526
|
09 Jun 2023 |
Konstantin Olchanski | Info | added IPv6 support for mserver and MIDAS RPC | as of commit 71fb5e82f3e9a1501b4dc2430f9193ee5d176c82, MIDAS RPC and the mserver
listen for connections both on IPv4 and IPv6. mserver clients and MIDAS RPC
clients can connect to MIDAS using both IPv4 and IPv6. In the default
configuration ("/Expt/Security/Enable non-localhost RPC" set to "n"), IPv4
localhost is used, as before. Support for IPv6 is a by-product of switching
from the obsolete non-thread-safe gethostbyname() and gethostbyaddr() to the modern
getaddrinfo() and getnameinfo(). This fixes bug 357, observed crash of mhttpd
inside gethostbyname(). K.O. |
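For readers unfamiliar with the newer resolver calls, here is a minimal sketch of how getaddrinfo() with AF_UNSPEC yields IPv4 and IPv6 results through one code path, and how getnameinfo() turns each result back into text. The helper name `resolve_numeric` is hypothetical and this is not the MIDAS RPC code; it assumes a POSIX system, and the port in the example is only illustrative.

```cpp
#include <cassert>
#include <netdb.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Hypothetical helper: resolve `host` to numeric address strings, IPv4 and
// IPv6 alike. Returns an empty vector if resolution fails.
std::vector<std::string> resolve_numeric(const std::string& host, const std::string& port)
{
   assert(!host.empty());

   addrinfo hints{};
   hints.ai_family = AF_UNSPEC;       // accept both IPv4 and IPv6
   hints.ai_socktype = SOCK_STREAM;   // TCP

   addrinfo* list = nullptr;
   if (getaddrinfo(host.c_str(), port.c_str(), &hints, &list) != 0)
      return {};

   std::vector<std::string> out;
   for (const addrinfo* ai = list; ai != nullptr; ai = ai->ai_next) {
      char buf[NI_MAXHOST];
      // NI_NUMERICHOST asks for the numeric form, so no reverse lookup happens.
      if (getnameinfo(ai->ai_addr, ai->ai_addrlen, buf, sizeof(buf),
                      nullptr, 0, NI_NUMERICHOST) == 0)
         out.emplace_back(buf);
   }
   freeaddrinfo(list);   // the result list is owned by the caller
   return out;
}
```

Both calls write into caller-supplied storage instead of returning a pointer to static data, which is why they are safe to use from several threads, unlike gethostbyname().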
|