Back Midas Rome Roody Rootana
  Midas DAQ System, Page 2 of 48  Not logged in ELOG logo
Entry  11 Mar 2025, Federico Rezzonico, Bug Report, python hist_get_events not returning events, but javascript does 
After starting midas (mhttpd &, and mlogger -D) and running the `fetest` frontend I went into the midas/python/examples directory and ran basic_hist_script.py, and, even though I could see the 'pytest' program in the Programs page,

Valid events are:
Enter event name:

was printed out, which signified that no events were found. No errors were displayed.

Instead, when trying to do the same in javascript (using mjsonrpc_send_request( mjsonrpc_make_request("hs_get_events")).then(console.log)), I was able to get the expected events.

The History page also displayed the expected data and the plots worked correctly.

Device info: Chip: Apple M1 Pro, OS: Sequoia (15.3)

MIDAS version: bitbucket commit 84c7ef7

Python version: 3.13.2
    Reply  11 Mar 2025, Ben Smith, Bug Report, python hist_get_events not returning events, but javascript does 
> Valid events are:
> Enter event name:
> 
> was printed out, which signified that no events were found. No errors were displayed.

I can't reproduce this. I made a brand new experiment, started mlogger/mhttpd/fetest, then ran the same program. I get:

```
$ python basic_hist_script.py
Valid events are:
* Run transitions
* test_slow/data
Enter event name: 
```

Are you sure you ran the python program after running mlogger and not before? Can you try again after restarting mlogger? And can you verify that your python is connecting to the correct experiment if you have multiple experiments defined?

I tested with python 3.12.8 and 3.13.1, and am on MacOS 14.5, but I can't imagine those differences matter.

The python interface is a trivial wrapper around the C++ function, so the only python-specific thing that would result in an empty list is extracting an integer from a ctypes reference. If that's broken in your version then I don't think any of the midas python code would be working.
    Reply  14 Mar 2025, Konstantin Olchanski, Bug Report, python hist_get_events not returning events, but javascript does 
> After starting midas (mhttpd &, and mlogger -D) and running the `fetest` frontend I went into the midas/python/examples directory and ran basic_hist_script.py, and, even though I could see the 'pytest' program in the Programs page,
> 
> Valid events are:
> Enter event name:
> 
> was printed out, which signified that no events were found. No errors were displayed.

To check that MIDAS itself is built correctly, you can try "make test", this will create a sample experiment,
run fetest, start, stop a run and check that data file and history file is created with correct history events.

If "make test" fails, I can help debug it.

In your experiment, you can check that history files are created correctly:

1) "mhist -l" should show all available events
2) "mhdump -L *.hst" should show all events in the .hst history files
3) if you have the newer mhf*.dat files, you can "more mhf_1449770978_20151210_hv.dat" to see what data is inside

If all of that works as expected, there must be a problem with the python side and we will have to figure
out how to reproduce it.

This reminds me, "make test" does not test any of the python code, it should be added (and python should be added
to the bitbucket builds).

K.O.
    Reply  16 Mar 2025, Federico Rezzonico, Bug Report, python hist_get_events not returning events, but javascript does 
> After starting midas (mhttpd &, and mlogger -D) and running the `fetest` frontend I went into the midas/python/examples directory and ran basic_hist_script.py, and, even though I could see the 'pytest' program in the Programs page,
> 
> Valid events are:
> Enter event name:
> 
> was printed out, which signified that no events were found. No errors were displayed.
> 
> Instead, when trying to do the same in javascript (using mjsonrpc_send_request( mjsonrpc_make_request("hs_get_events")).then(console.log)), I was able to get the expected events.
> 
> The History page also displayed the expected data and the plots worked correctly.
> 
> Device info: Chip: Apple M1 Pro, OS: Sequoia (15.3)
> 
> MIDAS version: bitbucket commit 84c7ef7
> 
> Python version: 3.13.2


I tested the command this morning and it worked. The most likely cause of the errors was that I was on a beta version of macOS: Switching beta updates off fixed the issue. Thanks for the help!
       Reply  19 Mar 2025, Konstantin Olchanski, Bug Report, python hist_get_events not returning events, but javascript does 
> beta version of macOS: Switching beta updates off fixed the issue.

I would be very surprised if that was the problem.

Bigger concern is that it fails without producing any useful error message.

Latest MacOS makes it extremely difficult to debug this kind of stuff, there
is several hoops to jump through to enable core dumps and to allow lldb
to attach and debug running programs.

K.O.
Entry  19 Mar 2025, Konstantin Olchanski, Forum, MidasWiki special page access now restricted to logged in users 
We have a problem with over aggressive AI bots. They are scanning everything and 
are putting a crazy load on the mediawiki mysql database. This makes all out 
wikis very slow and unresponsive, which is a big problem.

These AI scanners do not seem to know what they are doing and are trying to load 
all the special pages, like "what links here" and "recent changes" and complete 
revision histories for each page. I am not impressed. Old school google and bing 
scanner bots are much more respectful and do not cause problems.

AI bots are also scanning this elog, and to Stefan's credit elogd is holding the 
load just fine.

To reduce load on the MidasWiki mysql database, I now restricted access to most 
special pages to logged in users. It seems to have helped.

If this causes big difficulty or if you have suggestion on better ways to deal 
with aggressive AI scanner bots, please speak up.

One alternative is to put MidasWiki behind a generic login page with a well 
known password. Downside is it will block google and bing search engines, too.

K.O.
Entry  17 Mar 2025, Federico Rezzonico, Bug Report, python hist_get_recent_data returns no historical data 
Setup:
setting up midas, starting mhttpd and mlogger and running fetest.

The History page and the javascript mjsonrpc client are both able to fetch historical data for test_slow/data. Javascript code used is included here:

mjsonrpc_call(
  "hs_read_arraybuffer",
  {
    start_time: Math.floor((new Date()).getTime() /1000) - 1000,
    end_time: Math.floor((new Date()).getTime() /1000),
    events: ["test_slow/data"],
    tags: ["data"],
    index: [0],
  },
  "arraybuffer"
).then(console.log)

However, the python client does not find any valid events:

Setup:
An exptab is created and the environment variables MIDAS_EXPTAB and MIDAS_EXPT_NAME and MIDASSYS are set (together with the correct PATH)

Running /midas/python/examples/basic_hist_script.py and typing in data:

Valid events are:
* Run transitions
* rrandom/SLOW
* test_slow/data
Enter event name: test_slow/data
Valid tags for test_slow/data are:
* data
Enter tag name: data
Event/tag test_slow/data/data has 1 elements
How many hours: 1
Interval in seconds: 1 # other values were also tested, without success
0 entries found

We expect entries to be found, however do not.

Tested setups:
Macbook Pro Sequoia 15.3 with Python 3.13.2, ROOT latest, midas bitbucket commit 84c7ef7
Windows 11 with Python 3.11, ROOT latest, midas latest commit (development branch)
    Reply  17 Mar 2025, Ben Smith, Bug Report, python hist_get_recent_data returns no historical data 
Unfortunately I again cannot reproduce this:

$ python ~/DAQ/midas_latest/python/examples/basic_hist_script.py
Valid events are:
* Run transitions
* test_slow/data
Enter event name: test_slow/data
Valid tags for test_slow/data are:
* data
Enter tag name: data
Event/tag test_slow/data/data has 1 elements
How many hours: 1
Interval in seconds: 1
78 entries found
2025/03/17 17:00:56 => 98.097391
2025/03/17 17:00:57 => 98.982151
2025/03/17 17:00:58 => 99.589187
2025/03/17 17:00:59 => 99.926821
2025/03/17 17:01:00 => 99.989878
2025/03/17 17:01:01 => 99.778216
2025/03/17 17:01:02 => 99.292485
.......


I want to narrow down whether the issue is in the basic_hist_script.py or the lower-level code. So there are a few steps of debugging to do.



1) Run code directly in the python interpreter: 

Can you run the following and send the output please?

```
import midas.client
c = midas.client.MidasClient("history_test")
data = c.hist_get_recent_data(1,1,"test_slow/data","data")
print(f"event_name='{data[0]['event_name']}', tag_name='{data[0]['tag_name']}', num_entries={data[0]['num_entries']}, status={data[0]['status']}, arrlen={len(data[0]['values'])}")
```

For me, I get:
event_name='test_slow/data', tag_name='data', num_entries=441, status=1, arrlen=441



2) If things look sensible for you (status=1, non-zero num_entries), then the problem is in the basic_hist_script.py. Can you add the same print() statement in basic_hist_script.py immediately after the call to hist_get_recent_data(), then run that script again and send the output of that?



3) Debug the python/C conversions.

In midas/client.py add the following line to hist_get_data() immediately before the call to self.lib.c_hs_read():

```
        print(f"c_start_time={c_start_time.value}, c_end_time={c_end_time.value}, c_interval={c_interval.value}, c_event_name={c_event_name.value}, c_tag_name={c_tag_name.value}")
```

Then run the following and send the output:

```
import midas.client
c = midas.client.MidasClient("history_test")
data = c.hist_get_recent_data(1,1,"test_slow/data","data")
```

For me, I get:
c_start_time=1742254428, c_end_time=1742258028, c_interval=1, c_event_name=b'test_slow/data', c_tag_name=b'data'

I want to check that the UNIX timestamps match what you expect for your server, and that nothing weird is going on with the python/C string conversions.


Thanks,
Ben
       Reply  18 Mar 2025, Federico Rezzonico, Bug Report, python hist_get_recent_data returns no historical data 
> Unfortunately I again cannot reproduce this:
> 
> $ python ~/DAQ/midas_latest/python/examples/basic_hist_script.py
> Valid events are:
> * Run transitions
> * test_slow/data
> Enter event name: test_slow/data
> Valid tags for test_slow/data are:
> * data
> Enter tag name: data
> Event/tag test_slow/data/data has 1 elements
> How many hours: 1
> Interval in seconds: 1
> 78 entries found
> 2025/03/17 17:00:56 => 98.097391
> 2025/03/17 17:00:57 => 98.982151
> 2025/03/17 17:00:58 => 99.589187
> 2025/03/17 17:00:59 => 99.926821
> 2025/03/17 17:01:00 => 99.989878
> 2025/03/17 17:01:01 => 99.778216
> 2025/03/17 17:01:02 => 99.292485
> .......
> 
> 
> I want to narrow down whether the issue is in the basic_hist_script.py or the lower-level code. So there are a few steps of debugging to do.
> 
> 
> 
> 1) Run code directly in the python interpreter: 
> 
> Can you run the following and send the output please?
> 
> ```
> import midas.client
> c = midas.client.MidasClient("history_test")
> data = c.hist_get_recent_data(1,1,"test_slow/data","data")
> print(f"event_name='{data[0]['event_name']}', tag_name='{data[0]['tag_name']}', num_entries={data[0]['num_entries']}, status={data[0]['status']}, arrlen={len(data[0]['values'])}")
> ```
> 
> For me, I get:
> event_name='test_slow/data', tag_name='data', num_entries=441, status=1, arrlen=441
> 
> 
> 
> 2) If things look sensible for you (status=1, non-zero num_entries), then the problem is in the basic_hist_script.py. Can you add the same print() statement in basic_hist_script.py immediately after the call to hist_get_recent_data(), then run that script again and send the output of that?
> 
> 
> 
> 3) Debug the python/C conversions.
> 
> In midas/client.py add the following line to hist_get_data() immediately before the call to self.lib.c_hs_read():
> 
> ```
>         print(f"c_start_time={c_start_time.value}, c_end_time={c_end_time.value}, c_interval={c_interval.value}, c_event_name={c_event_name.value}, c_tag_name={c_tag_name.value}")
> ```
> 
> Then run the following and send the output:
> 
> ```
> import midas.client
> c = midas.client.MidasClient("history_test")
> data = c.hist_get_recent_data(1,1,"test_slow/data","data")
> ```
> 
> For me, I get:
> c_start_time=1742254428, c_end_time=1742258028, c_interval=1, c_event_name=b'test_slow/data', c_tag_name=b'data'
> 
> I want to check that the UNIX timestamps match what you expect for your server, and that nothing weird is going on with the python/C string conversions.
> 
> 
> Thanks,
> Ben

Hi, thank you for the support!

Running the commands on the Macbook pro leads to
1)
event_name='test_slow/data', tag_name='data', num_entries=0, status=1, arrlen=0
2)
The number of entries is zero
3)
I get
c_start_time=1742275653, c_end_time=1742279253, c_interval=1, c_event_name=b'test_slow/data', c_tag_name=b'data'

However right after running these commands I removed a .SHM_HOST.TXT file
(due to me working both at home and at PSI, my computer hostname changes when I switch the network, so I remove .SHM_HOST.TXT to be able to run experiments)

and reran the code, and it suddenly worked! This is good, but I do not know what fixed it... I had done more extensive tests yesterday and also had to delete .SHM_HOST.TXT multiple times, to no avail.
Do you have any ideas as to what could be happening? Similarly to my previous bug report, which was fixed by updating macOS to a more stable version, could this have been due to an automatic update?

If the problem still persists on the Windows machine I will post an update.
          Reply  18 Mar 2025, Konstantin Olchanski, Bug Report, python hist_get_recent_data returns no historical data 
>
> However right after running these commands I removed a .SHM_HOST.TXT file
>

Instead of deleting .SHM_HOSTS.TXT, please create it as an empty file. I thought the documentation is clear about it?

Also we recommend installing MIDAS to $HOME/packages/midas. There is a number of problems if installed at top level.

If you want to be compliant with the Linux LFS, /opt/midas is also a good place.

> 
> ... and it suddenly worked!
>

We still did not establish if mhist, mhdump and the other commands I sent you work correctly,
to confirm MIDAS is creating correct history files. (before you try to read them with python).

Also we did not establish that you have correct paths setup in ODB /Logger/History.

Many things can go wrong.

Next time python history malfunctions, please do all those other things and report to us. Thanks!

K.O.
Entry  28 Feb 2025, Zaher Salman, Info, Syntax validation in sequencer Screenshot_20250228_165543.png
Hello,

I've implemented a very basic syntax validation in the sequencer GUI. Click the validation button to check the syntax in the current tab.

Please note that this does only a simple syntax validation, the correctness of the logic is still on you :)
Entry  05 Feb 2025, Andreas Suter, Forum, Transition from mana -> manalyzer 
Hi,

we are planning to migrate from mana to manalyzer. I started to have a look into it and realized that I have some lose ends.
Is there a clear migration docu somewhere?

Currently I understand it the following way (which might be wrong):
The class TARunObject is used to write analyzer modules which are registered by TAFactory. I hope this is right?

However, in mana there is an analyzer implemented by the user which binds the modules and has additional routines:
analyzer_init(), analyzer_exit(), analyzer_loop()
ana_begin_of_run(), ana_end_of_run(), ana_pause_run(), ana_resume_run()
which we are using.

This part I somehow miss in manalyzer, most probably due to lack of understanding, and missing documentation.

Could somebody please give me a boost?
    Reply  05 Feb 2025, Andrea Capra, Forum, Transition from mana -> manalyzer dsadc_analyzer_midas_elog.pdf
Hi Andreas,

please find in elog:2938/1 a short introduction that I wrote sometime ago.
I'm glad to offer additional support, if needed.

> Hi,
> 
> we are planning to migrate from mana to manalyzer. I started to have a look into it and realized that I have some lose ends.
> Is there a clear migration docu somewhere?
> 
> Currently I understand it the following way (which might be wrong):
> The class TARunObject is used to write analyzer modules which are registered by TAFactory. I hope this is right?
> 
> However, in mana there is an analyzer implemented by the user which binds the modules and has additional routines:
> analyzer_init(), analyzer_exit(), analyzer_loop()
> ana_begin_of_run(), ana_end_of_run(), ana_pause_run(), ana_resume_run()
> which we are using.
> 
> This part I somehow miss in manalyzer, most probably due to lack of understanding, and missing documentation.
> 
> Could somebody please give me a boost?
    Reply  06 Feb 2025, Konstantin Olchanski, Forum, Transition from mana -> manalyzer 
> Could somebody please give me a boost?

no need to shout into the void, it is pretty easy to identify the author of manalyzer and ask me directly.

> we are planning to migrate from mana to manalyzer. I started to have a look into it and realized that I have some lose ends.
> Is there a clear migration docu somewhere?

README.md and examples in the manalyzer git repository.

If something is missing, or unclear, please ask.

> Currently I understand it the following way (which might be wrong):
> The class TARunObject is used to write analyzer modules which are registered by TAFactory. I hope this is right?

Please read the README file. It explains what is going on there.

Design of manalyzer had 2 main goals:
a) lifetime of all c++ objects (and ROOT objects) is well defined (to fix a design defect in rootana)
b) event flow and data flow are documentable (problem in mfe.c frontends, etc)

> However, in mana there is an analyzer implemented by the user which binds the modules and has additional routines:
> analyzer_init(), analyzer_exit(), analyzer_loop()
> ana_begin_of_run(), ana_end_of_run(), ana_pause_run(), ana_resume_run()
> which we are using.

I have never used the mana analyzer, I wrote the c++ rootana analyzer very early on (first commit in 2006).

But the basic steps should all be there for you:
- initialization (create histograms, open files) can be done in the module constructor or in BeginRun()
- finalization (fit histograms, close files) should be done in EndRun()
- event processing (obviously) in Analyze()
- pause run, resume run and switch to next subrun file have corresponding methods
- all the "flow" and multithreading stuff you can ignore to first order.

To start the migration, I recommend you take manalyzer_example_root.cxx and start stuffing it with your code.

If you run into any problems, I am happy to help with them. Ask here or contact me directly.

> This part I somehow miss in manalyzer, most probably due to lack of understanding, and missing documentation.

True, I wrote a migration guide for the frontend mfe.c to c++ tmfe, because we do this migration
quite often. But I never wrote a migration guide from mana.c analyzer, because we never did such
migration. Most experiments at TRIUMF are post-2006 and use rootana in it's different incarnations.

P.S. I designed the C++ TMFe frontend after manalyzer and I think it came out quite better, I especially
value the design input from Stefan, Thomas, Pierre, Joseph and Ben.

P.P.S. Be free to ignore all this manalyzer business and write your own analyzer based
on the midasio library:

int main()
{
   TMReaderInteraface* f = TMNewReader(file.mid.gz");
   while (1) {
      TMEvent* e = TMReadEvent(f);
      dwim(e);
      delete e;
   }
   delete f;
}

For online processing I use the TMFe class, it has enough bits to be a frontend and an analyzer,
or you can use the older TMidasOnline from rootana.

Access to ODB is via the mvodb library, which is new in midas, but has been part of rootana
and my frontend toolkit since at least 2011 or earlier, inspired by Peter Green's even
older "myload" ODB access library.

K.O.
       Reply  06 Feb 2025, Konstantin Olchanski, Forum, Transition from mana -> manalyzer 
> > Is there a clear migration docu somewhere?

I can also give you links to the alpha-g analyzer (very complex) and the DarkLight trigger TDC analyzer (very simple),
there is also analyzer examples of in-between complexity.

K.O.
Entry  18 Jan 2025, Pavel Murat, Forum, mjsonrpc: how to increase the max allowed length of the callback response ? 
Dear MIDAS experts,

I'm using MIDAS javascript interface (mjsonrpc) to communicate with a frontend from a custom web page 
and observe that the if the frontend's response exceeds certain number of bytes, it is always truncated. 

MIDAS C/C++ RPC interface allows users to specify the max response length :

https://daq00.triumf.ca/MidasWiki/index.php/Remote_Procedure_Calls_(RPC)#C++_2

How would one do the same from with mjsonrpc ? 

-- many thanks, regards, Pasha
    Reply  20 Jan 2025, Ben Smith, Forum, mjsonrpc: how to increase the max allowed length of the callback response ? 
> I'm using MIDAS javascript interface (mjsonrpc) to communicate with a frontend from a custom web page 
> and observe that the if the frontend's response exceeds certain number of bytes, it is always truncated. 
> 
> MIDAS C/C++ RPC interface allows users to specify the max response length :
> 
> https://daq00.triumf.ca/MidasWiki/index.php/Remote_Procedure_Calls_(RPC)#C++_2
> 
> How would one do the same from with mjsonrpc ? 

I just documented the max_reply_length (javascript) and max_len (python) parameters on that page. Both are optional and default to 1024 bytes.

I also added a link to the full mjsonrpc schema https://daq00.triumf.ca/MidasWiki/index.php/Mjsonrpc#Schema_(List_of_all_RPC_methods) . You can also find the auto-generated schema on any midas installation by going to the "Help" webpage served by mhttpd and clicking the "JSON-RPC schema" > "text table format" link.
Entry  29 Dec 2024, Pavel Murat, Forum, time ordering of run transition calls to TMFeEquipment things 
Dear MIDAS experts, 

I have a question about "tmfe approach" to implementing MIDAS frontends. If I read the code correctly, 
within this approach it is the TMFeEquipment things, not the TMFrontend's themselves, 
which handle the run transitions - the TMFrontend class 

https://bitbucket.org/tmidas/midas/src/423082fb67c7711813fcda61f7cd03784c398f49/include/tmfe.h#lines-306:378

simply doesn't have methods to handle those directly. 

So how does a user control the sequence in which TMFeEquipment::HandleBeginRun functions of different 
TMFeEquipment pieces are called at begin run? - there are two cases to consider: TMFeEquipment things 
defined by the same TMFrontend and by different TMFrontend's.

Many thanks and happy holidays to everyone! 

-- regards, Pasha
 
    Reply  01 Jan 2025, Konstantin Olchanski, Forum, time ordering of run transition calls to TMFeEquipment things 
> I have a question about "tmfe approach" to implementing MIDAS frontends. If I read the code correctly, 
> within this approach it is the TMFeEquipment things, not the TMFrontend's themselves, 
> which handle the run transitions - the TMFrontend class

that's correct and it is documented so in https://bitbucket.org/tmidas/midas/src/develop/tmfe.md

> So how does a user control the sequence in which TMFeEquipment::HandleBeginRun functions of different 
> TMFeEquipment pieces are called at begin run? - there are two cases to consider: TMFeEquipment things 
> defined by the same TMFrontend and by different TMFrontend's.

I am not sure what you are trying to do. It is always easier to suggest a solution to a specific problem.

But I will try to answer anyway:

1) "time ordering of run transitions" - of course midas transitions are ordered by transition sequence numbers 
and the tmfe class provides methods to control this. ditto for the mfe.cxx frontends.

2) for one TMFrontend, the order of calling HandleBeginRun() is the order in which equipments were added to the 
equipment using FeAddEquipment(). HandleEndRun() is called in reverse order. (I better check this).

3) to have multiple TMFrontends in one program would be unusual (mfe.cxx frontends completely do not support 
this), but should work. Everything was coded to support this, but it was never tested in practice because we 
cannot invent any useful use-case for it. HandleBeginRun() handlers are likely to be called in the frontends are 
created. (I could check this and confirm it works, as long as you have a valid use-case for this configuration).

4) Frontend X has EquipmentA and EquipmentB, you want EqA::HandleBeginRun() to be called at run transition 200 
and EqB::HandleBeginRun() to be called at run transition 400.

This is not directly supported by mfe.cxx frontends (the begin_run() handler is a global function) and I did not 
directly implement it in the TMFE frontend.

But I think this would be a useful improvement. I will look into this.

Likely I will add per-equipment data members fEqConfBeginRunSeqNo, fEqConfEndRunSeqno, etc. Value 0 would 
unregister the corresponding run transition handler. This would cleanup the code quite a bit, a bunch
of RegisterTranstionXXX functions could go away.

K.O.
       Reply  02 Jan 2025, Pavel Murat, Forum, time ordering of run transition calls to TMFeEquipment things 
Hi K.O., your clarification is much appreciated! 
"
> I am not sure what you are trying to do. It is always easier to suggest a solution to a specific problem.

I think, I owe you an explanation :) :

Consider ~ 40 nodes with two FPGAs (PCIE cards) per node, talking to the detector hardware. 
One of those FPGAs, in addition to reading the data, performs the global timing synchronization.
The high-bandwidth data readout is not controlled by MIDAS, so all frontends perform only 'slow control'-type functions.
In MIDAS language, an FPGA implements two different units of slow control equipment: 
one - configuring and controlling a single FPGA (equipment type A), and another one - synchronizing 
multiple FPGAs (equipment type B). On one of the nodes, unit A and unit B share the FPGA card, 
so they better be controlled by the same frontend. 

For one, I need to make sure that all type A equipment units, managed by multiple frontends, 
are initialized before the [single] type B unit which shares the frontend with the type A unit. 
And, of course, the end of a run transition has to be handled in the opposite order - type B unit 
shuts down first. 

As 'periodic' actions for all registered pieces of equipment are performed in the same loop [thread], 
registering the equipment in the needed order - first A, then B - should give a solution - thanks for making that clear.     

> 
> 1) "time ordering of run transitions" - of course midas transitions are ordered by transition sequence numbers 
> and the tmfe class provides methods to control this. ditto for the mfe.cxx frontends.
> 
> 2) for one TMFrontend, the order of calling HandleBeginRun() is the order in which equipments were added to the 
> equipment using FeAddEquipment(). HandleEndRun() is called in reverse order. (I better check this).

the ordering of the rpc handler calls in tmfe's tr_stop/tr_pause/tr_resume functions is ok.

> 
> 3) to have multiple TMFrontends in one program would be unusual (mfe.cxx frontends completely do not support 
> this), but should work. Everything was coded to support this, but it was never tested in practice because we 
> cannot invent any useful use-case for it. HandleBeginRun() handlers are likely to be called in the frontends are 
> created. (I could check this and confirm it works, as long as you have a valid use-case for this configuration).

agreed, I don't think there is a good use case for that, so no need to spend time checking.

> 
> 4) Frontend X has EquipmentA and EquipmentB, you want EqA::HandleBeginRun() to be called at run transition 200 
> and EqB::HandleBeginRun() to be called at run transition 400.
> 
> This is not directly supported by mfe.cxx frontends (the begin_run() handler is a global function) and I did not 
> directly implement it in the TMFE frontend.
> 
> But I think this would be a useful improvement. I will look into this. 

In the simplest case, registering the equipment units in the right order is definitely the answer. 
However a single FPGA can perform multiple logically independent tasks and thus represent 
multiple logical units of equipment. Those units however are not independent: they share the hardware (FPGA) 
and thus do depend on each other. Giving users a full control over the sequence in which those logical units 
execute their run transitions is quite likely to be needed, for example, to work around peculiarities of the 
custom-made kernel drivers.
 
> 
> Likely I will add per-equipment data members fEqConfBeginRunSeqNo, fEqConfEndRunSeqno, etc. Value 0 would 
> unregister the corresponding run transition handler. This would cleanup the code quite a bit, a bunch
> of RegisterTranstionXXX functions could go away.

this also makes sense. -- thanks again, regards, Pasha

> 
> K.O.
          Reply  05 Jan 2025, Stefan Ritt, Forum, time ordering of run transition calls to TMFeEquipment things 
Hi Pavel,

have you looked into 

  cm_set_transition_sequence()

which let's you define the sequence number for every midas client. You give any number between 1 and 1000 (default is 500 for frontends I believe).

The default value of 500 is defined in mfe.cxx:2641 where you have 500 for all four transitions, but it can be overwritten in the frontend_init function via 
cm_set_transition_sequence(). Since you have separate values for the start and stop transition, you can get the different sequencing for both transitions as you need. Like set 
all type A to 400, type B to 600 for TR_START, and set type A to 600 and type B to 400 for TR_STOP.

Since this works on the midas client level, it should also work for the tmfe.cxx framework. I'm however not sure if you have a similar default of 500 as in mfe.cxx:2641. But K.O. 
should know.

Best,
Stefan 
Entry  17 Dec 2024, Lukas Gerritzen, Bug Report, [History plots] "Jump to current time" resets x range to 7d 
To reproduce:
- Open a history plot, click [-] a few times until the x axis shows more than 7 days.
- Scroll to the past (left)
- Click "Jump to current time" (the triangle)

Expected result:
The upper limit of the x axis is at the current time and the lower range is now - whatever range you had before 
(>7d)

Actual result:
The upper limit is the current time, the lower limit is now - 7d

(The interval seems unchanged if the range was < 7d before clicking "Jump to current time")
    Reply  19 Dec 2024, Stefan Ritt, Bug Report, [History plots] "Jump to current time" resets x range to 7d 
I had put in a check which limits the range to 7d into the past if you press the "play" button, but now I'm not sure why this was needed. I removed it again and 
things seem to be fine. Change is committed to develop.

Stefan
Entry  13 Dec 2024, Marius Koeppel, Info, New Feature: Message Search filters.pdf
Dear all,

a new feature was implemented which allows to search the log messages in MIDAS. Attached one can find a more detailed explanation of how to use the feature.

If you see any issues / bugs feel don't hesitate to report them. For now the code was tested on Linux / Mac OS using Chrome, Firefox and Safari.

Best,
Marius 
Entry  18 Nov 2024, Lukas Gerritzen, Suggestion, Comma-separated indices in alarm conditions 
I have the following use case: I would like to check if two elements of an array exceed a certain threshold. 
However, they are not consecutive. Currently, I have to write two alarms, one checking Array[8] and one 
checking Array [10].

It would be nice if we could enter conditions such as "/Path/To/Array[8,10] > 0.5".

I looked into the code of al_evaluate_condition() and it seems very C-style. I know that you have been 
refactoring a lot of code to work with STL strings and their functions. If you find the time to refactor 
alarm.cxx, I ask that you consider adding comma-separated lists as a new feature.

Cheers
Lukas
    Reply  10 Dec 2024, Stefan Ritt, Suggestion, Comma-separated indices in alarm conditions 
These kind of alarm conditions have been implemented and committed. The documentation at 

  https://daq00.triumf.ca/MidasWiki/index.php/Alarm_System

has been updated.

/Stefan
Entry  26 Nov 2024, Nick Hastings, Bug Report, TMFE::Sleep() errors 
Hello,

I've noticed that SC FEs that use the TMFE class with midas-2022-05-c often report errors when calling TMFE:Sleep().
The error is :

[tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).

This seems to happen in two different ways:

1. Error being reported repeatedly
2. Occasional single errors being reported

When the first of these presents, we typically restart the FE to "solve" the problem.
Case 2. is typically ignored.

The code in question is:

void TMFE::Sleep(double time)
{
   int status;
   fd_set fdset;
   struct timeval timeout;
      
   FD_ZERO(&fdset);
      
   timeout.tv_sec = time;
   timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;

   while (1) {
      status = select(1, &fdset, NULL, NULL, &timeout);
#ifdef EINTR
      if (status < 0 && errno == EINTR) {
         continue;
      }
#endif
      break;
   }
      
   if (status < 0) {
      TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "select() returned %d, errno %d (%s)", status, errno, strerror(errno));
   }
}

So it looks like either file descriptor of the timeval struct must have a problem.
From some reading it seems that invalid timeval structs are often caused by one or both
of tv_sec or tv_usec not being set. In the code above we can see that both appear to be
correctly set initially.

From the select() man page I see:

RETURN VALUE
       On success, select() and pselect() return the number of file descriptors contained in
       the three returned descriptor sets (that is, the total number of bits that are set in
       readfds,  writefds,  exceptfds).  The return value may be zero if the timeout expired
       before any file descriptors became ready.

       On error, -1 is returned, and errno is set to indicate the error; the file descriptor
       sets are unmodified, and timeout becomes undefined.

The second paragraph quoted from the man page above would indicate to me that perhaps the
timeout needs to be reset inside the if block. eg:

      if (status < 0 && errno == EINTR) {
         timeout.tv_sec = time;
         timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
         continue;
      }

Please note that I've only just briefly looked at this and was hoping someone more
familiar with using select() as a way to sleep() might be better able to understand
what is happening.

I wonder also if now that midas requires stricter/newer c++ standards if there maybe
some more straightforward method to sleep that is sufficiently robust and portable.

Thanks,

Nick.
    Reply  26 Nov 2024, Maia Henriksson-Ward, Bug Report, TMFE::Sleep() errors 
> Hello,
> 
> I've noticed that SC FEs that use the TMFE class with midas-2022-05-c often report errors when calling TMFE:Sleep().
> The error is :
> 
> [tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).
> 
> This seems to happen in two different ways:
> 
> 1. Error being reported repeatedly
> 2. Occasional single errors being reported
> 
> When the first of these presents, we typically restart the FE to "solve" the problem.
> Case 2. is typically ignored.
> 
> The code in question is:
> 
> void TMFE::Sleep(double time)
> {
>    int status;
>    fd_set fdset;
>    struct timeval timeout;
>       
>    FD_ZERO(&fdset);
>       
>    timeout.tv_sec = time;
>    timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
> 
>    while (1) {
>       status = select(1, &fdset, NULL, NULL, &timeout);
> #ifdef EINTR
>       if (status < 0 && errno == EINTR) {
>          continue;
>       }
> #endif
>       break;
>    }
>       
>    if (status < 0) {
>       TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "select() returned %d, errno %d (%s)", status, errno, strerror(errno));
>    }
> }
> 
> So it looks like either file descriptor of the timeval struct must have a problem.
> From some reading it seems that invalid timeval structs are often caused by one or both
> of tv_sec or tv_usec not being set. In the code above we can see that both appear to be
> correctly set initially.
> 
> From the select() man page I see:
> 
> RETURN VALUE
>        On success, select() and pselect() return the number of file descriptors contained in
>        the three returned descriptor sets (that is, the total number of bits that are set in
>        readfds,  writefds,  exceptfds).  The return value may be zero if the timeout expired
>        before any file descriptors became ready.
> 
>        On error, -1 is returned, and errno is set to indicate the error; the file descriptor
>        sets are unmodified, and timeout becomes undefined.
> 
> The second paragraph quoted from the man page above would indicate to me that perhaps the
> timeout needs to be reset inside the if block. eg:
> 
>       if (status < 0 && errno == EINTR) {
>          timeout.tv_sec = time;
>          timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
>          continue;
>       }
> 
> Please note that I've only just briefly looked at this and was hoping someone more
> familiar with using select() as a way to sleep() might be better able to understand
> what is happening.
> 
> I wonder also if now that midas requires stricter/newer c++ standards if there maybe
> some more straightforward method to sleep that is sufficiently robust and portable.
> 
> Thanks,
> 
> Nick.

I had the same error a few months ago, though I wasn't using a tagged release. It happened because I was calling TMFE::Sleep() 
with a negative time. If your issues were caused by the same reason, TMFE::Sleep() can handle negative times since commit 
591f78f (https://bitbucket.org/tmidas/midas/commits/591f78f52893d5ffd64bf4e52a1daac537ebd672).

Early in my debugging, I did come to the same conclusions you did, and actually tried a similar solution the one you suggested. 
This was a few months ago and I didn't write down what happened, but I believe it didn't work because in my case the errno was 
something other than EINTR, and/or the timeval was still an invalid argument for sleep because the timeout was still negative. I 
never followed it up because I was able to fix my problem by fixing my frontend.
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
> [tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).

The very original copy of this function had an error and was spewing out this error quite often,
this was a missing handler for EINTR.

Now it looks like we are missing a handler for EINVAL.

Most likely sleep is called with a funny sleep time value that fills struct timeval with
values select() does not like.

I see Ben added a check for negative sleep times, and this is good.

I think I will do these changes:

a) add an error message for negative sleep time, I think user should never call ::Sleep with negative or zero sleep times and 
if they do it is a bug and they should fix it, the error message will inform them so.

b) add a handler for EINVAL, which will report the requested sleep time and the values in struct timeval

K.O.
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
> 
> I wonder also if now that midas requires stricter/newer c++ standards if there maybe
> some more straightforward method to sleep that is sufficiently robust and portable.
> 

I believe POSIX defined clock_nanosleep() & co, so on most recent machines that is the most portable way to sleep.

Historically, select() was the only way to sleep for less than 1 sec, but it was never portable
because of differences between BSD UNIX and Linux implementations. (MacOS is BSD UNIX via FreeBSD).

On difference is the update of struct timeval is select() is interrupted.

In this elog entry, I compare sleep using select() with sleep using clock_nanosleep() and see that there is no difference:
https://daq00.triumf.ca/elog-midas/Midas/2115

As you can see tmfe.cxx has both implementations, select() and clock_nanosleep(), and anybody can try which one works better on 
their computer.

K.O.
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
>       status = select(1, &fdset, NULL, NULL, &timeout);
>
>       On error, -1 is returned, ... timeout becomes undefined.

I have been reading "man select" for 30 years and I do not remember seeing this text.

I believe on BSD UNIX (MacOS) it says timeout is unchanged and on Linux is says timeout is updated to time actually slept.

I will have to investigate, but I suspect the man page was posix-ized, by sweeping BSD/MacOS and Linux implementations
under the same "instead of saying what it actually does, we will just say 'undefined'".

In any case, EINTR is not an error, it's an artefact of UNIX signal handling. Linux implementations always try
very hard to handle signals without causing EINTR to select(), read() and write(). This is most painful
when reading and writing to TCP sockets, because one most handle partial reads and EINTR.

K.O.
       Reply  06 Dec 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
> >       status = select(1, &fdset, NULL, NULL, &timeout);
> >       On error, -1 is returned, ... timeout becomes undefined.

I confirm Linux and MacOS man pages and select() with EINTR work as I remember, Linux updates "timeout" to account for the 
time already slept, MacOS does not ("timeout" is unchanged).

So the original code is roughly correct, but long sleeps will not work right if SIGALRM fires during sleeping.

Note that MIDAS no longer uses SIGALRM to fire cm_watchdog() (it was moved to a thread) and MIDAS does not use signals,
so handling of EINTR is now moot.

(Please correct me if I missed something).

The original bug report was about EINVAL, and best I can tell, it was caused by calls to TMFE::Sleep()
with strange sleep times that caused invalid values to be computed into the select() timeout.

To improve on this, I make these changes:

1) TMFE::Sleep(0) will report an error and will not sleep
2) TMFE::Sleep(negative number) will report an error and will not sleep

(please check the sleep time before calling TMFE::Sleep())

3) TMFE::Sleep(1 sec or less) will sleep using select(). (I also looked into using poll(), ppoll() and pselect()).
4) TMFE::Sleep(more than 1 second) will use a loop to sleep in increments of 1 second and will use one additional syscall to 
read the current time to decide how much more to sleep.

5) if select() returns EINVAL, the error message will reporting the sleep time and the values in "timeout".

A side effect of this is that on both Linux and MacOS long sleeps work correctly if interrupted by SIGALRM,
because SIGALRM granularity is 1 sec and our sleep time is also 1 sec.

Commit [develop 06735d29] improve TMFE::Sleep()

K.O.
          Reply  06 Dec 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
> Commit [develop 06735d29] improve TMFE::Sleep()

Report of test_sleep on Ubuntu-22 (Intel E-2236) and MacOS 14.6.1 (M1 MAX Mac Studio).

It is easy to see Ubuntu-22 (kernel 6.2.x) sleep granularity is ~60 usec, MacOS sleep granularity is <2 usec. (Sub 1 usec sleep likely measures the syscall() speed, 500 ns on Intel and 200 
ns on ARM M1 MAX). (NOTE: long sleep is interrupted by an alarm roughly 10 seconds into the sleep, see progs/test_sleep.cxx)

daq00:midas$ uname -a
Linux daq00.triumf.ca 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
daq00:midas$ ./bin/test_sleep 
test short sleep:
sleep      10 loops,   100000.000 usec per loop,  1.000000000 sec total,  1.002568007 sec actual total,   100256.801 usec actual per loop, oversleep  256.801 usec,    0.3%
sleep     100 loops,    10000.000 usec per loop,  1.000000000 sec total,  1.025897980 sec actual total,    10258.980 usec actual per loop, oversleep  258.980 usec,    2.6%
sleep    1000 loops,     1000.000 usec per loop,  1.000000000 sec total,  1.169670105 sec actual total,     1169.670 usec actual per loop, oversleep  169.670 usec,   17.0%
sleep   10000 loops,      100.000 usec per loop,  1.000000000 sec total,  1.573357105 sec actual total,      157.336 usec actual per loop, oversleep   57.336 usec,   57.3%
sleep   99999 loops,       10.000 usec per loop,  0.999990000 sec total,  6.963442087 sec actual total,       69.635 usec actual per loop, oversleep   59.635 usec,  596.4%
sleep 1000000 loops,        1.000 usec per loop,  1.000000000 sec total, 60.939687967 sec actual total,       60.940 usec actual per loop, oversleep   59.940 usec, 5994.0%
sleep 1000000 loops,        0.100 usec per loop,  0.100000000 sec total,  0.613572121 sec actual total,        0.614 usec actual per loop, oversleep    0.514 usec,  513.6%
sleep 1000000 loops,        0.010 usec per loop,  0.010000000 sec total,  0.576359987 sec actual total,        0.576 usec actual per loop, oversleep    0.566 usec, 5663.6%
test long sleep: requested 126.000000000 sec ... sleeping ...
alarm!
test long sleep: requested 126.000000000 sec, actual 126.000875950 sec

bash-3.2$ uname -a
Darwin send.triumf.ca 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000 arm64
bash-3.2$ ./bin/test_sleep 
test short sleep:
sleep      10 loops,   100000.000 usec per loop,  1.000000000 sec total,  1.032556057 sec actual total,   103255.606 usec actual per loop, oversleep 3255.606 usec,    3.3%
sleep     100 loops,    10000.000 usec per loop,  1.000000000 sec total,  1.245460033 sec actual total,    12454.600 usec actual per loop, oversleep 2454.600 usec,   24.5%
sleep    1000 loops,     1000.000 usec per loop,  1.000000000 sec total,  1.331466913 sec actual total,     1331.467 usec actual per loop, oversleep  331.467 usec,   33.1%
sleep   10000 loops,      100.000 usec per loop,  1.000000000 sec total,  1.281141996 sec actual total,      128.114 usec actual per loop, oversleep   28.114 usec,   28.1%
sleep   99999 loops,       10.000 usec per loop,  0.999990000 sec total,  1.410759926 sec actual total,       14.108 usec actual per loop, oversleep    4.108 usec,   41.1%
sleep 1000000 loops,        1.000 usec per loop,  1.000000000 sec total,  2.400593996 sec actual total,        2.401 usec actual per loop, oversleep    1.401 usec,  140.1%
sleep 1000000 loops,        0.100 usec per loop,  0.100000000 sec total,  0.188431025 sec actual total,        0.188 usec actual per loop, oversleep    0.088 usec,   88.4%
sleep 1000000 loops,        0.010 usec per loop,  0.010000000 sec total,  0.188102007 sec actual total,        0.188 usec actual per loop, oversleep    0.178 usec, 1781.0%
test long sleep: requested 126.000000000 sec ... sleeping ...
alarm!
test long sleep: requested 126.000000000 sec, actual 126.001244068 sec

K.O.
Entry  06 Dec 2024, Stefan Ritt, Info, New slow control framework "mdev" PDCC.png
A new slow control mini-framework has been developed for MIDAS and been successfully tested in the Mu3e experiment. It 
might be suited for other experiments as well.

Background

Since the late 90’s we have the three-tier bases slow control framework in MIDAS with class drivers, device drivers and bus 
drivers. While it was used successfully since many years, it is complicated to understand and limited in its flexibility. If we 
have a HV device with a demand value, a measured voltage and a current it’s fine, but if we want to control more things like 
trip voltage, temperature and status readout etc. it soon hits its limits. With the development of the new odbxx API 
(https://daq00.triumf.ca/MidasWiki/index.php/Odbxx) there is now an opportunity to make everything much simpler.

Design principles

Instead of a three-tier system, the new “mdev” framework (“m”idas “dev”ices) uses a simple base class which is attached to 
a certain MIDAS equipment. It implements five simple functions:

- odb_setup() to setup /Equipment/<name>/Settings and /Equipment/<name>/Variables to its desired structure

- init() to initialize the slow control device

- exit() to close the connection to the device

- loop() which is called periodically to read the device

- read_event() which returns a MIDAS event going to the data stream

A device driver inherits from this base class and implements the functions. A simple example can be found in 

  midas/drivers/mdev/mdev_mscb.[h,cxx]

for the MSCB field bus system used at TRIUMF and PSI. It basically boils down to two calls:

Init:
   m_variables.connect(“/Equipment/<name>/Variables”);
   m_variables[“Output”].watch(midas::odb &o) {
      m_mscb[“HV”] = o[0]; // transfer value from ODB to MSCB device
   }

Reading a value in the loop function:
   m_variables[“Input”][0] = m_mscb[“HVMeas”];

The member variable m_variables is a midas::odb variable attached to the “Input” and “Output” variables in the ODB. The 
watch() functions executes the lambda function whenever the “Output” in the ODB changes. It then simply transfers the new 
value to the device. The reading of measured values just work in the other direction from the device to the ODB.

If you look at the mdev_mscb.cxx code, you see of course some more things like connecting to the MSCB device with proper 
error handling, looping over several devices and variables, setting up the “Setting” directory in the ODB to define labels for 
all variables. In addition we have a mirror for output variables, so that new values are only sent to the device if they differ 
from the previous variable (needed to reduce some communication traffic). 

The midas/drivers/mdev directory contains also an example frontend in the mfe.cxx framework, but this is no a requirement. 
The mdev framework can also be used in the tmfe framework and others as well. Please note how compact the frontend 
code now looks.

User interface

Since the beginning, MIDAS allows access to the the slow control devices through the “equipment” page (on the main status 
page, click on one equipment). A few more options can control now the behavior of this page, allowing quite some flexibility 
without having to write a dedicated custom page (which of course can still be done). Attached is an example from Mu3e where 
the details of the equipment display are controlled through some options in the setting subdirectory as described in 
https://daq00.triumf.ca/MidasWiki/index.php//Equipment_ODB_tree (especially the “grid display”, “Editable” and “Format” 
flags).

Conclusions

The new “mdev” framework offers a compact and effective way to communicate from MIDAS to slow control devices. Since 
all interface code is now not “hidden” any more in system class and device drivers, the user has much higher flexibility in 
controlling different devices. If a device has a new parameter, the user can add a single line of code to connect this 
parameter to an ODB entry.

The framework is very simple and misses some features of the old system. Ramping of HV voltages and current trips are not 
available in the framework (like with the old HV class driver), but modern devices usually implement this in hardware which 
is much better. The new framework is not multi-threaded, but modern devices are these day much faster than in the ‘90s. 
Since the ODB is thread save, nothing prevents us from putting a device readout into its own thread in the frontend.

We will use the new system for all devices in Mu3e, with probably some new features being added soon, so stay tuned.

/Stefan
Entry  23 Oct 2024, Lukas Gerritzen, Bug Report, ODB key picker does not close when creating link / Edit-on-run string box too large Screenshot_2024-10-23_at_11.38.38.png
To reproduce:
In the interactive ODB, click the &#128279; icon to create a link. Next to the target,  click the "..." button to open 
the key picker browser. Then try to close it by either:
- Selecting a key and clicking ok
- Clicking "cancel"
- Clicking the red circle at the top left

Expected result:
The key picker closes

Actual result:
The key picker does not close.

Depending on how you trying to close the picker, the error messages in the debug console differ slightly.

On the red circle:
Uncaught TypeError: dlg is null
    dlgClose http://localhost:8080/controls.js:791
    onclick http://localhost:8080/?cmd=ODB&odb_path=/Test:1

On "ok" or "cancel":
Uncaught TypeError: dlg is null
    dlgMessageDestroy http://localhost:8080/controls.js:828
    pickerButton http://localhost:8080/odbbrowser.js:453
    onclick http://localhost:8080/?cmd=ODB&odb_path=/Test:1


Another more minor visual problem is the edit-on-start dialog. There seems to be no upper bound to the 
size of the text box. In the attached screenshot, ShortString has a maximum length of 32 characters, 
LongString has 255. Both are empty at the time of the screenshot. Maybe, the size should be limited to a 
reasonable width.
    Reply  02 Dec 2024, Stefan Ritt, Bug Report, ODB key picker does not close when creating link / Edit-on-run string box too large 
> Actual result:
> The key picker does not close.

Thanks for reporting that bug. It has been fixed in the current commit (installed already on megon02)

Stefan
       Reply  04 Dec 2024, Konstantin Olchanski, Bug Report, ODB key picker does not close when creating link / Edit-on-run string box too large 
> > Actual result:
> > The key picker does not close.
> 
> Thanks for reporting that bug. It has been fixed in the current commit (installed already on megon02)
>

Stefan, thank you for fixing both problems, I have seen them, too, but no time to deal with them.

K.O.
    Reply  02 Dec 2024, Stefan Ritt, Bug Report, ODB key picker does not close when creating link / Edit-on-run string box too large 
> Another more minor visual problem is the edit-on-start dialog. There seems to be no upper bound to the 
> size of the text box. In the attached screenshot, ShortString has a maximum length of 32 characters, 
> LongString has 255. Both are empty at the time of the screenshot. Maybe, the size should be limited to a 
> reasonable width.

I limited the input size now to (arbitrarily) 100 chars. The string can still be longer than 100 chars, and you start then scrolling inside the input box. Let me know if 
that's ok this way.

Stefan
       Reply  05 Dec 2024, Konstantin Olchanski, Bug Report, ODB key picker does not close when creating link / Edit-on-run string box too large 
> > Another more minor visual problem is the edit-on-start dialog. There seems to be no upper bound to the 
> > size of the text box. In the attached screenshot, ShortString has a maximum length of 32 characters, 
> > LongString has 255. Both are empty at the time of the screenshot. Maybe, the size should be limited to a 
> > reasonable width.
> 
> I limited the input size now to (arbitrarily) 100 chars. The string can still be longer than 100 chars, and you start then scrolling inside the input box. Let me know if 
> that's ok this way.

I am moving the dragon experiment to the new midas and we see this problem on the begin-of-run page.

Old midas: no horizontal scroll bar, edit-on-start names, values and comments are all squeezed in into the visible frame.

New midas: page is very wide, values entry fields are very long and there is a horizontal scroll bar.

So something got broken in the htlm formatting or sizing. I should be able to spot the change by doing
a diff between old resources/start.html and the new one.

K.O.
Entry  15 Aug 2024, Scott Oser, Forum, "Safe" abort of sequencer scripts 
We often use the MIDAS sequencer to temporarily control detector settings, such as:

* <change some setting>
* WAIT 60 seconds
* <revert setting to original value>

The question arises of what happens if the sequencer scripts gets aborted during that wait, preventing the value from being reset.  Depending on the setting, this could be undesirable or even damage something if left uncorrected for too long.

Is there any way to have a "safe abort" from the sequencer so that the "Stop immediately" button will call some cleanup script to leave things in a safe state?  Or what about if the sequencer process itself gets killed in the middle of a script?

How have other experiments using MIDAS protected themselves from unplanned terminations of sequencer scripts?
    Reply  19 Aug 2024, Stefan Ritt, Forum, "Safe" abort of sequencer scripts 
This request came more than once in the past. One thing I could implement is a "atexit" function similarly to the C funciton atexit().

Then we would have a function in the script which gets called whenever one does "stop immediately". This function can then restore
some ODB values or do whatever is necessary. 

If the sequencer gets killed in the middle, it can safely be restarted since the complete sequencer state is kept in the ODB under
/Sequencer/State. After the restart, the sequencer continues exactly where it has been killed before.

Would that solve your problem?

Stefan
       Reply  22 Aug 2024, Scott Oser, Forum, "Safe" abort of sequencer scripts 
> This request came more than once in the past. One thing I could implement is a "atexit" function similarly to the C funciton atexit().
> 
> Then we would have a function in the script which gets called whenever one does "stop immediately". This function can then restore
> some ODB values or do whatever is necessary. 
> 
> If the sequencer gets killed in the middle, it can safely be restarted since the complete sequencer state is kept in the ODB under
> /Sequencer/State. After the restart, the sequencer continues exactly where it has been killed before.
> 
> Would that solve your problem?
> 
> Stefan

Yes, an "atexit" functionality within the Midas Sequencer Language would be useful for us with this issue.  Is this easy for you to implement?

Thanks,
Scott Oser
          Reply  02 Dec 2024, Stefan Ritt, Forum, "Safe" abort of sequencer scripts 
The atexit() function has been implemented in the current develop branch of midas, see

  https://daq00.triumf.ca/MidasWiki/index.php/Sequencer#ATEXIT_subroutine


Stefan
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, "Safe" abort of sequencer scripts 
> We often use the MIDAS sequencer to temporarily control detector settings, such as:
> 
> * <change some setting>
> * WAIT 60 seconds
> * <revert setting to original value>
> 
> The question arises of what happens if the sequencer scripts gets aborted during that wait, preventing the value from being reset.

Common problem. Go have an elegant solution using the "defer" keyword.

https://go.dev/tour/flowcontrol/12

K.O.
Entry  30 Nov 2024, Pavel Murat, Bug Report, EQ_PERIODIC-only equipment ?  
Dear Midas experts, 

I'm running into something which looks like an initialization problem. 
I have a mfe.cxx-style frontend which introduces an equipment of the EQ_PERIODIC type (EQ_PERIODIC-only!). 
When Midas enters the running state, I see the frontend crashing. 
Stepping through the code shows that the frontend is crashing because its equipment has been ignored 
by the initialize_equipment@mfe.cxx - see
 
https://bitbucket.org/tmidas/midas/src/5d0dae001712164ae43137dced2fbbb594f0201e/src/mfe.cxx#lines-630

Is there an assumption that the initialization of the EQ_PERIODIC-only equipment is the user responsibility? 
Or EQ_PERIODIC should always come paired with some other type?

-- many thanks, regards, Pasha
    Reply  01 Dec 2024, Stefan Ritt, Bug Report, EQ_PERIODIC-only equipment ?  
There is no requirement that you pair an EQ_PERIODIC with an EQ_TRIGGER. Take for exmaple

  midas/examples/experiment/frontend.cxx

and remove there the triggered event. The frontend runs happily with the periodic event only (I just tried that myself). You have probably some problem in 
your event definition. Start with the running example frontend, and add your code line by line until you see the problem.

Stefan
       Reply  01 Dec 2024, Pavel Murat, Bug Report, EQ_PERIODIC-only equipment ?  
> There is no requirement that you pair an EQ_PERIODIC with an EQ_TRIGGER. Take for exmaple
> 
>   midas/examples/experiment/frontend.cxx
> 
> and remove there the triggered event. The frontend runs happily with the periodic event only (I just tried that myself). You have probably some problem in 
> your event definition. Start with the running example frontend, and add your code line by line until you see the problem.

Hi Stefan, thank you very much! 

As the pointer to the readout function and pointers to device drivers are all defined in the same structure (EQUIPMENT), 
I was naively assuming that the readout function should be set during the class driver initialization.
Now it is clear that the equipment responding to EQ_PERIODIC events doesn't have to have drivers, 
and specifying the readout function is the responsibility of the user.

I got around exactly this way yesterday, but was thinking that I was hacking the system :)
 
-- regards, Pasha
Entry  07 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4 
We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
Machine and are getting a persistent error when we run mserver.  As far as I 
know there are minimal changes between this and the MIDAS branch, but Ben Smith 
may have more to say on this.

[lekhraj@sdfcdmsdaq online]$ mserver
mserver started interactively
[mserver,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer 
because process pid 481051 does not exist
mserver will listen on TCP port 1175
[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...
[mserver,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment 
not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while 
already inside db_{lock,unlock}_database(). Maybe this is a call from a signal 
handler. Cannot continue, aborting...
Aborted (core dumped)

We thought perhaps we had a corrupted ODB file, so we removed the ODB file and 
tried to create a new one (sized correctly for our experiment):

[lekhraj@sdfcdmsdaq online]$ odbedit -s 50000000
[ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 
'mserver', index 0 because process pid 481326 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC 
hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC 
hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/481326' for client 'mserver' 
because it is not connected to ODB
[ODBEdit,INFO] Client 'mserver' on buffer 'SYSMSG' removed by bm_open_buffer 
because process pid 481326 does not exist
[local:test:S]/>Bus error (core dumped)
    Reply  07 Oct 2024, Ben Smith, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> Machine and are getting a persistent error when we run mserver.  As far as I 
> know there are minimal changes between this and the MIDAS branch, but Ben Smith 
> may have more to say on this.

For reference, "the SuperCDMS version of MIDAS" is just a fork that no longer has any meaningful differences vs the main MIDAS repo, but we only pull updates infrequently after testing a bunch. We last pulled from the develop branch in November 2023. But that should be irrelevant here as semaphore code hasn't been touched for a very long time.

We're running Alma 9.4 on a machine at TRIUMF and the same version of midas works fine there (Amy, you may already have access to scdms-zeus). I believe Alma and Rocky should be basically identical for this.

So the questions are:
* Have you tried other midas programs, or only mserver? E.g. did odbedit and mhttpd work?
* If other programs work, have you been running them all as the same user? In particular, if you ran one program as root and another as an unprivileged user, then you will likely get odd permissions issues.
* What do you see if you run `ls -l /dev/shm` and `ls -l ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM`? (Or wherever your online dir is for the 2nd one).
* Did you follow the full instructions for recovering from a corrupt ODB? https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB   In particular the bit about running odbinit with the --cleanup flag?
       Reply  08 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> > Machine and are getting a persistent error when we run mserver.  As far as I 
> > know there are minimal changes between this and the MIDAS branch, but Ben Smith 
> > may have more to say on this.
> 
> For reference, "the SuperCDMS version of MIDAS" is just a fork that no longer has any meaningful differences vs the main MIDAS repo, but we only pull updates infrequently after testing a bunch. We last pulled from the develop branch in November 2023. But that should be irrelevant here as semaphore code hasn't been touched for a very long time.
> 
> We're running Alma 9.4 on a machine at TRIUMF and the same version of midas works fine there (Amy, you may already have access to scdms-zeus). I believe Alma and Rocky should be basically identical for this.
> 
> So the questions are:
> * Have you tried other midas programs, or only mserver? E.g. did odbedit and mhttpd work?
> * If other programs work, have you been running them all as the same user? In particular, if you ran one program as root and another as an unprivileged user, then you will likely get odd permissions issues.
> * What do you see if you run `ls -l /dev/shm` and `ls -l ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM`? (Or wherever your online dir is for the 2nd one).
> * Did you follow the full instructions for recovering from a corrupt ODB? https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB   In particular the bit about running odbinit with the --cleanup flag?

Here's what happens when I try to run odbedit:

[lekhraj@sdfcdmsdaq setup]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 481823 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/481823' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 481823 does not exist
[ODBEdit,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
[ODBEdit,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
Aborted (core dumped)
[lekhraj@sdfcdmsdaq setup]$

And mhttpd:

[lekhraj@sdfcdmsdaq setup]$ mhttpd
[mhttpd,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 601054 does not exists
[mhttpd,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[mhttpd,INFO] Corrected 1 ODB entries
[mhttpd,INFO] Deleted entry '/System/Clients/601054' for client 'ODBEdit' because it is not connected to ODB
[mhttpd,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 601054 does not exist
[mhttpd,INFO] ODB subtree /Runinfo corrected successfully
Password protection is off
Hostlist off, connections from anywhere will be accepted
Listening on "http://localhost:8080", passwords OFF, hostlist OFF
Listening on "http://[::1]:8080", passwords OFF, hostlist OFF
bm_lock_buffer: Lock buffer "SYSMSG" is taking longer than 1 second!
[mhttpd,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
[mhttpd,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
Aborted (core dumped)
[lekhraj@sdfcdmsdaq setup]$

We have been running everything as a single user, the user who cloned the repositories and owns the directories.

We did follow the corrupted-ODB cleanup instructions.

[lekhraj@sdfcdmsdaq setup]$ ls -lh /dev/shm
total 1.3M
-rw------- 1 lekhraj dm 1.2M Oct  8 14:13 17468_test_ODB__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
-rw------- 1 lekhraj dm 114K Oct  7 14:06 17468_test_SYSMSG__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_

[lekhraj@sdfcdmsdaq setup]$ ls -lh ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ALARM.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ELOG.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.HISTORY.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.LAZY.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.MSG.SHM
-rw-r--r-- 1 lekhraj dm 1.2M Oct  8 14:12 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ODB.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSMSG.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSTEM.SHM
          Reply  08 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
I read these error messages. There is no ODB corruption. ODB semaphore is locked and all midas programs will fail, they will timeout trying to get the lock, report the timeout, then it looks like a bug was introduced where instead of hard exit or abort() they attempt a clean shutdown which crashes from a recursive call in db_lock_database(). Amy's 
core dump should confirm this.

K.O.


> > > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> > > Machine and are getting a persistent error when we run mserver.  As far as I 
> > > know there are minimal changes between this and the MIDAS branch, but Ben Smith 
> > > may have more to say on this.
> > 
> > For reference, "the SuperCDMS version of MIDAS" is just a fork that no longer has any meaningful differences vs the main MIDAS repo, but we only pull updates infrequently after testing a bunch. We last pulled from the develop branch in November 2023. But that should be irrelevant here as semaphore code hasn't been touched for a very long time.
> > 
> > We're running Alma 9.4 on a machine at TRIUMF and the same version of midas works fine there (Amy, you may already have access to scdms-zeus). I believe Alma and Rocky should be basically identical for this.
> > 
> > So the questions are:
> > * Have you tried other midas programs, or only mserver? E.g. did odbedit and mhttpd work?
> > * If other programs work, have you been running them all as the same user? In particular, if you ran one program as root and another as an unprivileged user, then you will likely get odd permissions issues.
> > * What do you see if you run `ls -l /dev/shm` and `ls -l ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM`? (Or wherever your online dir is for the 2nd one).
> > * Did you follow the full instructions for recovering from a corrupt ODB? https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB   In particular the bit about running odbinit with the --cleanup flag?
> 
> Here's what happens when I try to run odbedit:
> 
> [lekhraj@sdfcdmsdaq setup]$ odbedit
> [ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 481823 does not exists
> [ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Corrected 1 ODB entries
> [ODBEdit,INFO] Deleted entry '/System/Clients/481823' for client 'ODBEdit' because it is not connected to ODB
> [ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 481823 does not exist
> [ODBEdit,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
> [ODBEdit,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
> db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
> Aborted (core dumped)
> [lekhraj@sdfcdmsdaq setup]$
> 
> And mhttpd:
> 
> [lekhraj@sdfcdmsdaq setup]$ mhttpd
> [mhttpd,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 601054 does not exists
> [mhttpd,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
> [mhttpd,INFO] Corrected 1 ODB entries
> [mhttpd,INFO] Deleted entry '/System/Clients/601054' for client 'ODBEdit' because it is not connected to ODB
> [mhttpd,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 601054 does not exist
> [mhttpd,INFO] ODB subtree /Runinfo corrected successfully
> Password protection is off
> Hostlist off, connections from anywhere will be accepted
> Listening on "http://localhost:8080", passwords OFF, hostlist OFF
> Listening on "http://[::1]:8080", passwords OFF, hostlist OFF
> bm_lock_buffer: Lock buffer "SYSMSG" is taking longer than 1 second!
> [mhttpd,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
> [mhttpd,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
> db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
> Aborted (core dumped)
> [lekhraj@sdfcdmsdaq setup]$
> 
> We have been running everything as a single user, the user who cloned the repositories and owns the directories.
> 
> We did follow the corrupted-ODB cleanup instructions.
> 
> [lekhraj@sdfcdmsdaq setup]$ ls -lh /dev/shm
> total 1.3M
> -rw------- 1 lekhraj dm 1.2M Oct  8 14:13 17468_test_ODB__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
> -rw------- 1 lekhraj dm 114K Oct  7 14:06 17468_test_SYSMSG__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
> 
> [lekhraj@sdfcdmsdaq setup]$ ls -lh ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM
> -rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ALARM.SHM
> -rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ELOG.SHM
> -rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.HISTORY.SHM
> -rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.LAZY.SHM
> -rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.MSG.SHM
> -rw-r--r-- 1 lekhraj dm 1.2M Oct  8 14:12 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ODB.SHM
> -rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSMSG.SHM
> -rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSTEM.SHM
             Reply  08 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> I read these error messages. There is no ODB corruption. ODB semaphore is locked and all midas programs will fail...

Recovery from this is:
- stop all midas programs (actually they should have all crashed by now)
- identify the ODB semaphore with: ipcs -s -t
- remove the ODB semaphore with: ipcrm sem <semid>
- where <semid> is from the first column of ipcs
- keep deleting semaphores until odbedit works.
- if you delete extra midas sempahores, odbedit will recreate them
- if you delete non-midas semaphores, oh, well...

Little bit better steps for this recovery may be written up by Suzannah in this forum
or in the midas wiki... good luck finding them.

K.O.
    Reply  07 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> Machine and are getting a persistent error when we run mserver.
>
> [mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> timeout 10000 ms, exiting...
> db_lock_database: Detected recursive call to db_{lock,unlock}_database() while 
> already inside db_{lock,unlock}_database(). Maybe this is a call from a signal 
> handler. Cannot continue, aborting...
> Aborted (core dumped)

This is super very bad. Since you have a core dump, please post the stack trace here (or email it to me).

I probably cannot debug your private version of midas and I will recommend that you install and run vanilla midas 
mserver (just while we debug this problem).

Let's look at the core dump stack trace first, but likely we see a problem with System-V semaphores and hopefully it 
is not some breakage due to Red Hat bogosity or due to something specific to running on a virtual machine.

If indeed this is Linux-kernel level breakage of System-V semaphores, solution would be to start using Posix 
semaphores, something I wanted to do for a long time. We already switched MIDAS shared memory from System-V to Posix 
shared memory.

If we are lucky it is just one more crasher bug in ODB. Let's see that core dump stack trace.

K.O.
       Reply  08 Oct 2024, Mark Grimes, Bug Report, Difficulty running MIDAS on Rocky 9.4 
We run Midas with no problems on Rocky 9.4, although not in a virtual machine.  We're very close to the head of `develop`.

I'm fairly sure I've seen an error like this before.  I didn't pay it much attention because it was transitory and I was doing something weird at the time - probably 
stepping through with a debugger and hit a timeout.  It was definitely about an ODB semaphore but I can't recall if it was about a recursive call.

Basically I think Rocky 9.4 is a red herring.  Do you have another crashed copy of mserver/mhttpd running somewhere and stuck in limbo?  If it's a virtual machine 
are you sharing the shared memory location with the host, and running another midas on there?


> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> > Machine and are getting a persistent error when we run mserver.
> >
> > [mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...
> > db_lock_database: Detected recursive call to db_{lock,unlock}_database() while 
> > already inside db_{lock,unlock}_database(). Maybe this is a call from a signal 
> > handler. Cannot continue, aborting...
> > Aborted (core dumped)
> 
> This is super very bad. Since you have a core dump, please post the stack trace here (or email it to me).
> 
> I probably cannot debug your private version of midas and I will recommend that you install and run vanilla midas 
> mserver (just while we debug this problem).
> 
> Let's look at the core dump stack trace first, but likely we see a problem with System-V semaphores and hopefully it 
> is not some breakage due to Red Hat bogosity or due to something specific to running on a virtual machine.
> 
> If indeed this is Linux-kernel level breakage of System-V semaphores, solution would be to start using Posix 
> semaphores, something I wanted to do for a long time. We already switched MIDAS shared memory from System-V to Posix 
> shared memory.
> 
> If we are lucky it is just one more crasher bug in ODB. Let's see that core dump stack trace.
> 
> K.O.
          Reply  08 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> Basically I think Rocky 9.4 is a red herring.

This is what likely happened:
- some program crashed while holding the ODB lock semaphore (by ctrl-C at the wrong time or by kill -KILL at the wring time)
- semaphore is locked with a flag "unlock if this program stops"
- this is supposed to ensure ODB lock semaphore never gets stuck in the locked state
- (there is no code in MIDAS to unlock the ODB lock semaphore without locking it first)
- we have observed a malfunction in the Linux kernel, where this automatic unlock does not happen.
- it is rare and so far cannot be reproduced. you can find more about it by searching this forum.
- I think it is a bug in the System-V semaphore code or in the Linux "program stop" code (a path where they fail to call the semaphore unlock handler, and who knows what 
other handlers).
- System-V semaphores are very obsolete, replaced by POSIX semaphores.
- POSIX semaphores do not have the "unlock if this program stops" magic, the user (MIDAS) is responsible with detecting that the program who locked the semaphore is gone 
and and with taking corrective action, i.e. release the lock, automatically.

I do not know if this problem with System-V semaphores is in the generic Linux kernel, or if it is specific
to the Red Hat kernels (they are known to have many patches, deviating quite far from vanilla kernels).

I do not know if this problem is somehow sensitive to virtual machines.

So yes/no, Red Hat derived linux on a virtual machine could be where this problem happens more often that elsewhere.

K.O.
       Reply  08 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> > Machine and are getting a persistent error when we run mserver.
> >
> > [mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...
> > db_lock_database: Detected recursive call to db_{lock,unlock}_database() while 
> > already inside db_{lock,unlock}_database(). Maybe this is a call from a signal 
> > handler. Cannot continue, aborting...
> > Aborted (core dumped)
> 
> This is super very bad. Since you have a core dump, please post the stack trace here (or email it to me).
> 
> I probably cannot debug your private version of midas and I will recommend that you install and run vanilla midas 
> mserver (just while we debug this problem).
> 
> Let's look at the core dump stack trace first, but likely we see a problem with System-V semaphores and hopefully it 
> is not some breakage due to Red Hat bogosity or due to something specific to running on a virtual machine.
> 
> If indeed this is Linux-kernel level breakage of System-V semaphores, solution would be to start using Posix 
> semaphores, something I wanted to do for a long time. We already switched MIDAS shared memory from System-V to Posix 
> shared memory.
> 
> If we are lucky it is just one more crasher bug in ODB. Let's see that core dump stack trace.
> 
> K.O.

I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.

This was done using the "CDMS" version of MIDAS, I'll compile the current MIDAS repository just to be sure we're seeing 
the same error and report back here!
          Reply  08 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.

I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).

It is best if you run gdb and extract the stack traces on your end.

In case you are not familiar with gdb:

gdb mserver core # start gdb
bt # stack trace of crashed thread
info thr # get list of threads
thr 1
bt
thr 2
bt
# etc, get stack trace of each thread, there should not be too many of them

K.O.
             Reply  10 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> 
> I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> 
> It is best if you run gdb and extract the stack traces on your end.
> 
> In case you are not familiar with gdb:
> 
> gdb mserver core # start gdb
> bt # stack trace of crashed thread
> info thr # get list of threads
> thr 1
> bt
> thr 2
> bt
> # etc, get stack trace of each thread, there should not be too many of them
> 
> K.O.

Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
looks potentially useful:

[lekhraj@sdfcdmsdaq ~]$ gdb mserver 
core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mserver...
[New LWP 601715]

warning: Section `.reg-xstate/601715' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `mserver'.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/601715' in core file too small.
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
(gdb)
(gdb) bt
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
format",
    hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
subhKey=0x7ffcc536d348)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
    key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
    s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
filename=0x7ffcc536d690,
    linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
    message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
hDB=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
    subhKey=subhKey@entry=0x7ffcc536da04) at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
--Type <RET> for more, q to quit, c to continue without paging--
#16 0x000000000041ff85 in cm_periodic_tasks ()
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb) info thr
  Id   Target Id                          Frame
* 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
/lib64/libc.so.6
(gdb) thr 1
[Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
format",
    hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
subhKey=0x7ffcc536d348)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
    key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
    s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
filename=0x7ffcc536d690,
    linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
    message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
hDB=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
    subhKey=subhKey@entry=0x7ffcc536da04) at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
#16 0x000000000041ff85 in cm_periodic_tasks ()
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb)
                Reply  13 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
Thank you for the stack trace, I fixed the buglet that cause midas programs to crash twice,
once on failure to lock ODB, then call exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock 
failure is the cm_msg() codes.

Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the 
heck...

Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will 
unlock it!).

commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11

K.O.


> > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> > 
> > I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> > 
> > It is best if you run gdb and extract the stack traces on your end.
> > 
> > In case you are not familiar with gdb:
> > 
> > gdb mserver core # start gdb
> > bt # stack trace of crashed thread
> > info thr # get list of threads
> > thr 1
> > bt
> > thr 2
> > bt
> > # etc, get stack trace of each thread, there should not be too many of them
> > 
> > K.O.
> 
> Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
> looks potentially useful:
> 
> [lekhraj@sdfcdmsdaq ~]$ gdb mserver 
> core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
> GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
> ...
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from mserver...
> [New LWP 601715]
> 
> warning: Section `.reg-xstate/601715' in core file too small.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `mserver'.
> Program terminated with signal SIGABRT, Aborted.
> 
> warning: Section `.reg-xstate/601715' in core file too small.
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
> 3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
> openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
> (gdb)
> (gdb) bt
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> format",
>     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> subhKey=0x7ffcc536d348)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
>     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
>     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> filename=0x7ffcc536d690,
>     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
>     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> hDB=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
>     subhKey=subhKey@entry=0x7ffcc536da04) at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #15 0x0000000000455fd2 in al_check () at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> --Type <RET> for more, q to quit, c to continue without paging--
> #16 0x000000000041ff85 in cm_periodic_tasks ()
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> (gdb) info thr
>   Id   Target Id                          Frame
> * 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
> /lib64/libc.so.6
> (gdb) thr 1
> [Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> format",
>     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> subhKey=0x7ffcc536d348)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
>     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
>     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> filename=0x7ffcc536d690,
>     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
>     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> hDB=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
>     subhKey=subhKey@entry=0x7ffcc536da04) at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #15 0x0000000000455fd2 in al_check () at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> #16 0x000000000041ff85 in cm_periodic_tasks ()
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> (gdb)
                   Reply  16 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> Thank you for the stack trace, I fixed the buglet that cause midas programs to crash twice,
> once on failure to lock ODB, then call exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock 
> failure is the cm_msg() codes.
> 
> Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the 
> heck...
> 
> Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will 
> unlock it!).
> 
> commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11
> 
> K.O.
> 
> 
> > > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> > > 
> > > I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> > > 
> > > It is best if you run gdb and extract the stack traces on your end.
> > > 
> > > In case you are not familiar with gdb:
> > > 
> > > gdb mserver core # start gdb
> > > bt # stack trace of crashed thread
> > > info thr # get list of threads
> > > thr 1
> > > bt
> > > thr 2
> > > bt
> > > # etc, get stack trace of each thread, there should not be too many of them
> > > 
> > > K.O.
> > 
> > Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
> > looks potentially useful:
> > 
> > [lekhraj@sdfcdmsdaq ~]$ gdb mserver 
> > core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
> > GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
> > ...
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from mserver...
> > [New LWP 601715]
> > 
> > warning: Section `.reg-xstate/601715' in core file too small.
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `mserver'.
> > Program terminated with signal SIGABRT, Aborted.
> > 
> > warning: Section `.reg-xstate/601715' in core file too small.
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
> > 3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
> > openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
> > (gdb)
> > (gdb) bt
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> > format",
> >     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> > subhKey=0x7ffcc536d348)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> >     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> >     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> > filename=0x7ffcc536d690,
> >     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> >     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> > hDB=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> >     subhKey=subhKey@entry=0x7ffcc536da04) at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > --Type <RET> for more, q to quit, c to continue without paging--
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb) info thr
> >   Id   Target Id                          Frame
> > * 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
> > /lib64/libc.so.6
> > (gdb) thr 1
> > [Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > (gdb) bt
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> > format",
> >     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> > subhKey=0x7ffcc536d348)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> >     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> >     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> > filename=0x7ffcc536d690,
> >     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> >     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> > hDB=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> >     subhKey=subhKey@entry=0x7ffcc536da04) at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb)

I checked out the modified version of Midas and recompiled, and am still getting a similar error when I try to run 
odbedit:

[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
1615051 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not 
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)

I'm not sure what's causing the call to lock the database, all I'm doing is typing "odbedit" in the command prompt.

I should add that I followed the instructions for unlocking the ODB, but when I call "odbedit" this error still 
appears:

[aroberts@sdfcdmsdaq midas]$ ipcs -s -t

------ Semaphore Operation/Change Times --------
semid    owner      last-op                    last-changed
4        aroberts    Wed Oct 16 11:44:10 2024   Wed Oct 16 11:44:00 2024

[aroberts@sdfcdmsdaq midas]$ ipcrm sem 4
resource(s) deleted
[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
1617050 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617050' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617050 does not 
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
                      Reply  18 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> [aroberts@sdfcdmsdaq midas]$ odbedit
> [ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
> 1615051 does not exists
> [ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Corrected 1 ODB entries
> [ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
> [ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not 
> exist

so far, so good, we connected to ODB (lock was not stuck), cleared out client "odbedit" with pid 1615051 that crashed 
without properly disconnecting. ODB semaphore is working correctly.

> [ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
> Aborted (core dumped)

suddenly, an ODB semaphore timeout...

can you post the stack trace from this core dump? I am pretty sure it will be boring, but just in case...

K.O.
                         Reply  18 Oct 2024, Konstantin Olchanski, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> suddenly, an ODB semaphore timeout...

I am wondering if something bizarre is going on, like the system clock going backwards. I heard of things like that 
happening in virtual environments.

https://stackoverflow.com/questions/4801122/how-to-stop-time-from-running-backwards-on-linux

I added some debugging information to the semaphore locking code. Please update to commit 
eb625af119067f6d702211542d88a28ccb57ad2c of src/system.cxx (plus small change in include/msystem.h) and try again.

Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.

K.O.
                            Reply  28 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4 
> Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.

It appears that time is moving forward:

[aroberts@sdfcdmsdaq build]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 1617119 does 
not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617119' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617119 does not exist
[local:amy_test:S]/>ss_semaphore_wait_for: semop/semtimedop(5) returned -1, errno 11 (Resource temporarily unavailable), 
start time 0xd4fd98f6, now 0xd4fdc0ef, dt 0x000027f9, timeout 0x00002710 ms, SEMAPHORE TIMEOUT!
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
    Reply  06 Nov 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4 
After following Konstantin's debugging suggestions, I thought I would try to replicate 
the issue on my own computer.  My hope was that I could provide instructions for 
replicating the bug so that the MIDAS team could try debugging things more easily.

However, when I ran the current version of MIDAS in a Rocky 9.4 VM on my laptop (both 
VMWare and VirtualBox), mserver and odbedit ran just fine (!).

I'm currently trying to find out if there's a way to compare the VMs on my machine and 
the machine that's being problematic, I'll report back if I learn anything.
       Reply  22 Nov 2024, Konstantin Olchanski, Bug Report, ODB lock timeout, Difficulty running MIDAS on Rocky 9.4 
> try to replicate the issue ...

I see ODB lock timeout (and abort() of everything) in the dsvslice test station. We have 
about 15-20 MIDAS clients connected.

I am pretty sure we have not seen this problem until recently (and I have not seen it 
personally for a very long time). There were no changes to the MIDAS ODB locking code in a 
long time.

I suspect a recent change in the linux kernel. But I am likely to be wrong.

I have 1000 core dumps from this crash of dsvslice, and among them should be the 1 thread
that has ODB locked. Wish me luck finding it. Worst case is to discover that ODB is locked 
but nobody is holding a lock ("missing unlock bug"). This is hard to debug, I would have add 
tracking of "who was the last one to lock it, who forgot to unlock it".

K.O.
          Reply  24 Nov 2024, Pavel Murat, Bug Report, ODB lock timeout, Difficulty running MIDAS on Rocky 9.4 
there is a really good software tool developed by the Fermilab DAQ group, called TRACE - 

https://github.com/art-daq/trace ,

It could be useful for debugging cases like this one. In short, TRACE instruments the code 
with the printouts which could be selectively turned on and off without recompiling the executable. 

TRACE output could go to /dev/stdout (slow output) and/or to a circular buffer implemented via a shared 
memory segment (fast output). Sending unlimited output to the shared memory segment is extremely useful.

TRACE also allows to trigger on certain conditions, again, w/o recompiling the executable. 
For debugging cases like the one in question, that could turn out even more useful, 
however I didn't try the triggering functionality myself. 

-- regards, Pasha
Entry  21 Nov 2024, Mann Gandhi, Info, What do the status numbers mean and where can I find more information about them? 
Hello, 

This is the error message I got:

[RP Streaming Frontend,ERROR] [midas.cxx:17806:cm_write_event_to_odb,ERROR] 
cannot create key for bank "DATA" with tid 24 in ODB, db_create_key() status 309
read_periodic_event: No data in ring buffer or error occurred

[RP Streaming Frontend,ERROR] [odb.cxx:3373:db_create_key,ERROR] invalid key type 
24 to create 'DATA' in '/Equipment/Periodic/Variables'

[RP Streaming Frontend,ERROR] [midas.cxx:17806:cm_write_event_to_odb,ERROR] 
cannot create key for bank "DATA" with tid 24 in ODB, db_create_key() status 309



I just need more information on what the error message means. Which data type 
refers to tid 24 and what does status 309 indicate?? 

There is definitely data in the ring buffer but I keep on getting this error.

Thank you!

M.G 
    Reply  21 Nov 2024, Stefan Ritt, Info, What do the status numbers mean and where can I find more information about them? 
> [RP Streaming Frontend,ERROR] [midas.cxx:17806:cm_write_event_to_odb,ERROR] 
> cannot create key for bank "DATA" with tid 24 in ODB, db_create_key() status 309
> 
> 
> 
> I just need more information on what the error message means. Which data type 
> refers to tid 24 and what does status 309 indicate?? 

A tid (type identification) of 24 does actually not exist. See midas.h:327, so this tells
me that your bank header got corrupted. Somewhere you write over your data.

Stefan
Entry  13 Nov 2024, Stefan Ritt, Info, New sequencer command ODBLOOKUP 
A new sequencer command "ODBLOOKUP" has been implemented, which does a lookup of a string in a string 
array in the ODB given by a path and returns its index as a number. If we have for example an array

/Examples/Names
   [0] Hello
   [1] Test
   [2] Other

and do a
 
ODBLOOKUP "/Examples/Names", "Test", index

we get a index equal 1.


/Stefan
    Reply  15 Nov 2024, Konstantin Olchanski, Info, New sequencer command ODBLOOKUP 
> A new sequencer command "ODBLOOKUP" has been implemented, which does a lookup of a string in a string 
> array in the ODB given by a path and returns its index as a number. If we have for example an array
> 
> /Examples/Names
>    [0] Hello
>    [1] Test
>    [2] Other
> 
> and do a
>  
> ODBLOOKUP "/Examples/Names", "Test", index
> 
> we get a index equal 1.
> 

"value not found" sets "index" to ?
"odb key not found" sets "index" to ?
link to documentation?

K.O.
       Reply  18 Nov 2024, Stefan Ritt, Info, New sequencer command ODBLOOKUP 
> "value not found" sets "index" to ?

It sets it actually to "not found". Since all variables are stings in the sequencer, you can then do a test like

ODBLOOKUP ..., index
if ($index == "not found")
  ...


> "odb key not found" sets "index" to ?

If the odb key is not found, the sequencer aborts.

> link to documentation?

The documentation is where it always has been: 

https://daq00.triumf.ca/MidasWiki/index.php/Sequencer#Sequencer_Commands

/Stefan
Entry  14 Nov 2024, Mann Gandhi, Suggestion, Issue with creating banks Screenshot_from_2024-11-14_12-35-06.png
Hello, I am a coop student working at SNOLAB. I am currently setting up a frontend 
program to collect data for an experiment I am currently having with my bank being 
initialized correctly with the correct name. I will attach an image of the error and 
a code snippet for clarity. This is a multi-thread program using ring buffers. The 
first thread  is only responsible for data collection of ADC values from the Red 
Pitaya (FPGA) and the second thread does a simple derivative calculation. The 
frontend makes use of the TCP connection to stream data from the Red Pitaya. 

Here is the code snippet. This is the only place in the frontend code where I 
initialize and create a bank to store the ADC values from the Red Pitaya. 

void* data_acquisition_thread(void* param)
{
	printf("Data acquisition thread started\n");
	// Obtain ring buffer for inter-thread data exchange
	EVENT_HEADER *pevent;
	WORD *pdata;
	int status;

	//Set a timeout for the recv function to prevent indefinite blocking
	struct timeval timeout;
	timeout.tv_sec = 10; //seconds
	timeout.tv_usec = 0; // 0 microseconds
	setsockopt(stream_sockfd, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout, 
sizeof(timeout));



	while (is_readout_thread_enabled())
	{

		if (!readout_enabled())
		{
			usleep(10); // do not produce events when run is stopped
			continue;
		}
		// Acquire a write pointer in the ring buffer
		int status;
		do {
			status = rb_get_wp(rbh, (void **) &pevent, 0);
			if (status == DB_TIMEOUT)
			{
				usleep(5);
				if (!is_readout_thread_enabled()) break;
			}
		} while (status != DB_SUCCESS);

		if (status != DB_SUCCESS) continue;

		// Lock mutex before accessing shared resources
		pthread_mutex_lock(&lock);

		// Buffer for incoming data
		//int16_t temp_buffer[4096] = {0};

		bm_compose_event_threadsafe(pevent, 1, 0, 0, 
&equipment[0].serial_number);
        pdata = (WORD *)(pevent + 1);  // Set pdata to point to the data section of 
the event

		// Initialize the bank and read data directly into the bank
        bk_init32(pevent);
        bk_create(pevent, "RPD0", TID_WORD, (void **)&pdata);

		int bytes_read = recv(stream_sockfd, pdata, max_event_size * 
sizeof(WORD), 0);
		printf("Data received: %d bytes\n", bytes_read);


		if (bytes_read <= 0)
		{
			if (bytes_read == 0)
			{
				printf("Red Pitaya disconnected\n");
				pthread_mutex_unlock(&lock);
				break;

			} else if (errno == EWOULDBLOCK || errno ==EAGAIN)
			{
				printf("Receive timeout\n");
				pthread_mutex_unlock(&lock);
				continue;
			}

			else
			{
				printf("Error reading from the Red Pitaya: %s\n", 
strerror(errno));
				pthread_mutex_unlock(&lock);
				continue;
			}

		}
		
		 // Adjust data pointers after reading
        pdata += bytes_read / sizeof(WORD);
        bk_close(pevent, pdata);

        pevent->data_size = bk_size(pevent);

		// Unlock mutex after writing to the buffer
		pthread_mutex_unlock(&lock);

		// Send event to ring buffer
		rb_increment_wp(rbh, sizeof(EVENT_HEADER) + pevent->data_size);
	}
	pthread_mutex_unlock(&lock);

	return NULL;
}
    Reply  14 Nov 2024, Stefan Ritt, Suggestion, Issue with creating banks 
All I can see is that your bank header gets corrupted along the way. The funny character reported by 
cm_write_event_to_odb indicates that your original name "RPD0" got overwritten somewhere, but I could not spot any 
mistake in your code. 

I would play around: change max_event_size, produce dummy data of size N instead of the recv() and so on. Also monitor 
the bank header to see when it gets overwritten. I guess you only write form one thread, so that should be safe, right?

Best,
Stefan
       Reply  14 Nov 2024, Mann Gandhi, Suggestion, Issue with creating banks 
> All I can see is that your bank header gets corrupted along the way. The funny character reported by 
> cm_write_event_to_odb indicates that your original name "RPD0" got overwritten somewhere, but I could not spot any 
> mistake in your code. 
> 
> I would play around: change max_event_size, produce dummy data of size N instead of the recv() and so on. Also monitor 
> the bank header to see when it gets overwritten. I guess you only write form one thread, so that should be safe, right?
> 
> Best,
> Stefan

Hello Stefan, 

Thank you for the advice. On inspection, I noticed that my event size (when I print bk_size(pevent)) is around 1.4 billion 
which seems absurd so I am not sure why this is the case as well. In addition, is mdump the way to monitor the bank header?
I just recently started using MIDAS so I am a little bit confused. I can attach a link to the github repository where I am 
currently working on this for further clarity since I am sure there is an issue in my code somewhere. 
(https://github.com/mgandhi-1/red-pitaya-frontend/blob/10-issue-with-bank-creation-neeed-to-figure-out-why-banks-are-not-
being-created-correctly/frontend.cxx)

I appreciate the help. Thank you once more.

Best, 
Mann
    Reply  15 Nov 2024, Konstantin Olchanski, Suggestion, Issue with creating banks 
> Hello, I am a coop student working at SNOLAB.
> void* data_acquisition_thread(void* param)
> {
> 	EVENT_HEADER *pevent;
>       if (complicated) {
> 			status = rb_get_wp(rbh, (void **) &pevent, 0);
>       }
>       bm_compose_event_threadsafe(pevent, 1, 0, 0, &equipment[0].serial_number);
> }

this code is buggy. it should read "EVENT_HEADER *pevent = NULL;" to avoid an uninitialized variable
and bm_compose_event() & co should be inside an "if (pevent != NULL)" block, unless you can absolutely
proove that rb_get_wp() is always called and pevent is never NULL. (even is somebody changes the code later).

if you build your code with "gcc -O2 -g -Wall -Wuninitialized" it would probably warn you about use of uninitilialized 
"pevent".

P.S. for building multithreaded frontends, you are much better off starting from the c++ tmfe frontend framework,
a good starting point is study tmfe_example_everything.cxx.

K.O.
Entry  07 Nov 2024, Lukas Gerritzen, Suggestion, Stop run and sequencer button 
Due to popular demand among our students, I added a button to the sequencer that stops the run and the sequence. If you find it useful, please consider merging this upstream.
$ git diff sequencer.html
diff --git a/resources/sequencer.html b/resources/sequencer.html
index e7f8a79d..95c7e3d8 100644
--- a/resources/sequencer.html
+++ b/resources/sequencer.html
@@ -115,6 +115,7 @@
               <img src="icons/play.svg" title="Start" class="seqbtn Stopped" onclick="startSeq();">
               <img src="icons/debug.svg" title="Debug" class="seqbtn Stopped" onclick="debugSeq();">
               <img src="icons/square.svg" title="Stop" class="seqbtn Running Paused" onclick="stopSeq();">
+              <img src="icons/x-octagon.svg" title="Stop Run and Sequencer immediately" class="seqbtn Running Paused" onclick="stopRunAndSeq();">
               <img src="icons/pause.svg" title="Pause" class="seqbtn Running" onclick="modbset('/Sequencer/Command/Pause script',true);">
               <img src="icons/resume.svg" title="Resume" class="seqbtn Paused" onclick="modbset('/Sequencer/Command/Resume script',true);">
               <img src="icons/step-over.svg" title="Step Over" class="seqbtn Running Paused" onclick="modbset('/Sequencer/Command/Step over',true);">
[gac-megj@pc13513 resources]$ git diff sequencer.js
diff --git a/resources/sequencer.js b/resources/sequencer.js
index cc5398ef..b75c926c 100644
--- a/resources/sequencer.js
+++ b/resources/sequencer.js
@@ -1582,6 +1582,23 @@ function stopSeq() {
    });
 }

+function stopRunAndSeq() {
+   const message = `Are you sure you want to stop the run and sequence?`;
+   dlgConfirm(message,function(resp) {
+      if (resp) {
+         modbset('/Sequencer/Command/Stop immediately',true);
+
+         mjsonrpc_call("cm_transition", {"transition": "TR_STOP"}).then(function (rpc) {
+            if (rpc.result.status !== 1) {
+               throw new Error("Cannot stop run, cm_transition() status " + rpc.result.status + ", see MIDAS messages");
+            }
+         }).catch(function (error) {
+            mjsonrpc_error_alert(error);
+         });
+      }
+   });
+}
+
 // Show or hide parameters table
 function showParTable(varContainer) {
    let e = document.getElementById(varContainer);
    Reply  07 Nov 2024, Stefan Ritt, Suggestion, Stop run and sequencer button 
I don't find this very useful. Some experiments do not only want to stop the run, but also do other cleanup things. To do that, I proposed and "atexit" function like C has it. Then the user can put a run stop there, plus any other cleanup. This will be much more flexible. Think about the "reset" script we have to manually run if we abort a sequencer. The atexit function will come next week, so you should consider to use it instead your additional button.

Stefan
ELOG V3.1.4-2e1708b5