ID |
Date |
Author |
Topic |
Subject |
2886
|
05 Nov 2024 |
Maia Henriksson-Ward | Forum | How to properly write a client listens for events on a given buffer? | > If there's some template for writing a client to access event data, that would be
> very useful (and you can probably just ignore the context I gave below in that
> case).
>
>
> Some context:
>
> Quite a while ago, I wrote the attached "data pipeline" client whose job was to
> listen for events, copy their data, and pipe them to a python script. I believe I
> just stole bits and pieces from mdump.cxx to accomplish this. Later I wrote the
> attached wrapper class "MidasConnector.cpp" and a main.cpp to generalize
> data_pipeline.cxx a bit. There were a lot of iterations to the code where I had the
> below problems; so don't take the logic in the attached code as the exact code that
> caused the issues below.
>
> However, I'm unable to resolve a couple issues:
>
> 1. If a timeout is set, everything will work until that timeout is reached. Then
> regardless of what kind of logic I tried to implement (retry receiving event,
> disconnect and reconnect client, etc.) the client would refuse to receive more data.
>
> 2. When I ctrl-C main, it hangs; this is expected because it's stuck in a while
> loop. But because I can't set a timeout I have to ctrl-C twice; this would
> occasionally corrupt the ODB which was not ideal. I was able to get around this with
> some impractical solution involving ncurses I believe.
>
>
> Thanks,
> Jack
midas/examples/lowlevel/consume.cxx might be what you're looking for, but I think all
you're missing is a call to cm_yield() in your loop, so your midas client doesn't get
killed when the timeout is reached (and also so you can act on shutdown requests from
midas)
Something like
int status = cm_yield(100);
if (status == SS_ABORT || status == RPC_SHUTDOWN)
break;
There might be a recommended way to handle the ctrl-c and disconnect from the ODB, but
off the top of my head I don't remember it.
Also check out Ben's new(ish) python library, midas/python/examples/event_receiver.py
might be a much easier solution. And you can use the context manager, which will take
care of safely disconnecting from midas after you ctrl-C. |
2885
|
05 Nov 2024 |
Jack Carlton | Forum | How to properly write a client listens for events on a given buffer? | If there's some template for writing a client to access event data, that would be
very useful (and you can probably just ignore the context I gave below in that
case).
Some context:
Quite a while ago, I wrote the attached "data pipeline" client whose job was to
listen for events, copy their data, and pipe them to a python script. I believe I
just stole bits and pieces from mdump.cxx to accomplish this. Later I wrote the
attached wrapper class "MidasConnector.cpp" and a main.cpp to generalize
data_pipeline.cxx a bit. There were a lot of iterations to the code where I had the
below problems; so don't take the logic in the attached code as the exact code that
caused the issues below.
However, I'm unable to resolve a couple issues:
1. If a timeout is set, everything will work until that timeout is reached. Then
regardless of what kind of logic I tried to implement (retry receiving event,
disconnect and reconnect client, etc.) the client would refuse to receive more data.
2. When I ctrl-C main, it hangs; this is expected because it's stuck in a while
loop. But because I can't set a timeout I have to ctrl-C twice; this would
occasionally corrupt the ODB which was not ideal. I was able to get around this with
some impractical solution involving ncurses I believe.
Thanks,
Jack |
Attachment 1: data_pipeline_(2).cxx
|
#include "midas.h"
#include "msystem.h"
#include "mrpc.h"
#include "mdsupport.h"
#include <iostream>
#include <unistd.h>
#include <stdio.h> // Added for popen
#include <stdlib.h> // Added for malloc and free
INT hBufEvent;
void process_event(EVENT_HEADER *pheader) {
printf("Received event #%d\n", pheader->serial_number);
printf("Event ID: %d\n", pheader->event_id);
printf("Data Size: %d bytes\n", pheader->data_size);
printf("Timestamp: %d\n", pheader->time_stamp);
printf("Trigger mask: %d\n", pheader->trigger_mask);
// Print a marker to indicate the start of serialized data
printf("EVENT_DATA_START\n");
// Serialize and print the event data
int* eventData = (int*)((char*)pheader + sizeof(EVENT_HEADER));
int numIntegers = (pheader->data_size - sizeof(EVENT_HEADER)) / sizeof(int);
for (int i = 0; i < 8; ++i) {
printf("%d ", eventData[i]);
}
printf("\n");
// Process the event here
}
int main() {
HNDLE hDB, hKey;
char host_name[HOST_NAME_LENGTH], expt_name[NAME_LENGTH], str[80];
char buf_name[32] = EVENT_BUFFER_NAME, rep_file[128];
unsigned int status, start_time, stop_time;
INT ch, request_id, size, get_flag, action, single, i;
// Define the maximum event size you expect to receive
INT max_event_size = 4000;
// Allocate memory for storing event data dynamically
void* event_data = malloc(max_event_size);
printf("1\n");
/* Get if existing the pre-defined experiment */
cm_get_environment(host_name, sizeof(host_name), expt_name, sizeof(expt_name));
// Print host_name
printf("host_name = %s\n", host_name);
// Print expt_name
printf("expt_name = %s\n", expt_name);
printf("2\n");
/* connect to the experiment */
status = cm_connect_experiment(host_name, expt_name, "data_pipeline", 0);
if (status != CM_SUCCESS) {
return 1;
}
printf("3\n");
status = bm_open_buffer(buf_name, DEFAULT_BUFFER_SIZE, &hBufEvent);
if (status != BM_SUCCESS && status != BM_CREATED) {
cm_msg(MERROR, "data_pipeline", "Cannot open buffer \"%s\", bm_open_buffer() status %d", buf_name, status);
return 1;
}
printf("4\n");
/* set the buffer cache size if requested */
bm_set_cache_size(hBufEvent, 100000, 0);
printf("5\n");
/* place a request for a specific event id */
status = bm_request_event(hBufEvent, EVENTID_ALL, TRIGGER_ALL, GET_ALL, &request_id, NULL); // Use NULL as the callback routine
printf("6\n");
printf("status = %d\n",status);
// Open a pipe to a Python script for data transfer
FILE* pipe = popen("python3 data_pipeline.py", "w");
if (pipe == NULL) {
perror("popen");
return 1;
}
// Enter the event processing loop
while (1) {
// Use the address of max_event_size in bm_receive_event
status = bm_receive_event(hBufEvent, event_data, &max_event_size, BM_WAIT); // Wait for new data indefinitely
if (status == BM_SUCCESS) {
//process_event((EVENT_HEADER*)((char*)event_data + sizeof(EVENT_HEADER)));
// Send the event data to the Python script via the pipe
fprintf(pipe, "EVENT_DATA_START\n");
int* eventData = (int*)((char*)event_data + sizeof(EVENT_HEADER));
int numIntegers = (max_event_size - sizeof(EVENT_HEADER)) / sizeof(int);
for (int i = 4; i < 12; ++i) {
fprintf(pipe, "%d ", eventData[i]);
}
fprintf(pipe, "\n");
fflush(pipe); // Flush the buffer to ensure data is sent immediately
} else {
printf("Error receiving event: %d\n", status);
break; // Exit the loop if an error occurs
}
}
// Close the pipe
pclose(pipe);
// Free the dynamically allocated memory
free(event_data);
cm_disconnect_experiment();
printf("7\n");
return 1;
}
|
Attachment 2: MidasConnector.cpp
|
#include "MidasConnector.h"
MidasConnector::MidasConnector(const char* clientName) {
// Initialize client name
strncpy(client_name_, clientName, NAME_LENGTH);
// Get host name and experiment name from environment
cm_get_environment(host_name_, sizeof(host_name_), experiment_name_, sizeof(experiment_name_));
// Initialize other private variables if needed
event_id = EVENTID_ALL; // Initialize with default value
trigger_mask = TRIGGER_ALL; // Initialize with default value
sampling_type = GET_ALL; // Initialize with default value (renamed from get_flags)
buffer_size = DEFAULT_BUFFER_SIZE; // Initialize with default value
timeout_millis = BM_WAIT;
strncpy(buffer_name, EVENT_BUFFER_NAME, sizeof(buffer_name)); // Initialize with default value
}
// Getters for the private variables
short MidasConnector::getEventId() const {
return event_id;
}
short MidasConnector::getTriggerMask() const {
return trigger_mask;
}
int MidasConnector::getSamplingType() const {
return sampling_type;
}
int MidasConnector::getBufferSize() const {
return buffer_size;
}
const char* MidasConnector::getBufferName() const {
return buffer_name;
}
int MidasConnector::getTimeout() const {
return timeout_millis;
}
HNDLE MidasConnector::getEventBufferHandle() const {
return hBufEvent;
}
// Setters for the private variables
void MidasConnector::setEventId(short eventId) {
event_id = eventId;
}
void MidasConnector::setTriggerMask(short triggerMask) {
trigger_mask = triggerMask;
}
void MidasConnector::setSamplingType(int samplingType) {
sampling_type = samplingType;
}
void MidasConnector::setBufferSize(int bufferSize) {
buffer_size = bufferSize;
}
void MidasConnector::setBufferName(const char* bufferName) {
strncpy(buffer_name, bufferName, sizeof(buffer_name));
}
void MidasConnector::setTimeout(int timeoutMillis) {
timeout_millis = timeoutMillis;
}
void MidasConnector::setEventBufferHandle(HNDLE eventBufferHandle) {
hBufEvent = eventBufferHandle;
}
bool MidasConnector::ConnectToExperiment() {
// Connect to the experiment
int status = cm_connect_experiment(host_name_, experiment_name_, client_name_, NULL);
if (status != CM_SUCCESS) {
// Handle connection error
return false;
}
return true;
}
void MidasConnector::DisconnectFromExperiment() {
// Disconnect from the experiment
cm_disconnect_experiment();
}
bool MidasConnector::OpenEventBuffer() {
int status = bm_open_buffer(buffer_name, buffer_size, &hBufEvent);
if (status != BM_SUCCESS && status != BM_CREATED) {
cm_msg(MERROR, client_name_, "Cannot open buffer \"%s\", bm_open_buffer() status %d", buffer_name, status);
return false;
}
return true;
}
bool MidasConnector::SetCacheSize(int cacheSize) {
bm_set_cache_size(hBufEvent, cacheSize, 0);
return true;
}
bool MidasConnector::RequestEvent() {
int request_id;
int status = bm_request_event(hBufEvent, event_id, trigger_mask, sampling_type, &request_id, NULL);
return status == BM_SUCCESS;
}
bool MidasConnector::ReceiveEvent(void* eventBuffer, int& maxEventSize) {
int status = bm_receive_event(hBufEvent, eventBuffer, &maxEventSize, timeout_millis);
return status == BM_SUCCESS;
}
|
Attachment 3: main.cpp
|
#include "event_processor/EventProcessor.h"
#include "data_transmitter/DataTransmitter.h"
#include "midas_connector/MidasConnector.h"
#include "json.hpp"
#include <fstream>
INT hBufEvent1;
INT hBufEvent2;
// Function to initialize MIDAS and open an event buffer
bool initializeMidas(MidasConnector& midasConnector, const nlohmann::json& config) {
// Set the MidasConnector properties based on the config
midasConnector.setEventId(config["eventId"].get<short>());
midasConnector.setTriggerMask(config["triggerMask"].get<short>());
midasConnector.setSamplingType(config["samplingType"].get<int>());
midasConnector.setBufferSize(config["bufferSize"].get<int>());
midasConnector.setBufferName(config["bufferName"].get<std::string>().c_str());
midasConnector.setBufferSize(config["bufferSize"].get<int>());
// Call the ConnectToExperiment method
if (!midasConnector.ConnectToExperiment()) {
return false;
}
// Call the OpenEventBuffer method
if (!midasConnector.OpenEventBuffer()) {
return false;
}
// Set the buffer cache size if requested
midasConnector.SetCacheSize(config["cacheSize"].get<int>());
// Place a request for a specific event id
if (!midasConnector.RequestEvent()) {
return false;
}
return true;
}
int main() {
// Read configuration from a JSON file
nlohmann::json config;
std::ifstream configFile("config.json");
configFile >> config;
configFile.close();
// Initialize MidasConnector and connect to the MIDAS experiment
MidasConnector midasConnector(config["clientName"].get<std::string>().c_str());
if (!initializeMidas(midasConnector, config)) {
printf("Error: Failed to initialize MIDAS.\n");
return 1;
}
// Read the maximum event size from the JSON configuration
INT max_event_size = config["maxEventSize"].get<int>();
// Allocate memory for storing event data dynamically
void* event_data = malloc(max_event_size);
// Initialize EventProcessor with detector mapping file and verbosity flag
EventProcessor eventProcessor(config["detectorMappingFile"].get<std::string>(), config["verbose"].get<bool>());
// Initialize DataTransmitter with the ZeroMQ address
DataTransmitter dataPublisher(config["zmqAddress"].get<std::string>());
// Connect to the ZeroMQ server
if (!dataPublisher.bind()) {
// Handle connection error
printf("Error: Failed to bind to port %s.\n", config["zmqAddress"].get<std::string>().c_str());
return 1;
} else {
printf("Connected to the ZeroMQ server.\n");
}
// Event processing loop
while (true) {
midasConnector.ReceiveEvent(event_data, max_event_size);
//Prcoess data once we have it
eventProcessor.processEvent(event_data, max_event_size);
// Serialize the event data with EventProcessor and store it in serializedData
std::string serializedData = eventProcessor.getSerializedData();
// Send the serialized data to the ZeroMQ server with DataTransmitter
if (!dataPublisher.publish(serializedData)) {
// Handle send error
printf("Error: Failed to send serialized data.\n");
}
}
// Cleanup and finalize your application
midasConnector.DisconnectFromExperiment(); // Disconnect from the MIDAS experiment
return 0;
}
|
2884
|
28 Oct 2024 |
Amy Roberts | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.
It appears that time is moving forward:
[aroberts@sdfcdmsdaq build]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 1617119 does
not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617119' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617119 does not exist
[local:amy_test:S]/>ss_semaphore_wait_for: semop/semtimedop(5) returned -1, errno 11 (Resource temporarily unavailable),
start time 0xd4fd98f6, now 0xd4fdc0ef, dt 0x000027f9, timeout 0x00002710 ms, SEMAPHORE TIMEOUT!
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped) |
Draft
|
28 Oct 2024 |
Lukas Gerritzen | Bug Report | Visual glitch in history system | |
Attachment 1: Screenshot_2024-10-28_at_17.34.26.png
|
|
2882
|
28 Oct 2024 |
Lukas Gerritzen | Bug Report | Visual glitch in history system | Today, I encountered the bug shown in the attached video. The value of the plotted curve does not match the mouseover number.
When trying to understand it better, I stopped being able to replicate. Has anyone else observed a similar problem? |
Attachment 1: Screen_Recording_2024-10-28_at_17.23.57.mov
|
Attachment 2: Screenshot_2024-10-28_at_17.29.34.png
|
|
2881
|
23 Oct 2024 |
Lukas Gerritzen | Bug Report | ODB key picker does not close when creating link / Edit-on-run string box too large | To reproduce:
In the interactive ODB, click the 🔗 icon to create a link. Next to the target, click the "..." button to open
the key picker browser. Then try to close it by either:
- Selecting a key and clicking ok
- Clicking "cancel"
- Clicking the red circle at the top left
Expected result:
The key picker closes
Actual result:
The key picker does not close.
Depending on how you trying to close the picker, the error messages in the debug console differ slightly.
On the red circle:
Uncaught TypeError: dlg is null
dlgClose http://localhost:8080/controls.js:791
onclick http://localhost:8080/?cmd=ODB&odb_path=/Test:1
On "ok" or "cancel":
Uncaught TypeError: dlg is null
dlgMessageDestroy http://localhost:8080/controls.js:828
pickerButton http://localhost:8080/odbbrowser.js:453
onclick http://localhost:8080/?cmd=ODB&odb_path=/Test:1
Another more minor visual problem is the edit-on-start dialog. There seems to be no upper bound to the
size of the text box. In the attached screenshot, ShortString has a maximum length of 32 characters,
LongString has 255. Both are empty at the time of the screenshot. Maybe, the size should be limited to a
reasonable width. |
Attachment 1: Screenshot_2024-10-23_at_11.38.38.png
|
|
2880
|
18 Oct 2024 |
Konstantin Olchanski | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > suddenly, an ODB semaphore timeout...
I am wondering if something bizarre is going on, like the system clock going backwards. I heard of things like that
happening in virtual environments.
https://stackoverflow.com/questions/4801122/how-to-stop-time-from-running-backwards-on-linux
I added some debugging information to the semaphore locking code. Please update to commit
eb625af119067f6d702211542d88a28ccb57ad2c of src/system.cxx (plus small change in include/msystem.h) and try again.
Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.
K.O. |
2879
|
18 Oct 2024 |
Konstantin Olchanski | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > [aroberts@sdfcdmsdaq midas]$ odbedit
> [ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid
> 1615051 does not exists
> [ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Corrected 1 ODB entries
> [ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
> [ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not
> exist
so far, so good, we connected to ODB (lock was not stuck), cleared out client "odbedit" with pid 1615051 that crashed
without properly disconnecting. ODB semaphore is working correctly.
> [ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
> Aborted (core dumped)
suddenly, an ODB semaphore timeout...
can you post the stack trace from this core dump? I am pretty sure it will be boring, but just in case...
K.O. |
2878
|
18 Oct 2024 |
Stefan Ritt | Bug Report | Frontend name must differ from others by more than the last three characters | Fixed and committed.
Best,
Stefan
> Hi Denis,
>
> indeed a bug. Will fix it next week.
>
> Best,
> Stefan
>
>
> > Hi,
> > I have developed two Midas front-end programs for different hardware. The frontend_name of the first one is "FSCD_SC" (slow control) and that of the second one is "FSCD_PS" (power supply).
> >
> > Each front-end program runs fine separately, but when attempting to start FSCD_SC while FSCD_PS is running, FSCD_PS is terminated and Midas indicates "Previous frontend stopped" in the window where it starts FSCD_SC.
> >
> > The problem is that these two frontend names only differ in their last two characters, and Midas currently does not distinguish them properly. |
2877
|
16 Oct 2024 |
Amy Roberts | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > Thank you for the stack trace, I fixed the buglet that cause midas programs to crash twice,
> once on failure to lock ODB, then call exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock
> failure is the cm_msg() codes.
>
> Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the
> heck...
>
> Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will
> unlock it!).
>
> commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11
>
> K.O.
>
>
> > > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> > >
> > > I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> > >
> > > It is best if you run gdb and extract the stack traces on your end.
> > >
> > > In case you are not familiar with gdb:
> > >
> > > gdb mserver core # start gdb
> > > bt # stack trace of crashed thread
> > > info thr # get list of threads
> > > thr 1
> > > bt
> > > thr 2
> > > bt
> > > # etc, get stack trace of each thread, there should not be too many of them
> > >
> > > K.O.
> >
> > Hi Konstantin, thanks for the instructions. I do appear to be missing some debug symbols, but the output
> > looks potentially useful:
> >
> > [lekhraj@sdfcdmsdaq ~]$ gdb mserver
> > core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
> > GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
> > ...
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from mserver...
> > [New LWP 601715]
> >
> > warning: Section `.reg-xstate/601715' in core file too small.
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `mserver'.
> > Program terminated with signal SIGABRT, Aborted.
> >
> > warning: Section `.reg-xstate/601715' in core file too small.
> > #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
> > 3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64
> > openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
> > (gdb)
> > (gdb) bt
> > #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
> > format",
> > hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
> > subhKey=0x7ffcc536d348)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> > key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> > s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
> > filename=0x7ffcc536d690,
> > linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> > message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
> > hDB=1)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> > subhKey=subhKey@entry=0x7ffcc536da04) at
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > --Type <RET> for more, q to quit, c to continue without paging--
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb) info thr
> > Id Target Id Frame
> > * 1 Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from
> > /lib64/libc.so.6
> > (gdb) thr 1
> > [Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
> > #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > (gdb) bt
> > #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
> > format",
> > hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
> > subhKey=0x7ffcc536d348)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> > key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> > s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
> > filename=0x7ffcc536d690,
> > linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> > message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
> > hDB=1)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> > subhKey=subhKey@entry=0x7ffcc536da04) at
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> > at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb)
I checked out the modified version of Midas and recompiled, and am still getting a similar error when I try to run
odbedit:
[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid
1615051 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
I'm not sure what's causing the call to lock the database, all I'm doing is typing "odbedit" in the command prompt.
I should add that I followed the instructions for unlocking the ODB, but when I call "odbedit" this error still
appears:
[aroberts@sdfcdmsdaq midas]$ ipcs -s -t
------ Semaphore Operation/Change Times --------
semid owner last-op last-changed
4 aroberts Wed Oct 16 11:44:10 2024 Wed Oct 16 11:44:00 2024
[aroberts@sdfcdmsdaq midas]$ ipcrm sem 4
resource(s) deleted
[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid
1617050 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617050' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617050 does not
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped) |
2876
|
13 Oct 2024 |
Konstantin Olchanski | Bug Report | Difficulty running MIDAS on Rocky 9.4 | Thank you for the stack trace, I fixed the buglet that cause midas programs to crash twice,
once on failure to lock ODB, then call exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock
failure is the cm_msg() codes.
Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the
heck...
Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will
unlock it!).
commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11
K.O.
> > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> >
> > I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> >
> > It is best if you run gdb and extract the stack traces on your end.
> >
> > In case you are not familiar with gdb:
> >
> > gdb mserver core # start gdb
> > bt # stack trace of crashed thread
> > info thr # get list of threads
> > thr 1
> > bt
> > thr 2
> > bt
> > # etc, get stack trace of each thread, there should not be too many of them
> >
> > K.O.
>
> Hi Konstantin, thanks for the instructions. I do appear to be missing some debug symbols, but the output
> looks potentially useful:
>
> [lekhraj@sdfcdmsdaq ~]$ gdb mserver
> core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
> GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
> ...
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from mserver...
> [New LWP 601715]
>
> warning: Section `.reg-xstate/601715' in core file too small.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `mserver'.
> Program terminated with signal SIGABRT, Aborted.
>
> warning: Section `.reg-xstate/601715' in core file too small.
> #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
> 3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64
> openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
> (gdb)
> (gdb) bt
> #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> #2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> #3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> #4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
> format",
> hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
> subhKey=0x7ffcc536d348)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> #7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
> filename=0x7ffcc536d690,
> linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> #8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
> timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> #9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
> hDB=1)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> subhKey=subhKey@entry=0x7ffcc536da04) at
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #15 0x0000000000455fd2 in al_check () at
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> --Type <RET> for more, q to quit, c to continue without paging--
> #16 0x000000000041ff85 in cm_periodic_tasks ()
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> (gdb) info thr
> Id Target Id Frame
> * 1 Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from
> /lib64/libc.so.6
> (gdb) thr 1
> [Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
> #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> #2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> #3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> #4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
> format",
> hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
> subhKey=0x7ffcc536d348)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> #7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
> filename=0x7ffcc536d690,
> linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> #8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
> timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> #9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
> hDB=1)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> subhKey=subhKey@entry=0x7ffcc536da04) at
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #15 0x0000000000455fd2 in al_check () at
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> #16 0x000000000041ff85 in cm_periodic_tasks ()
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> (gdb) |
2875
|
11 Oct 2024 |
Stefan Ritt | Bug Report | Frontend name must differ from others by more than the last three characters | Hi Denis,
indeed a bug. Will fix it next week.
Best,
Stefan
> Hi,
> I have developed two Midas front-end programs for different hardware. The frontend_name of the first one is "FSCD_SC" (slow control) and that of the second one is "FSCD_PS" (power supply).
>
> Each front-end program runs fine separately, but when attempting to start FSCD_SC while FSCD_PS is running, FSCD_PS is terminated and Midas indicates "Previous frontend stopped" in the window where it starts FSCD_SC.
>
> The problem is that these two frontend names only differ in their last two characters, and Midas currently does not distinguish them properly. |
2874
|
11 Oct 2024 |
Denis Calvet | Bug Report | Frontend name must differ from others by more than the last three characters | Hi,
I have developed two Midas front-end programs for different hardware. The frontend_name of the first one is "FSCD_SC" (slow control) and that of the second one is "FSCD_PS" (power supply).
Each front-end program runs fine separately, but when attempting to start FSCD_SC while FSCD_PS is running, FSCD_PS is terminated and Midas indicates "Previous frontend stopped" in the window where it starts FSCD_SC.
The problem is that these two frontend names only differ in their last two characters, and Midas currently does not distinguish them properly.
Looking in mfe.cxx we have:
int main(int argc, char *argv[])
{
...
/* shutdown previous frontend */
status = cm_shutdown(full_frontend_name, FALSE);
...
And looking in midas.cxx we have:
INT cm_shutdown(const char *name, BOOL bUnique) {
...
if (!bUnique)
client_name[strlen(name)] = 0; /* strip number */
...
The above line removes the last 3 characters of the front-end name before the subsequent comparison with other frontend names. Stripping the last 3 characters of the front-end name is correct for frontend programs that use the "-i" command line option to specify an index for that frontend, but all the characters of the front-end name should otherwise be kept for comparison.
I have changed the names of my frontend programs to avoid the interference, but it would be nice that the code that determines if an instance of a frontend program is already running is corrected.
I hope this can help.
Best regards,
Denis. |
2873
|
10 Oct 2024 |
Amy Roberts | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
>
> I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
>
> It is best if you run gdb and extract the stack traces on your end.
>
> In case you are not familiar with gdb:
>
> gdb mserver core # start gdb
> bt # stack trace of crashed thread
> info thr # get list of threads
> thr 1
> bt
> thr 2
> bt
> # etc, get stack trace of each thread, there should not be too many of them
>
> K.O.
Hi Konstantin, thanks for the instructions. I do appear to be missing some debug symbols, but the output
looks potentially useful:
[lekhraj@sdfcdmsdaq ~]$ gdb mserver
core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mserver...
[New LWP 601715]
warning: Section `.reg-xstate/601715' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `mserver'.
Program terminated with signal SIGABRT, Aborted.
warning: Section `.reg-xstate/601715' in core file too small.
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64
openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
(gdb)
(gdb) bt
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
format",
hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
subhKey=0x7ffcc536d348)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
filename=0x7ffcc536d690,
linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
hDB=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
subhKey=subhKey@entry=0x7ffcc536da04) at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
--Type <RET> for more, q to quit, c to continue without paging--
#16 0x000000000041ff85 in cm_periodic_tasks ()
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb) info thr
Id Target Id Frame
* 1 Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from
/lib64/libc.so.6
(gdb) thr 1
[Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
format",
hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
subhKey=0x7ffcc536d348)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
filename=0x7ffcc536d690,
linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
hDB=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
subhKey=subhKey@entry=0x7ffcc536da04) at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
#16 0x000000000041ff85 in cm_periodic_tasks ()
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb) |
2872
|
09 Oct 2024 |
Stefan Ritt | Suggestion | odbedit minor quality of life | Ok, accepted, done and pushed.
Stefan
Lukas Gerritzen wrote: | I have made two minor quality of life changes to odbedit.
- cd command: Typing cd without arguments now changes the directory to /, similar to the behaviour of the cd command in Linux sending you to the home directory.
- Exit behavior: Upon exiting the program with Ctrl+C, a newline character is printed so that the command line starts on an empty line rather than the last line from odbedit.
Here's the diff:
@@ -1668,7 +1668,10 @@ int command_loop(char *host_name, char *exp_name, char *cmd, char *start_dir)
/* cd */
else if (param[0][0] == 'c' && param[0][1] == 'd') {
- compose_name(pwd, param[1], str);
+ if (strlen(param[1]) == 0)
+ strcpy(str, "/");
+ else
+ compose_name(pwd, param[1], str);
status = db_find_key(hDB, 0, str, &hKey);
@@ -2962,6 +2965,7 @@ void ctrlc_odbedit(INT i)
cm_disconnect_experiment();
+ printf("\n");
exit(EXIT_SUCCESS);
}
Please consider incorporating those changes to odbedit.
Lukas |
|
2871
|
09 Oct 2024 |
Lukas Gerritzen | Suggestion | odbedit minor quality of life | I have made two minor quality of life changes to odbedit.
- cd command: Typing cd without arguments now changes the directory to /, similar to the behaviour of the cd command in Linux sending you to the home directory.
- Exit behavior: Upon exiting the program with Ctrl+C, a newline character is printed so that the command line starts on an empty line rather than the last line from odbedit.
Here's the diff:
@@ -1668,7 +1668,10 @@ int command_loop(char *host_name, char *exp_name, char *cmd, char *start_dir)
/* cd */
else if (param[0][0] == 'c' && param[0][1] == 'd') {
- compose_name(pwd, param[1], str);
+ if (strlen(param[1]) == 0)
+ strcpy(str, "/");
+ else
+ compose_name(pwd, param[1], str);
status = db_find_key(hDB, 0, str, &hKey);
@@ -2962,6 +2965,7 @@ void ctrlc_odbedit(INT i)
cm_disconnect_experiment();
+ printf("\n");
exit(EXIT_SUCCESS);
}
Please consider incorporating those changes to odbedit.
Lukas |
2870
|
08 Oct 2024 |
Konstantin Olchanski | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > I read these error messages. There is no ODB corruption. ODB semaphore is locked and all midas programs will fail...
Recovery from this is:
- stop all midas programs (actually they should have all crashed by now)
- identify the ODB semaphore with: ipcs -s -t
- remove the ODB semaphore with: ipcrm sem <semid>
- where <semid> is from the first column of ipcs
- keep deleting semaphores until odbedit works.
- if you delete extra midas sempahores, odbedit will recreate them
- if you delete non-midas semaphores, oh, well...
Little bit better steps for this recovery may be written up by Suzannah in this forum
or in the midas wiki... good luck finding them.
K.O. |
2869
|
08 Oct 2024 |
Konstantin Olchanski | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > Basically I think Rocky 9.4 is a red herring.
This is what likely happened:
- some program crashed while holding the ODB lock semaphore (by ctrl-C at the wrong time or by kill -KILL at the wring time)
- semaphore is locked with a flag "unlock if this program stops"
- this is supposed to ensure ODB lock semaphore never gets stuck in the locked state
- (there is no code in MIDAS to unlock the ODB lock semaphore without locking it first)
- we have observed a malfunction in the Linux kernel, where this automatic unlock does not happen.
- it is rare and so far cannot be reproduced. you can find more about it by searching this forum.
- I think it is a bug in the System-V semaphore code or in the Linux "program stop" code (a path where they fail to call the semaphore unlock handler, and who knows what
other handlers).
- System-V semaphores are very obsolete, replaced by POSIX semaphores.
- POSIX semaphores do not have the "unlock if this program stops" magic, the user (MIDAS) is responsible with detecting that the program who locked the semaphore is gone
and and with taking corrective action, i.e. release the lock, automatically.
I do not know if this problem with System-V semaphores is in the generic Linux kernel, or if it is specific
to the Red Hat kernels (they are known to have many patches, deviating quite far from vanilla kernels).
I do not know if this problem is somehow sensitive to virtual machines.
So yes/no, Red Hat derived linux on a virtual machine could be where this problem happens more often that elsewhere.
K.O. |
2868
|
08 Oct 2024 |
Konstantin Olchanski | Bug Report | Difficulty running MIDAS on Rocky 9.4 | I read these error messages. There is no ODB corruption. ODB semaphore is locked and all midas programs will fail, they will timeout trying to get the lock, report the timeout, then it looks like a bug was introduced where instead of hard exit or abort() they attempt a clean shutdown which crashes from a recursive call in db_lock_database(). Amy's
core dump should confirm this.
K.O.
> > > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual
> > > Machine and are getting a persistent error when we run mserver. As far as I
> > > know there are minimal changes between this and the MIDAS branch, but Ben Smith
> > > may have more to say on this.
> >
> > For reference, "the SuperCDMS version of MIDAS" is just a fork that no longer has any meaningful differences vs the main MIDAS repo, but we only pull updates infrequently after testing a bunch. We last pulled from the develop branch in November 2023. But that should be irrelevant here as semaphore code hasn't been touched for a very long time.
> >
> > We're running Alma 9.4 on a machine at TRIUMF and the same version of midas works fine there (Amy, you may already have access to scdms-zeus). I believe Alma and Rocky should be basically identical for this.
> >
> > So the questions are:
> > * Have you tried other midas programs, or only mserver? E.g. did odbedit and mhttpd work?
> > * If other programs work, have you been running them all as the same user? In particular, if you ran one program as root and another as an unprivileged user, then you will likely get odd permissions issues.
> > * What do you see if you run `ls -l /dev/shm` and `ls -l ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM`? (Or wherever your online dir is for the 2nd one).
> > * Did you follow the full instructions for recovering from a corrupt ODB? https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB In particular the bit about running odbinit with the --cleanup flag?
>
> Here's what happens when I try to run odbedit:
>
> [lekhraj@sdfcdmsdaq setup]$ odbedit
> [ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 481823 does not exists
> [ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Corrected 1 ODB entries
> [ODBEdit,INFO] Deleted entry '/System/Clients/481823' for client 'ODBEdit' because it is not connected to ODB
> [ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 481823 does not exist
> [ODBEdit,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
> [ODBEdit,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
> db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
> Aborted (core dumped)
> [lekhraj@sdfcdmsdaq setup]$
>
> And mhttpd:
>
> [lekhraj@sdfcdmsdaq setup]$ mhttpd
> [mhttpd,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 601054 does not exists
> [mhttpd,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
> [mhttpd,INFO] Corrected 1 ODB entries
> [mhttpd,INFO] Deleted entry '/System/Clients/601054' for client 'ODBEdit' because it is not connected to ODB
> [mhttpd,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 601054 does not exist
> [mhttpd,INFO] ODB subtree /Runinfo corrected successfully
> Password protection is off
> Hostlist off, connections from anywhere will be accepted
> Listening on "http://localhost:8080", passwords OFF, hostlist OFF
> Listening on "http://[::1]:8080", passwords OFF, hostlist OFF
> bm_lock_buffer: Lock buffer "SYSMSG" is taking longer than 1 second!
> [mhttpd,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
> [mhttpd,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
> db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
> Aborted (core dumped)
> [lekhraj@sdfcdmsdaq setup]$
>
> We have been running everything as a single user, the user who cloned the repositories and owns the directories.
>
> We did follow the corrupted-ODB cleanup instructions.
>
> [lekhraj@sdfcdmsdaq setup]$ ls -lh /dev/shm
> total 1.3M
> -rw------- 1 lekhraj dm 1.2M Oct 8 14:13 17468_test_ODB__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
> -rw------- 1 lekhraj dm 114K Oct 7 14:06 17468_test_SYSMSG__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
>
> [lekhraj@sdfcdmsdaq setup]$ ls -lh ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM
> -rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ALARM.SHM
> -rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ELOG.SHM
> -rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.HISTORY.SHM
> -rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.LAZY.SHM
> -rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.MSG.SHM
> -rw-r--r-- 1 lekhraj dm 1.2M Oct 8 14:12 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ODB.SHM
> -rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSMSG.SHM
> -rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSTEM.SHM |
2867
|
08 Oct 2024 |
Konstantin Olchanski | Bug Report | Difficulty running MIDAS on Rocky 9.4 | > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
It is best if you run gdb and extract the stack traces on your end.
In case you are not familiar with gdb:
gdb mserver core # start gdb
bt # stack trace of crashed thread
info thr # get list of threads
thr 1
bt
thr 2
bt
# etc, get stack trace of each thread, there should not be too many of them
K.O. |
|