Back Midas Rome Roody Rootana
  Midas DAQ System, Page 1 of 144  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Datedown Author Topic Subject
  2891   07 Nov 2024 Stefan RittSuggestionStop run and sequencer button
I don't find this very useful. Some experiments do not only want to stop the run, but also do other cleanup things. To do that, I proposed and "atexit" function like C has it. Then the user can put a run stop there, plus any other cleanup. This will be much more flexible. Think about the "reset" script we have to manually run if we abort a sequencer. The atexit function will come next week, so you should consider to use it instead your additional button.

Stefan
  2890   07 Nov 2024 Lukas GerritzenSuggestionStop run and sequencer button
Due to popular demand among our students, I added a button to the sequencer that stops the run and the sequence. If you find it useful, please consider merging this upstream.
$ git diff sequencer.html
diff --git a/resources/sequencer.html b/resources/sequencer.html
index e7f8a79d..95c7e3d8 100644
--- a/resources/sequencer.html
+++ b/resources/sequencer.html
@@ -115,6 +115,7 @@
               <img src="icons/play.svg" title="Start" class="seqbtn Stopped" onclick="startSeq();">
               <img src="icons/debug.svg" title="Debug" class="seqbtn Stopped" onclick="debugSeq();">
               <img src="icons/square.svg" title="Stop" class="seqbtn Running Paused" onclick="stopSeq();">
+              <img src="icons/x-octagon.svg" title="Stop Run and Sequencer immediately" class="seqbtn Running Paused" onclick="stopRunAndSeq();">
               <img src="icons/pause.svg" title="Pause" class="seqbtn Running" onclick="modbset('/Sequencer/Command/Pause script',true);">
               <img src="icons/resume.svg" title="Resume" class="seqbtn Paused" onclick="modbset('/Sequencer/Command/Resume script',true);">
               <img src="icons/step-over.svg" title="Step Over" class="seqbtn Running Paused" onclick="modbset('/Sequencer/Command/Step over',true);">
[gac-megj@pc13513 resources]$ git diff sequencer.js
diff --git a/resources/sequencer.js b/resources/sequencer.js
index cc5398ef..b75c926c 100644
--- a/resources/sequencer.js
+++ b/resources/sequencer.js
@@ -1582,6 +1582,23 @@ function stopSeq() {
    });
 }

+function stopRunAndSeq() {
+   const message = `Are you sure you want to stop the run and sequence?`;
+   dlgConfirm(message,function(resp) {
+      if (resp) {
+         modbset('/Sequencer/Command/Stop immediately',true);
+
+         mjsonrpc_call("cm_transition", {"transition": "TR_STOP"}).then(function (rpc) {
+            if (rpc.result.status !== 1) {
+               throw new Error("Cannot stop run, cm_transition() status " + rpc.result.status + ", see MIDAS messages");
+            }
+         }).catch(function (error) {
+            mjsonrpc_error_alert(error);
+         });
+      }
+   });
+}
+
 // Show or hide parameters table
 function showParTable(varContainer) {
    let e = document.getElementById(varContainer);
  2889   06 Nov 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
After following Konstantin's debugging suggestions, I thought I would try to replicate 
the issue on my own computer.  My hope was that I could provide instructions for 
replicating the bug so that the MIDAS team could try debugging things more easily.

However, when I ran the current version of MIDAS in a Rocky 9.4 VM on my laptop (both 
VMWare and VirtualBox), mserver and odbedit ran just fine (!).

I'm currently trying to find out if there's a way to compare the VMs on my machine and 
the machine that's being problematic, I'll report back if I learn anything.
  2886   05 Nov 2024 Maia Henriksson-WardForumHow to properly write a client listens for events on a given buffer?
> If there's some template for writing a client to access event data, that would be 
> very useful (and you can probably just ignore the context I gave below in that 
> case).
> 
> 
> Some context:
> 
> Quite a while ago, I wrote the attached "data pipeline" client whose job was to 
> listen for events, copy their data, and pipe them to a python script. I believe I 
> just stole bits and pieces from mdump.cxx to accomplish this. Later I wrote the 
> attached wrapper class "MidasConnector.cpp" and a main.cpp to generalize
> data_pipeline.cxx a bit. There were a lot of iterations to the code where I had the 
> below problems; so don't take the logic in the attached code as the exact code that 
> caused the issues below.
> 
> However, I'm unable to resolve a couple issues:
> 
> 1. If a timeout is set, everything will work until that timeout is reached. Then 
> regardless of what kind of logic I tried to implement (retry receiving event, 
> disconnect and reconnect client, etc.) the client would refuse to receive more data.
> 
> 2. When I ctrl-C main, it hangs; this is expected because it's stuck in a while 
> loop. But because I can't set a timeout I have to ctrl-C twice; this would 
> occasionally corrupt the ODB which was not ideal. I was able to get around this with 
> some impractical solution involving ncurses I believe.
> 
> 
> Thanks,
> Jack

midas/examples/lowlevel/consume.cxx might be what you're looking for, but I think all 
you're missing is a call to cm_yield() in your loop, so your midas client doesn't get 
killed when the timeout is reached (and also so you can act on shutdown requests from 
midas)

Something like 
      int status = cm_yield(100);
      if (status == SS_ABORT || status == RPC_SHUTDOWN)
         break;

There might be a recommended way to handle the ctrl-c and disconnect from the ODB, but 
off the top of my head I don't remember it. 

Also check out Ben's new(ish) python library, midas/python/examples/event_receiver.py 
might be a much easier solution. And you can use the context manager, which will take 
care of safely disconnecting from midas after you ctrl-C.
  2885   05 Nov 2024 Jack CarltonForumHow to properly write a client listens for events on a given buffer?
If there's some template for writing a client to access event data, that would be 
very useful (and you can probably just ignore the context I gave below in that 
case).


Some context:

Quite a while ago, I wrote the attached "data pipeline" client whose job was to 
listen for events, copy their data, and pipe them to a python script. I believe I 
just stole bits and pieces from mdump.cxx to accomplish this. Later I wrote the 
attached wrapper class "MidasConnector.cpp" and a main.cpp to generalize
data_pipeline.cxx a bit. There were a lot of iterations to the code where I had the 
below problems; so don't take the logic in the attached code as the exact code that 
caused the issues below.

However, I'm unable to resolve a couple issues:

1. If a timeout is set, everything will work until that timeout is reached. Then 
regardless of what kind of logic I tried to implement (retry receiving event, 
disconnect and reconnect client, etc.) the client would refuse to receive more data.

2. When I ctrl-C main, it hangs; this is expected because it's stuck in a while 
loop. But because I can't set a timeout I have to ctrl-C twice; this would 
occasionally corrupt the ODB which was not ideal. I was able to get around this with 
some impractical solution involving ncurses I believe.


Thanks,
Jack
Attachment 1: data_pipeline_(2).cxx
#include "midas.h"
#include "msystem.h"
#include "mrpc.h"
#include "mdsupport.h"
#include <iostream>
#include <unistd.h>
#include <stdio.h>  // Added for popen
#include <stdlib.h> // Added for malloc and free

INT hBufEvent;

void process_event(EVENT_HEADER *pheader) {
    printf("Received event #%d\n", pheader->serial_number);
    printf("Event ID: %d\n", pheader->event_id);
    printf("Data Size: %d bytes\n", pheader->data_size);
    printf("Timestamp: %d\n", pheader->time_stamp);
    printf("Trigger mask: %d\n", pheader->trigger_mask);

    // Print a marker to indicate the start of serialized data
    printf("EVENT_DATA_START\n");

    // Serialize and print the event data
    int* eventData = (int*)((char*)pheader + sizeof(EVENT_HEADER));
    int numIntegers = (pheader->data_size - sizeof(EVENT_HEADER)) / sizeof(int);
    for (int i = 0; i < 8; ++i) {
        printf("%d ", eventData[i]);
    }
    printf("\n");

    // Process the event here
}

int main() {
    HNDLE hDB, hKey;
    char host_name[HOST_NAME_LENGTH], expt_name[NAME_LENGTH], str[80];
    char buf_name[32] = EVENT_BUFFER_NAME, rep_file[128];
    unsigned int status, start_time, stop_time;
    INT ch, request_id, size, get_flag, action, single, i;

    // Define the maximum event size you expect to receive
    INT max_event_size = 4000;

    // Allocate memory for storing event data dynamically
    void* event_data = malloc(max_event_size);

    printf("1\n");
    /* Get if existing the pre-defined experiment */
    cm_get_environment(host_name, sizeof(host_name), expt_name, sizeof(expt_name));
    // Print host_name
    printf("host_name = %s\n", host_name);

    // Print expt_name
    printf("expt_name = %s\n", expt_name);

    printf("2\n");
    /* connect to the experiment */
    status = cm_connect_experiment(host_name, expt_name, "data_pipeline", 0);
    if (status != CM_SUCCESS) {
        return 1;
    }
    printf("3\n");

    status = bm_open_buffer(buf_name, DEFAULT_BUFFER_SIZE, &hBufEvent);
    if (status != BM_SUCCESS && status != BM_CREATED) {
        cm_msg(MERROR, "data_pipeline", "Cannot open buffer \"%s\", bm_open_buffer() status %d", buf_name, status);
        return 1;
    }
    printf("4\n");
    /* set the buffer cache size if requested */
    bm_set_cache_size(hBufEvent, 100000, 0);
    printf("5\n");

    /* place a request for a specific event id */
    status = bm_request_event(hBufEvent, EVENTID_ALL, TRIGGER_ALL, GET_ALL, &request_id, NULL); // Use NULL as the callback routine
    printf("6\n");
    printf("status = %d\n",status);
    // Open a pipe to a Python script for data transfer
    FILE* pipe = popen("python3 data_pipeline.py", "w");
    if (pipe == NULL) {
        perror("popen");
        return 1;
    }

    // Enter the event processing loop
    while (1) {
        // Use the address of max_event_size in bm_receive_event
        status = bm_receive_event(hBufEvent, event_data, &max_event_size, BM_WAIT); // Wait for new data indefinitely

        if (status == BM_SUCCESS) {
            //process_event((EVENT_HEADER*)((char*)event_data + sizeof(EVENT_HEADER)));
            // Send the event data to the Python script via the pipe
            fprintf(pipe, "EVENT_DATA_START\n");
            int* eventData = (int*)((char*)event_data + sizeof(EVENT_HEADER));
            int numIntegers = (max_event_size - sizeof(EVENT_HEADER)) / sizeof(int);
            for (int i = 4; i < 12; ++i) {
                fprintf(pipe, "%d ", eventData[i]);
            }
            fprintf(pipe, "\n");
            fflush(pipe); // Flush the buffer to ensure data is sent immediately
        } else {
            printf("Error receiving event: %d\n", status);
            break; // Exit the loop if an error occurs
        }
    }

    // Close the pipe
    pclose(pipe);

    // Free the dynamically allocated memory
    free(event_data);

    cm_disconnect_experiment();

    printf("7\n");
    return 1;
}
Attachment 2: MidasConnector.cpp
#include "MidasConnector.h"

MidasConnector::MidasConnector(const char* clientName) {
    // Initialize client name
    strncpy(client_name_, clientName, NAME_LENGTH);

    // Get host name and experiment name from environment
    cm_get_environment(host_name_, sizeof(host_name_), experiment_name_, sizeof(experiment_name_));

    // Initialize other private variables if needed
    event_id = EVENTID_ALL;  // Initialize with default value
    trigger_mask = TRIGGER_ALL;  // Initialize with default value
    sampling_type = GET_ALL;  // Initialize with default value (renamed from get_flags)
    buffer_size = DEFAULT_BUFFER_SIZE;  // Initialize with default value
    timeout_millis = BM_WAIT;
    strncpy(buffer_name, EVENT_BUFFER_NAME, sizeof(buffer_name));  // Initialize with default value
}

// Getters for the private variables
short MidasConnector::getEventId() const {
    return event_id;
}

short MidasConnector::getTriggerMask() const {
    return trigger_mask;
}

int MidasConnector::getSamplingType() const {
    return sampling_type;  
}

int MidasConnector::getBufferSize() const {
    return buffer_size;
}

const char* MidasConnector::getBufferName() const {
    return buffer_name;
}

int MidasConnector::getTimeout() const {
    return timeout_millis;
}

HNDLE MidasConnector::getEventBufferHandle() const {
    return hBufEvent;
}

// Setters for the private variables
void MidasConnector::setEventId(short eventId) {
    event_id = eventId;
}

void MidasConnector::setTriggerMask(short triggerMask) {
    trigger_mask = triggerMask;
}

void MidasConnector::setSamplingType(int samplingType) {
    sampling_type = samplingType;  
}

void MidasConnector::setBufferSize(int bufferSize) {
    buffer_size = bufferSize;
}

void MidasConnector::setBufferName(const char* bufferName) {
    strncpy(buffer_name, bufferName, sizeof(buffer_name));
}

void MidasConnector::setTimeout(int timeoutMillis) {
    timeout_millis = timeoutMillis;
}

void MidasConnector::setEventBufferHandle(HNDLE eventBufferHandle) {
    hBufEvent = eventBufferHandle;
}


bool MidasConnector::ConnectToExperiment() {
    // Connect to the experiment
    int status = cm_connect_experiment(host_name_, experiment_name_, client_name_, NULL);
    if (status != CM_SUCCESS) {
        // Handle connection error
        return false;
    }
    return true;
}

void MidasConnector::DisconnectFromExperiment() {
    // Disconnect from the experiment
    cm_disconnect_experiment();
}

bool MidasConnector::OpenEventBuffer() {
    int status = bm_open_buffer(buffer_name, buffer_size, &hBufEvent);
    if (status != BM_SUCCESS && status != BM_CREATED) {
        cm_msg(MERROR, client_name_, "Cannot open buffer \"%s\", bm_open_buffer() status %d", buffer_name, status);
        return false;
    }
    return true;
}

bool MidasConnector::SetCacheSize(int cacheSize) {
    bm_set_cache_size(hBufEvent, cacheSize, 0);
    return true;
}

bool MidasConnector::RequestEvent() {
    int request_id;
    int status = bm_request_event(hBufEvent, event_id, trigger_mask, sampling_type, &request_id, NULL);
    return status == BM_SUCCESS;
}

bool MidasConnector::ReceiveEvent(void* eventBuffer, int& maxEventSize) {
    int status = bm_receive_event(hBufEvent, eventBuffer, &maxEventSize, timeout_millis);
    return status == BM_SUCCESS;
}

Attachment 3: main.cpp
#include "event_processor/EventProcessor.h"
#include "data_transmitter/DataTransmitter.h"
#include "midas_connector/MidasConnector.h"
#include "json.hpp"
#include <fstream>

INT hBufEvent1;
INT hBufEvent2;

// Function to initialize MIDAS and open an event buffer
bool initializeMidas(MidasConnector& midasConnector, const nlohmann::json& config) {
    // Set the MidasConnector properties based on the config
    midasConnector.setEventId(config["eventId"].get<short>());
    midasConnector.setTriggerMask(config["triggerMask"].get<short>());
    midasConnector.setSamplingType(config["samplingType"].get<int>());
    midasConnector.setBufferSize(config["bufferSize"].get<int>());
    midasConnector.setBufferName(config["bufferName"].get<std::string>().c_str());
    midasConnector.setBufferSize(config["bufferSize"].get<int>());

    // Call the ConnectToExperiment method
    if (!midasConnector.ConnectToExperiment()) {
        return false;
    }

    // Call the OpenEventBuffer method
    if (!midasConnector.OpenEventBuffer()) {
        return false;
    }

    // Set the buffer cache size if requested
    midasConnector.SetCacheSize(config["cacheSize"].get<int>());

    // Place a request for a specific event id
    if (!midasConnector.RequestEvent()) {
        return false;
    }

    return true;
}

int main() {
    // Read configuration from a JSON file
    nlohmann::json config;
    std::ifstream configFile("config.json");
    configFile >> config;
    configFile.close();

    // Initialize MidasConnector and connect to the MIDAS experiment
    MidasConnector midasConnector(config["clientName"].get<std::string>().c_str());
    if (!initializeMidas(midasConnector, config)) {
        printf("Error: Failed to initialize MIDAS.\n");
        return 1;
    }

    // Read the maximum event size from the JSON configuration
    INT max_event_size = config["maxEventSize"].get<int>();

    // Allocate memory for storing event data dynamically
    void* event_data = malloc(max_event_size);

    // Initialize EventProcessor with detector mapping file and verbosity flag
    EventProcessor eventProcessor(config["detectorMappingFile"].get<std::string>(), config["verbose"].get<bool>());

    // Initialize DataTransmitter with the ZeroMQ address
    DataTransmitter dataPublisher(config["zmqAddress"].get<std::string>());

    // Connect to the ZeroMQ server
    if (!dataPublisher.bind()) {
        // Handle connection error
        printf("Error: Failed to bind to port %s.\n", config["zmqAddress"].get<std::string>().c_str());
        return 1;
    } else {
        printf("Connected to the ZeroMQ server.\n");
    }


    // Event processing loop
    while (true) {

        midasConnector.ReceiveEvent(event_data, max_event_size);

        //Prcoess data once we have it
        eventProcessor.processEvent(event_data, max_event_size);

        // Serialize the event data with EventProcessor and store it in serializedData
        std::string serializedData = eventProcessor.getSerializedData();

        // Send the serialized data to the ZeroMQ server with DataTransmitter
        if (!dataPublisher.publish(serializedData)) {
            // Handle send error
            printf("Error: Failed to send serialized data.\n");
        }
    }

    // Cleanup and finalize your application
    midasConnector.DisconnectFromExperiment(); // Disconnect from the MIDAS experiment

    return 0;
}
  2884   28 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.

It appears that time is moving forward:

[aroberts@sdfcdmsdaq build]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 1617119 does 
not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617119' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617119 does not exist
[local:amy_test:S]/>ss_semaphore_wait_for: semop/semtimedop(5) returned -1, errno 11 (Resource temporarily unavailable), 
start time 0xd4fd98f6, now 0xd4fdc0ef, dt 0x000027f9, timeout 0x00002710 ms, SEMAPHORE TIMEOUT!
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
  Draft   28 Oct 2024 Lukas GerritzenBug ReportVisual glitch in history system
Attachment 1: Screenshot_2024-10-28_at_17.34.26.png
Screenshot_2024-10-28_at_17.34.26.png
  2882   28 Oct 2024 Lukas GerritzenBug ReportVisual glitch in history system
Today, I encountered the bug shown in the attached video. The value of the plotted curve does not match the mouseover number.

When trying to understand it better, I stopped being able to replicate. Has anyone else observed a similar problem? 
Attachment 1: Screen_Recording_2024-10-28_at_17.23.57.mov
Attachment 2: Screenshot_2024-10-28_at_17.29.34.png
Screenshot_2024-10-28_at_17.29.34.png
  2881   23 Oct 2024 Lukas GerritzenBug ReportODB key picker does not close when creating link / Edit-on-run string box too large
To reproduce:
In the interactive ODB, click the &#128279; icon to create a link. Next to the target,  click the "..." button to open 
the key picker browser. Then try to close it by either:
- Selecting a key and clicking ok
- Clicking "cancel"
- Clicking the red circle at the top left

Expected result:
The key picker closes

Actual result:
The key picker does not close.

Depending on how you trying to close the picker, the error messages in the debug console differ slightly.

On the red circle:
Uncaught TypeError: dlg is null
    dlgClose http://localhost:8080/controls.js:791
    onclick http://localhost:8080/?cmd=ODB&odb_path=/Test:1

On "ok" or "cancel":
Uncaught TypeError: dlg is null
    dlgMessageDestroy http://localhost:8080/controls.js:828
    pickerButton http://localhost:8080/odbbrowser.js:453
    onclick http://localhost:8080/?cmd=ODB&odb_path=/Test:1


Another more minor visual problem is the edit-on-start dialog. There seems to be no upper bound to the 
size of the text box. In the attached screenshot, ShortString has a maximum length of 32 characters, 
LongString has 255. Both are empty at the time of the screenshot. Maybe, the size should be limited to a 
reasonable width.
Attachment 1: Screenshot_2024-10-23_at_11.38.38.png
Screenshot_2024-10-23_at_11.38.38.png
  2880   18 Oct 2024 Konstantin OlchanskiBug ReportDifficulty running MIDAS on Rocky 9.4
> suddenly, an ODB semaphore timeout...

I am wondering if something bizarre is going on, like the system clock going backwards. I heard of things like that 
happening in virtual environments.

https://stackoverflow.com/questions/4801122/how-to-stop-time-from-running-backwards-on-linux

I added some debugging information to the semaphore locking code. Please update to commit 
eb625af119067f6d702211542d88a28ccb57ad2c of src/system.cxx (plus small change in include/msystem.h) and try again.

Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.

K.O.
  2879   18 Oct 2024 Konstantin OlchanskiBug ReportDifficulty running MIDAS on Rocky 9.4
> [aroberts@sdfcdmsdaq midas]$ odbedit
> [ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
> 1615051 does not exists
> [ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
> [ODBEdit,INFO] Corrected 1 ODB entries
> [ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
> [ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not 
> exist

so far, so good, we connected to ODB (lock was not stuck), cleared out client "odbedit" with pid 1615051 that crashed 
without properly disconnecting. ODB semaphore is working correctly.

> [ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
> Aborted (core dumped)

suddenly, an ODB semaphore timeout...

can you post the stack trace from this core dump? I am pretty sure it will be boring, but just in case...

K.O.
  2878   18 Oct 2024 Stefan RittBug ReportFrontend name must differ from others by more than the last three characters
Fixed and committed.

Best,
Stefan

> Hi Denis,
> 
> indeed a bug. Will fix it next week.
> 
> Best,
> Stefan
> 
> 
> > Hi,
> > I have developed two Midas front-end programs for different hardware. The frontend_name of the first one is "FSCD_SC" (slow control) and that of the second one is "FSCD_PS" (power supply).
> > 
> > Each front-end program runs fine separately, but when attempting to start FSCD_SC while FSCD_PS is running, FSCD_PS is terminated and Midas indicates "Previous frontend stopped" in the window where it starts FSCD_SC.
> > 
> > The problem is that these two frontend names only differ in their last two characters, and Midas currently does not distinguish them properly.
  2877   16 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> Thank you for the stack trace, I fixed the buglet that cause midas programs to crash twice,
> once on failure to lock ODB, then call exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock 
> failure is the cm_msg() codes.
> 
> Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the 
> heck...
> 
> Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will 
> unlock it!).
> 
> commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11
> 
> K.O.
> 
> 
> > > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> > > 
> > > I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> > > 
> > > It is best if you run gdb and extract the stack traces on your end.
> > > 
> > > In case you are not familiar with gdb:
> > > 
> > > gdb mserver core # start gdb
> > > bt # stack trace of crashed thread
> > > info thr # get list of threads
> > > thr 1
> > > bt
> > > thr 2
> > > bt
> > > # etc, get stack trace of each thread, there should not be too many of them
> > > 
> > > K.O.
> > 
> > Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
> > looks potentially useful:
> > 
> > [lekhraj@sdfcdmsdaq ~]$ gdb mserver 
> > core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
> > GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
> > ...
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from mserver...
> > [New LWP 601715]
> > 
> > warning: Section `.reg-xstate/601715' in core file too small.
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `mserver'.
> > Program terminated with signal SIGABRT, Aborted.
> > 
> > warning: Section `.reg-xstate/601715' in core file too small.
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
> > 3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
> > openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
> > (gdb)
> > (gdb) bt
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> > format",
> >     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> > subhKey=0x7ffcc536d348)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> >     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> >     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> > filename=0x7ffcc536d690,
> >     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> >     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> > hDB=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> >     subhKey=subhKey@entry=0x7ffcc536da04) at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > --Type <RET> for more, q to quit, c to continue without paging--
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb) info thr
> >   Id   Target Id                          Frame
> > * 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
> > /lib64/libc.so.6
> > (gdb) thr 1
> > [Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > (gdb) bt
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> > format",
> >     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> > subhKey=0x7ffcc536d348)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> >     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> >     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> > filename=0x7ffcc536d690,
> >     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> >     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> > hDB=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> >     subhKey=subhKey@entry=0x7ffcc536da04) at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb)

I checked out the modified version of Midas and recompiled, and am still getting a similar error when I try to run 
odbedit:

[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
1615051 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not 
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)

I'm not sure what's causing the call to lock the database, all I'm doing is typing "odbedit" in the command prompt.

I should add that I followed the instructions for unlocking the ODB, but when I call "odbedit" this error still 
appears:

[aroberts@sdfcdmsdaq midas]$ ipcs -s -t

------ Semaphore Operation/Change Times --------
semid    owner      last-op                    last-changed
4        aroberts    Wed Oct 16 11:44:10 2024   Wed Oct 16 11:44:00 2024

[aroberts@sdfcdmsdaq midas]$ ipcrm sem 4
resource(s) deleted
[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
1617050 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617050' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617050 does not 
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
  2876   13 Oct 2024 Konstantin OlchanskiBug ReportDifficulty running MIDAS on Rocky 9.4
Thank you for the stack trace, I fixed the buglet that cause midas programs to crash twice,
once on failure to lock ODB, then call exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock 
failure is the cm_msg() codes.

Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the 
heck...

Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will 
unlock it!).

commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11

K.O.


> > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> > 
> > I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> > 
> > It is best if you run gdb and extract the stack traces on your end.
> > 
> > In case you are not familiar with gdb:
> > 
> > gdb mserver core # start gdb
> > bt # stack trace of crashed thread
> > info thr # get list of threads
> > thr 1
> > bt
> > thr 2
> > bt
> > # etc, get stack trace of each thread, there should not be too many of them
> > 
> > K.O.
> 
> Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
> looks potentially useful:
> 
> [lekhraj@sdfcdmsdaq ~]$ gdb mserver 
> core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
> GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
> ...
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from mserver...
> [New LWP 601715]
> 
> warning: Section `.reg-xstate/601715' in core file too small.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `mserver'.
> Program terminated with signal SIGABRT, Aborted.
> 
> warning: Section `.reg-xstate/601715' in core file too small.
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
> 3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
> openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
> (gdb)
> (gdb) bt
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> format",
>     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> subhKey=0x7ffcc536d348)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
>     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
>     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> filename=0x7ffcc536d690,
>     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
>     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> hDB=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
>     subhKey=subhKey@entry=0x7ffcc536da04) at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #15 0x0000000000455fd2 in al_check () at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> --Type <RET> for more, q to quit, c to continue without paging--
> #16 0x000000000041ff85 in cm_periodic_tasks ()
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> (gdb) info thr
>   Id   Target Id                          Frame
> * 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
> /lib64/libc.so.6
> (gdb) thr 1
> [Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> format",
>     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> subhKey=0x7ffcc536d348)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
>     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
>     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> filename=0x7ffcc536d690,
>     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
>     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> hDB=1)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
>     subhKey=subhKey@entry=0x7ffcc536da04) at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> #15 0x0000000000455fd2 in al_check () at 
> /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> #16 0x000000000041ff85 in cm_periodic_tasks ()
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
>     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> (gdb)
  2875   11 Oct 2024 Stefan RittBug ReportFrontend name must differ from others by more than the last three characters
Hi Denis,

indeed a bug. Will fix it next week.

Best,
Stefan


> Hi,
> I have developed two Midas front-end programs for different hardware. The frontend_name of the first one is "FSCD_SC" (slow control) and that of the second one is "FSCD_PS" (power supply).
> 
> Each front-end program runs fine separately, but when attempting to start FSCD_SC while FSCD_PS is running, FSCD_PS is terminated and Midas indicates "Previous frontend stopped" in the window where it starts FSCD_SC.
> 
> The problem is that these two frontend names only differ in their last two characters, and Midas currently does not distinguish them properly.
  2874   11 Oct 2024 Denis CalvetBug ReportFrontend name must differ from others by more than the last three characters
Hi,
I have developed two Midas front-end programs for different hardware. The frontend_name of the first one is "FSCD_SC" (slow control) and that of the second one is "FSCD_PS" (power supply).

Each front-end program runs fine separately, but when attempting to start FSCD_SC while FSCD_PS is running, FSCD_PS is terminated and Midas indicates "Previous frontend stopped" in the window where it starts FSCD_SC.

The problem is that these two frontend names only differ in their last two characters, and Midas currently does not distinguish them properly.

Looking in mfe.cxx we have:

int main(int argc, char *argv[])
{
...
/* shutdown previous frontend */
   status = cm_shutdown(full_frontend_name, FALSE);
...

And looking in midas.cxx we have:

INT cm_shutdown(const char *name, BOOL bUnique) {
...
if (!bUnique)
            client_name[strlen(name)] = 0;      /* strip number */
...

The above line removes the last 3 characters of the front-end name before the subsequent comparison with other frontend names. Stripping the last 3 characters of the front-end name is correct for frontend programs that use the "-i" command line option to specify an index for that frontend, but all the characters of the front-end name should otherwise be kept for comparison.

I have changed the names of my frontend programs to avoid the interference, but it would be nice that the code that determines if an instance of a frontend program is already running is corrected.

I hope this can help.

Best regards,
Denis.
  2873   10 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> 
> I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> 
> It is best if you run gdb and extract the stack traces on your end.
> 
> In case you are not familiar with gdb:
> 
> gdb mserver core # start gdb
> bt # stack trace of crashed thread
> info thr # get list of threads
> thr 1
> bt
> thr 2
> bt
> # etc, get stack trace of each thread, there should not be too many of them
> 
> K.O.

Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
looks potentially useful:

[lekhraj@sdfcdmsdaq ~]$ gdb mserver 
core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mserver...
[New LWP 601715]

warning: Section `.reg-xstate/601715' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `mserver'.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/601715' in core file too small.
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
(gdb)
(gdb) bt
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
format",
    hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
subhKey=0x7ffcc536d348)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
    key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
    s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
filename=0x7ffcc536d690,
    linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
    message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
hDB=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
    subhKey=subhKey@entry=0x7ffcc536da04) at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
--Type <RET> for more, q to quit, c to continue without paging--
#16 0x000000000041ff85 in cm_periodic_tasks ()
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb) info thr
  Id   Target Id                          Frame
* 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
/lib64/libc.so.6
(gdb) thr 1
[Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
format",
    hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
subhKey=0x7ffcc536d348)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
    key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
    s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
filename=0x7ffcc536d690,
    linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
    message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
hDB=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
    subhKey=subhKey@entry=0x7ffcc536da04) at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
#16 0x000000000041ff85 in cm_periodic_tasks ()
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb)
  2872   09 Oct 2024 Stefan RittSuggestionodbedit minor quality of life
Ok, accepted, done and pushed.

Stefan


Lukas Gerritzen wrote:
I have made two minor quality of life changes to odbedit.
  • cd command: Typing cd without arguments now changes the directory to /, similar to the behaviour of the cd command in Linux sending you to the home directory.
  • Exit behavior: Upon exiting the program with Ctrl+C, a newline character is printed so that the command line starts on an empty line rather than the last line from odbedit.
Here's the diff:
@@ -1668,7 +1668,10 @@ int command_loop(char *host_name, char *exp_name, char *cmd, char *start_dir)

       /* cd */
       else if (param[0][0] == 'c' && param[0][1] == 'd') {
-         compose_name(pwd, param[1], str);
+         if (strlen(param[1]) == 0)
+            strcpy(str, "/");
+         else
+            compose_name(pwd, param[1], str);

          status = db_find_key(hDB, 0, str, &hKey);

@@ -2962,6 +2965,7 @@ void ctrlc_odbedit(INT i)

    cm_disconnect_experiment();

+   printf("\n");
    exit(EXIT_SUCCESS);
 }

Please consider incorporating those changes to odbedit.

Lukas
  2871   09 Oct 2024 Lukas GerritzenSuggestionodbedit minor quality of life
I have made two minor quality of life changes to odbedit.
  • cd command: Typing cd without arguments now changes the directory to /, similar to the behaviour of the cd command in Linux sending you to the home directory.
  • Exit behavior: Upon exiting the program with Ctrl+C, a newline character is printed so that the command line starts on an empty line rather than the last line from odbedit.
Here's the diff:
@@ -1668,7 +1668,10 @@ int command_loop(char *host_name, char *exp_name, char *cmd, char *start_dir)

       /* cd */
       else if (param[0][0] == 'c' && param[0][1] == 'd') {
-         compose_name(pwd, param[1], str);
+         if (strlen(param[1]) == 0)
+            strcpy(str, "/");
+         else
+            compose_name(pwd, param[1], str);

          status = db_find_key(hDB, 0, str, &hKey);

@@ -2962,6 +2965,7 @@ void ctrlc_odbedit(INT i)

    cm_disconnect_experiment();

+   printf("\n");
    exit(EXIT_SUCCESS);
 }

Please consider incorporating those changes to odbedit.

Lukas
  2870   08 Oct 2024 Konstantin OlchanskiBug ReportDifficulty running MIDAS on Rocky 9.4
> I read these error messages. There is no ODB corruption. ODB semaphore is locked and all midas programs will fail...

Recovery from this is:
- stop all midas programs (actually they should have all crashed by now)
- identify the ODB semaphore with: ipcs -s -t
- remove the ODB semaphore with: ipcrm sem <semid>
- where <semid> is from the first column of ipcs
- keep deleting semaphores until odbedit works.
- if you delete extra midas sempahores, odbedit will recreate them
- if you delete non-midas semaphores, oh, well...

Little bit better steps for this recovery may be written up by Suzannah in this forum
or in the midas wiki... good luck finding them.

K.O.
ELOG V3.1.4-2e1708b5