ELOG Midas


Midas DAQ System, Page 126 of 155




3085 Entries


ID	Date	Author	Topic	Subject
2319	26 Jan 2022	Stefan Ritt	Bug Report	Off-by-one in sequencer documentation
> Shades/ghosts of FORTRAN. C/C++/Perl/Python loops run from 0 to n-1.

for (i=1 ; i<=10 ; i++); ;-)
2321	26 Jan 2022	Konstantin Olchanski	Bug Report	Off-by-one in sequencer documentation
> > Shades/ghosts of FORTRAN. C/C++/Perl/Python loops run from 0 to n-1.
>
> for (i=1 ; i<=10 ; i++); ;-)

Similar code made big news just recently (scroll down to the example main() program):

https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034

I forget whether the FORTRAN rule was "loop once" or "never loop", whether it differed between Fortran-4, Fortran-77, the DEC extensions and the IBM extensions, or whether it was a compiler switch.

We should check that we do something reasonable with such loops to zero:

LOOP n,0
MESSAGE $n,1
ENDLOOP

P.S. Yup. "man g77", option "-fonetrip".

K.O.
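The "loop once" vs. "never loop" question is the difference between a pre-test and a post-test loop. A minimal sketch in plain C++ (not sequencer code; the function names are hypothetical) that counts how many times each form runs its body for a given count n:

```cpp
#include <cassert>

// Zero-trip semantics (C, C++, and Fortran 77 DO loops): the condition is
// tested before the first pass, so a count of 0 runs the body 0 times.
int trips_pretest(int n) {
    int trips = 0;
    for (int i = 1; i <= n; i++)
        trips++;
    return trips;
}

// One-trip semantics (the old FORTRAN 66 behaviour that g77 -fonetrip
// emulates): the test comes after the body, so it always runs at least once.
int trips_posttest(int n) {
    int trips = 0;
    int i = 1;
    do {
        trips++;
        i++;
    } while (i <= n);
    return trips;
}
```

Both forms agree for n >= 1 and differ only for n = 0, which is exactly the "LOOP n,0" case worth checking in the sequencer.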
2322	26 Jan 2022	Konstantin Olchanski	Bug Report	Writing MIDAS Events via FPGAs
> > > Any error messages printed by the frontend? Any error message in midas.log? Core dumps? Crashes?
> >
> > I do not understand what you mean by "did not get the data into midas". You create events
> > and send them to a midas event buffer and you do not see them there? With mdump?
> > Do you see this both connected locally and connected remotely through the mserver?
>
> I simply don't see the event counter counting up and I also don't see them using mdump. No logs, no dumps and no crashes - everything is quiet. I only tested it locally.

If you are connected locally (no mserver), I want to know the value returned by bm_send_event(). The simplest way is to edit mfe.c and, everywhere it calls bm_send_event() and rpc_send_event(), print the returned value.

It would be very interesting to see if bm_send_event() returns 1 (SUCCESS) but the event vanishes without a trace.

Before you do that, try something simpler: run "mdump -s -d"; it will print some event buffer internals. Watch whether any data pointers change when you send your events ("wp", "rp", etc.). If nothing changes at all, then we are not sending anything (the fault is in your code or in mfe.c). If you see "wp" counting up, then we definitely write your events into the buffer, and mdump and mlogger should see them.

But there is some funny logic for event_id and trigger_mask, and it is worth checking their values. For a good test, set event_id=1 and trigger_mask=0x1. There might be trouble if either is set to zero.

K.O.
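The "funny logic" can be made concrete. Below is a minimal sketch, modeled on my understanding of how MIDAS bm_match_event() treats non-fragmented events; the function and constant names are hypothetical, and midas.cxx remains the authority. A request matches when the event ID is equal (or the request uses the wildcard) AND the request mask shares at least one bit with the event's trigger mask (or the request uses the wildcard):

```cpp
#include <cassert>
#include <cstdint>

// Wildcard values for event requests (EVENTID_ALL and TRIGGER_ALL are
// both -1 in midas.h; these local names are hypothetical).
constexpr int16_t kEventIdAll = -1;
constexpr int16_t kTriggerAll = -1;

// Simplified rule for non-fragmented events: the ID must match exactly and
// the masks must overlap, unless the request uses the wildcard for either.
bool request_matches(int16_t req_id, int16_t req_mask,
                     int16_t ev_id, int16_t ev_mask) {
    bool id_ok = (req_id == kEventIdAll) || (req_id == ev_id);
    bool mask_ok = (req_mask == kTriggerAll) || ((req_mask & ev_mask) != 0);
    return id_ok && mask_ok;
}
```

Under this rule an event with trigger_mask 0 is seen only by requests that use the -1 wildcard for the mask, which is one reason not to leave either field at zero.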
2323	26 Jan 2022	Konstantin Olchanski	Bug Report	Unknown Error 319 from client
> I’m trying to run MIDAS using a frontend code/client named “fetiglab”. The run stops
> after 2-3 sec with an error saying “Unknown error 319 from client “fetiglab” on
> localhost”. Actually, the run never starts.

> 11:46:32 [fetiglab,ERROR] [odb.cxx:11268:db_get_record,ERROR] struct size
> mismatch for "/" (expected size: 1, size in ODB: 41920)

This is the error that causes the run start to fail. For reasons unknown, your frontend is trying to do a db_get_record() from "/" (the ODB root directory).

If this is an mfe.c frontend, I do not think I have ever seen it do something like this. So, a puzzle.

K.O.
2325	26 Jan 2022	Marius Koeppel	Bug Report	Writing MIDAS Events via FPGAs
> If you are connected locally (no mserver), I want to know the value returned by bm_send_event(). Simplest
> if you edit mfe.c and everywhere it calls bm_send_event() and rpc_send_event(), print the returned value.
>
> It would be very interesting to see if bm_send_event() returns 1 (SUCCESS), but the event vanishes
> without a trace.

I checked bm_send_event(rbh, (EVENT_HEADER*)(&pdata[0]), 0, 20); which returns 1. I also checked the status of rb_increment_wp, which is also 1.

> Before you do that, try something simpler:
> Run "mdump -s -d", it will print some event buffer internals.
> Watch to see if any data pointers change when you send your events ("wp", "rp", etc).

"rp" and "wp" are not counting up.

> But there is some funny logic for event_id and trigger_mask and it is worth checking their
> values. For a good test, set event_id=1 and trigger_mask=0x1. There might be trouble if either is set to zero.

Changing both to 0x1 did not change the behavior.

Cheers,
Marius
2326	28 Jan 2022	Stefan Ritt	Bug Report	Writing MIDAS Events via FPGAs
I finally got the dummy program working. There were several issues:

- event_buffer_size was defined as 10000 * 32 MB = 320 GB, exceeding the RAM of the computer.
- The SERIAL number started with 1. In midas, event serial numbers have always started with zero, but this was wrong in the documentation at https://midas.triumf.ca/MidasWiki/index.php/Event_Structure, so I also fixed the documentation.
- The event header time stamp must be seconds since 1.1.1970, so the function ss_time() should be used to set it.
- Calling set_equipment_status() for each event slows down the event collection considerably, since this function accesses the ODB each time.
- dma_buf_dummy is defined inside the event loop, so it gets allocated and de-allocated on the stack for each event. Of course this might vanish when the real FPGA buffer is used.
- The line pdata+=sizeof(dma_buf_dummy); is wrong. pdata is a pointer to uint32_t, but the sizeof() operation returns the size of dma_buf_dummy in bytes. Therefore, pdata gets incremented by four times the size of dma_buf_dummy.
- Instead of calling std::this_thread::sleep_for(std::chrono::milliseconds(2000)); one can call the standard midas function ss_sleep(2000); which is a bit shorter.
- Finally, sending many events to the ring buffer triggered a bug in the midas ring buffer functions which had been lingering there since 2007. I'm glad that this happened and that it could now be fixed. I'm not sure whether other experiments were affected by it in the last decade; it could have manifested itself as lost events or crashing front-ends. Anyhow, it's fixed now. You need to update midas to get the fix.

I attached a working version of the dummy program for your reference. The banks are different, but the principle should become clear.

Stefan
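The pdata+=sizeof(dma_buf_dummy) item is a general C/C++ rule: arithmetic on a uint32_t* counts in 4-byte elements, while sizeof() counts bytes. A minimal sketch with hypothetical helper names, showing the correct conversion next to the buggy form:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Advance a uint32_t write pointer past a block of `nbytes` bytes.
// The byte count from sizeof() must be divided by sizeof(uint32_t),
// because pointer arithmetic on uint32_t* moves in whole elements.
uint32_t *advance_words(uint32_t *p, size_t nbytes) {
    assert(nbytes % sizeof(uint32_t) == 0);
    return p + nbytes / sizeof(uint32_t);
}

// The buggy form: adding a byte count directly to a uint32_t*
// moves the pointer four times too far.
uint32_t *advance_buggy(uint32_t *p, size_t nbytes) {
    return p + nbytes;
}
```

For a 32-byte block (eight uint32_t words), the correct helper moves the pointer by 8 elements and the buggy one by 32.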
Attachment 1: dummy_fe.cpp
/********************************************************************\

  Name:         dummy_fe.cxx
  Created by:   Frederik Wauters
  Changed by:   Marius Koeppel

  Contents:     Dummy frontend producing stream data

\********************************************************************/

#include <algorithm>
#include <math.h>
#include <random>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <bitset>
#include <iostream>
#include <unistd.h>
#include "midas.h"
#include "msystem.h"
#include <chrono>
#include <thread>

#include "mfe.h"

using namespace std;

/*-- Globals -------------------------------------------------------*/

/* The frontend name (client name) as seen by other MIDAS clients   */
const char *frontend_name = "Dummy FE SWB";
/* The frontend file name, don't change it */
const char *frontend_file_name = __FILE__;

/* frontend_loop is called periodically if this variable is TRUE    */
BOOL frontend_call_loop = FALSE;

/* a frontend status page is displayed with this frequency in ms */
INT display_period = 0;

/* maximum event size produced by this frontend */
INT max_event_size = 1 * (1024 * 1024); // 1 MB

/* maximum event size for fragmented events (EQ_FRAGMENTED) */
INT max_event_size_frag = 5 * 1024 * 1024;

/* buffer size to hold events */
INT event_buffer_size = 2 * max_event_size;

/*-- Function declarations -----------------------------------------*/

INT read_stream_thread(void *param);
uint32_t generate_random_pixel_hit_swb(uint32_t time_stamp);
uint32_t generate_random_beam_ref_hit(uint32_t time_stamp, uint32_t chipID);

BOOL equipment_common_overwrite = TRUE; // true is overwriting the common odb

/* DMA Buffer and related */
volatile uint32_t *dma_buf;
#define MUDAQ_DMABUF_DATA_ORDER 25 // 29, 25 for 32 MB
#define MUDAQ_DMABUF_DATA_LEN (1 << MUDAQ_DMABUF_DATA_ORDER) // in bytes
size_t dma_buf_size = MUDAQ_DMABUF_DATA_LEN;
uint32_t dma_buf_nwords = dma_buf_size / sizeof(uint32_t);

/*-- Equipment list ------------------------------------------------*/

EQUIPMENT equipment[] = {
   {"Stream SWB",              /* equipment name */
    {1, 0,                     /* event ID, trigger mask */
     "SYSTEM",                 /* event buffer */
     EQ_USER,                  /* equipment type */
     0,                        /* event source */
     "MIDAS",                  /* format */
     TRUE,                     /* enabled */
     RO_RUNNING,               /* read always and update ODB */
     100,                      /* poll for 100ms */
     0,                        /* stop run after this event limit */
     0,                        /* number of sub events */
     0,                        /* log history every event */
     "", "", ""},
    NULL,                      /* readout routine */
   },
   {""}
};

/*-- Dummy routines ------------------------------------------------*/

INT poll_event(INT source, INT count, BOOL test) { return 1; }
INT interrupt_configure(INT cmd, INT source, POINTER_T adr) { return 1; }

/*-- Frontend Init -------------------------------------------------*/

INT frontend_init()
{
   // create ring buffer for readout thread
   create_event_rb(0);

   // create readout thread
   ss_thread_create(read_stream_thread, NULL);

   return CM_SUCCESS;
}

/*-- Frontend Exit -------------------------------------------------*/

INT frontend_exit() { return CM_SUCCESS; }

/*-- Frontend Loop -------------------------------------------------*/

INT frontend_loop() { return CM_SUCCESS; }

/*-- Begin of Run --------------------------------------------------*/

INT begin_of_run(INT run_number, char *error) { return CM_SUCCESS; }

/*-- End of Run ----------------------------------------------------*/

INT end_of_run(INT run_number, char *error) { return CM_SUCCESS; }

/*-- Pause Run -----------------------------------------------------*/

INT pause_run(INT run_number, char *error) { return CM_SUCCESS; }

/*-- Resume Run ----------------------------------------------------*/

INT resume_run(INT run_number, char *error) { return CM_SUCCESS; }

/*------------------------------------------------------------------*/

uint32_t generate_random_pixel_hit_swb(uint32_t time_stamp)
{
   uint32_t tot = rand() % 32;    // 0 to 31
   uint32_t chipID = rand() % 3;  // 0 to 2
   uint32_t col = rand() % 256;   // 0 to 255
   uint32_t row = rand() % 250;   // 0 to 249

   uint32_t hit = (time_stamp << 28) | (chipID << 22) | (row << 14) | (col << 6) | (tot << 1);

   // if ( print ) {
   //    printf("ts:%8.8x,chipID:%8.8x,row:%8.8x,col:%8.8x,tot:%8.8x\n", time_stamp,chipID,row,col,tot);
   //    printf("hit:%8.8x\n", hit);
   //    std::cout << std::bitset<32>(hit) << std::endl;
   // }
   if (((hit >> 22) & 0x3f) > 2) printf("Hit %8.8x", hit);
   if (chipID > 2) printf("ChipID %8.8x", chipID);

   return hit;
}

uint32_t generate_random_beam_ref_hit(uint32_t time_stamp, uint32_t chipID)
{
   uint32_t fastTS = rand() % 4194303 / 2; // 0 to 2097151
   uint32_t hit = (time_stamp << 28) | (chipID << 22) | (fastTS << 1);

   if (((hit >> 22) & 0x3f) > 4) {
      printf("Hit Ref %8.8x\n", hit);
      printf("Ref fast %8.8x\n", fastTS);
      printf("ChipID Ref %8.8x\n", chipID);
      printf("Time Ref %8.8x\n", time_stamp);
      printf("Chip %8.8x\n", ((hit >> 22) & 0x3f));
   }

   return hit;
}

INT read_stream_thread(void *param)
{
   uint32_t *pdata;

   // init bank structure - 64bit alignment
   uint32_t SERIAL = 0;

   // tell framework that we are alive
   signal_readout_thread_active(0, TRUE);

   // obtain ring buffer for inter-thread data exchange
   int rbh = get_event_rbh(0);
   int status;

   int nEvents = 5000;
   size_t eventSize = 32;                                       // in 4-byte words
   size_t dmaBufSize = nEvents * eventSize * sizeof(uint32_t);  // buffer size in bytes
   uint32_t *dma_buf_dummy = (uint32_t *) malloc(dmaBufSize);

   while (is_readout_thread_enabled()) {

      // don't readout events if we are not running
      if (!readout_enabled()) {
         // do not produce events when run is stopped
         ss_sleep(10); // don't eat all CPU
         continue;
      }

      // obtain buffer space with 10 ms timeout
      status = rb_get_wp(rbh, (void **) &pdata, 10);

      // just try again if buffer has no space
      if (status == DB_TIMEOUT)
         continue;

      if (status != DB_SUCCESS) {
         cout << "!DB_SUCCESS" << endl;
         break;
      }

      for (int i = 0; i < nEvents; i++) {
         // event header
         dma_buf_dummy[ 0 + i * eventSize] = 0x00000001;               // Trigger Mask & Event ID
         dma_buf_dummy[ 1 + i * eventSize] = SERIAL++;                 // Serial number
         dma_buf_dummy[ 2 + i * eventSize] = ss_time();                // time
         dma_buf_dummy[ 3 + i * eventSize] = eventSize * 4 - 4 * 4;    // event size
         dma_buf_dummy[ 4 + i * eventSize] = eventSize * 4 - 6 * 4;    // all bank size
         dma_buf_dummy[ 5 + i * eventSize] = 0x31;                     // flags

         // bank PCD0 first FEB
         dma_buf_dummy[ 6 + i * eventSize] = 'P' << 0 | 'C' << 8 | 'D' << 16 | '0' << 24; // bank name
         dma_buf_dummy[ 7 + i * eventSize] = 0x06;                     // bank type TID_DWORD
         dma_buf_dummy[ 8 + i * eventSize] = 10 * 4;                   // data size
         dma_buf_dummy[ 9 + i * eventSize] = 0x0;                      // reserved
         dma_buf_dummy[10 + i * eventSize] = 0xE80000BC;               // preamble
         dma_buf_dummy[11 + i * eventSize] = 0x00000000;               // TS0
         dma_buf_dummy[12 + i * eventSize] = ss_time();                // TS1
         dma_buf_dummy[13 + i * eventSize] = 0xFC000000;               // sub header
         dma_buf_dummy[14 + i * eventSize] = generate_random_pixel_hit_swb(ss_time());   // hit0
         dma_buf_dummy[15 + i * eventSize] = generate_random_pixel_hit_swb(ss_time());   // hit1
         dma_buf_dummy[16 + i * eventSize] = generate_random_beam_ref_hit(ss_time(), 3); // chip 3 beam ref bits 22:1 -> fast TS
         dma_buf_dummy[17 + i * eventSize] = generate_random_beam_ref_hit(ss_time(), 4); // chip 4 scintillator bits 22:1 -> fast TS
         dma_buf_dummy[18 + i * eventSize] = 0xFC00009C;               // TRAILER
         dma_buf_dummy[19 + i * eventSize] = 0xAFFEAFFE;               // PADDING

         // bank PCD1 second FEB
         dma_buf_dummy[20 + i * eventSize] = 'P' << 0 | 'C' << 8 | 'D' << 16 | '1' << 24; // bank name
         dma_buf_dummy[21 + i * eventSize] = 0x6;                      // bank type TID_DWORD
         dma_buf_dummy[22 + i * eventSize] = 8 * 4;                    // data size
         dma_buf_dummy[23 + i * eventSize] = 0x0;                      // reserved
         dma_buf_dummy[24 + i * eventSize] = 0xE80001BC;               // preamble
         dma_buf_dummy[25 + i * eventSize] = 0x00000000;               // TS0
         dma_buf_dummy[26 + i * eventSize] = ss_time();                // TS1
         dma_buf_dummy[27 + i * eventSize] = 0xFC000000;               // sub header
         dma_buf_dummy[28 + i * eventSize] = generate_random_pixel_hit_swb(ss_time()); // hit0
         dma_buf_dummy[29 + i * eventSize] = generate_random_pixel_hit_swb(ss_time()); // hit1
         dma_buf_dummy[30 + i * eventSize] = 0xFC00009C;               // TRAILER
         dma_buf_dummy[31 + i * eventSize] = 0xAFFEAFFE;               // PADDING
      }

      memcpy(pdata, dma_buf_dummy, dmaBufSize);

      // print data
      if (true) {
         auto eh = (EVENT_HEADER *) (&pdata[0]);
         auto bh = (BANK_HEADER *) (&pdata[4]);
         auto ba = (BANK32A *) (&pdata[6]);

         char bank_name[5];
         bank_name[4] = 0;
         memcpy(bank_name, (char *) (ba->name), 4);

         printf("EID=%4.4x TM=%4.4x SERNO=%8.8x TS=%8.8x EDsiz=%8.8x\n",
                eh->event_id, eh->trigger_mask, eh->serial_number, eh->time_stamp, eh->data_size);
         printf("DAsiz=%8.8x FLAG=%8.8x\n", bh->data_size, bh->flags);
         printf("BAname=%s TYP=%8.8x BAsiz=%8.8x BAres=%8.8x\n",
                bank_name, ba->type, ba->data_size, ba->reserved);
      }

      rb_increment_wp(rbh, dmaBufSize); // in byte length

      ss_sleep(300); // limit data rate
   }

   free(dma_buf_dummy);

   return 0;
}
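In the attachment, word 0 of each event (0x00000001) packs event_id in the low 16 bits and trigger_mask in the high 16 bits, so on a little-endian host it means event_id=1, trigger_mask=0. A minimal sketch, assuming a little-endian host and using a local struct that mirrors the MIDAS EVENT_HEADER layout (so it compiles without midas.h), of how "set both to 0x1" translates into the raw word:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Local mirror of the MIDAS EVENT_HEADER layout (16 bytes, no padding);
// defined here only so the example compiles without MIDAS.
struct EventHeaderMirror {
    int16_t event_id;
    int16_t trigger_mask;
    uint32_t serial_number;
    uint32_t time_stamp;
    uint32_t data_size;
};

// First header word as the dummy frontend writes it: event_id in the low
// 16 bits, trigger_mask in the high 16 bits (little-endian hosts only).
uint32_t pack_id_mask(uint16_t event_id, uint16_t trigger_mask) {
    return (static_cast<uint32_t>(trigger_mask) << 16) | event_id;
}

// Reinterpret the four raw header words; memcpy avoids aliasing problems.
EventHeaderMirror read_header(const uint32_t words[4]) {
    EventHeaderMirror h;
    std::memcpy(&h, words, sizeof(h));
    return h;
}
```

With this layout, event_id=1 and trigger_mask=1 correspond to the raw word 0x00010001, while the attachment's 0x00000001 leaves the trigger mask at zero.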
2336	10 Feb 2022	Stefan Ritt	Bug Report	History plots deceiving users into thinking data is still logging
The problem was fixed in commit 825935dc in October 2021, and it has been running fine at PSI since then. If TRIUMF people agree, we can close that issue and proceed.

Stefan
2337	11 Feb 2022	Alexey Kalinin	Bug Report	some frontend kicked by cm_periodic_tasks
Thanks for the answer. As soon as I can (possibly in a month), I'll try the suggestion below:

> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").

I tried to change this ODB setting for one of the frontends via mhttpd/browser, and eventually it goes back to the default value (1000, as I remember). But this frontend has the minimum rate (50 DWORDs per ~10 s), and depending on the cache size its events appear in mdump only once per 31 events, but all of them are there. So it's a different story, but maybe it has the same solution: playing with Write Cache Size.

The double free messages come from the mserver terminal. All of the frontends are remote. I can't exclude crashes of the frontend, but when I run ./frontend -i 1 (2, 3, etc.), I run one code for all of them, and only some of them crash. I also found that the crash in the frontend happens while it does nothing with the collected data (the last event was reached and new data is not ready) but is watching for ODB changes. I mean it crashes inside (while {odb_changes(value in watchdog)}), and I don't know what else happened meanwhile with the cached buffer.

The plan for the future is to use the event builder for the frontends once the data/signals are perfectly reasonable, i.e. without broken events. For now I am somewhat worried that one of the frontends will skip one of the events inside its buffer.

Thanks for showing me where to dig.

A.

> > The problem is that eventually some of the frontends closed with the message
> > :19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer
> > 'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist
>
> This message means what it says. A client was registered with the SYSMSG buffer and this
> client had pid 9789. At some point some other client (rootana, in this case) checked it and
> process pid 9789 was no longer running (it then proceeded to remove the registration).
>
> There are 2 possibilities:
> - simplest: your frontend has crashed. Best to debug this by running it inside gdb and waiting for
> the crash.
> - unlikely: the reported pid is bogus, the real pid of your frontend is different, and the client
> registration in SYSMSG is corrupted. This would indicate massive corruption of midas shared
> memory buffers, not impossible if your frontend misbehaves and writes to random memory
> addresses. ODB has protection against this (normally turned off, easy to enable: set ODB
> "/experiment/protect odb" to yes); shared memory buffers do not have protection against this
> (should it be added?).
>
> Do this: when you start your frontend, write down its pid; when you see the crash message,
> confirm the pid number printed is the same. As an additional test, run your frontend inside gdb;
> after it crashes, you can print the stack trace, etc.
>
> > in the meantime mserver logging:
> > mserver started interactively
> > mserver will listen on TCP port 1175
> > double free or corruption (!prev)
> > double free or corruption (!prev)
> > free(): invalid next size (normal)
> > double free or corruption (!prev)
>
> Are these "double free" messages coming from the mserver or from your frontend? (i.e. do you run
> them in different terminals, not all in the same terminal?)
>
> If the messages are coming from the mserver, this confirms possibility (1),
> except that for frontends connected remotely, the pid is the pid of the mserver,
> and what we see are crashes of the mserver, not crashes of your frontend. These are much harder to
> debug.
>
> You will need to enable core dumps (ODB /Experiment/Enable core dumps set to "y"),
> confirm that core dumps work (i.e. "killall -SEGV mserver", observe that core files are created
> in the directory where you started the mserver), reproduce the crash, run "gdb mserver
> core.NNNN", run "bt" to print the stack trace, and post the stack trace here (or email it to me
> directly).
>
> > I can find some correlation with the number of events/event size produced by the
> > frontend, because it fails when it becomes big enough.
>
> There is no limit on event size or event rate in midas; you should not see any crash
> regardless of what you do. (There is a limit on event size, because an event has
> to fit inside an event buffer and the event buffer size is limited to 2 GB.)
>
> Obviously you hit a bug in the mserver that makes it crash. Let's debug it.
>
> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").
>
> > frontend scheme is like this:
>
> Best if you use the tmfe c++ frontend: event data handling is much simpler and we do not
> have to debug the convoluted old code in mfe.c.
>
> K.O.
>
> > poll event time set to 0;
> >
> > poll_event{
> > //if buffer not transferred return (continue cutting the main buffer)
> > //read main buffer from hardware
> > //buffer not transferred
> > }
> >
> > read event{
> > // cut the main buffer to subevents (cut one event from main buffer) return;
> > //if (last subevent) {buffer transferred ;return}
> > }
> >
> > What is strange to me is that 2 frontends (1 per remote pc) are causing this.
> >
> > Also, I'm executing one FE code with the -i # flag, setting the event id in
> > frontend_init, and using the SYSTEM buffer for all.
> >
> > Is there something I'm missing?
> > Thanks.
> > A.
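Alexey's poll/read scheme (quoted at the end of the post above) amounts to a small state machine: load a main buffer only after the previous one has been fully cut, then hand out one sub-event per read. The sketch below is hypothetical and independent of mfe.c; the class name, the fixed sub-event size, and passing the hardware buffer to poll() as an argument are assumptions made so it compiles and can be tested on its own:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hands out fixed-size sub-events from a larger "main buffer".
class SubEventCutter {
public:
    explicit SubEventCutter(size_t words_per_event) : n_(words_per_event) {
        assert(n_ > 0);
    }

    // poll: if the current main buffer still holds a full sub-event, keep
    // cutting it; otherwise load the new hardware buffer (a trailing partial
    // sub-event, if any, is dropped). Returns true if a sub-event is ready.
    bool poll(const std::vector<uint32_t> &hardware_buffer) {
        if (pos_ + n_ <= main_.size())
            return true;                 // buffer not yet transferred
        main_ = hardware_buffer;         // "read main buffer from hardware"
        pos_ = 0;
        return n_ <= main_.size();
    }

    // read: copy out one sub-event; returns false once the buffer is used up.
    bool read(std::vector<uint32_t> &out) {
        if (pos_ + n_ > main_.size())
            return false;
        out.assign(main_.begin() + pos_, main_.begin() + pos_ + n_);
        pos_ += n_;
        return true;
    }

private:
    size_t n_;
    size_t pos_ = 0;
    std::vector<uint32_t> main_;
};
```

Keeping all cutting state in one object makes it easier to see whether a sub-event could be skipped, which was Alexey's worry.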
2347	16 Feb 2022	Marius Koeppel	Bug Report	Writing MIDAS Events via FPGAs
I just came back to this and started to use the dummy frontend. Unfortunately, I have a problem during run cycles:

Starting the frontend and starting a run works fine: I see events with mdump and also in the web GUI. But when I stop the run and try to start the next run, the frontend does not send any more events. It gets stuck at line 221 (if (status == DB_TIMEOUT)). I tried reducing nEvents to 1, which helped in terms of DB_TIMEOUT, but I still don't get any events after a stop/start cycle: no events in mdump and no events counting up in the web GUI. If I kill the frontend in the terminal (Ctrl+C) and restart it while the run is still running, it starts to send events again.

Cheers,
Marius
2349	03 Mar 2022	Stefan Ritt	Bug Report	Writing MIDAS Events via FPGAs
> Starting the frontend and starting a run works fine -> seeing events with mdump and also on the web GUI.
> But when I stop the run and try to start the next run the frontend is sending no events anymore.
> It get stuck at line 221 (if (status == DB_TIMEOUT)).
> I tried to reduce the nEvents to 1 which helped in terms of DB_TIMEOUT but still I don't get any events after I did a stop / start cycle -> no events in mdump and no events counting up at the web GUI.
> If I kill the frontend in the terminal (ctrl+c) and restart it, while the run is still running, it starts to send events again.

This problem has (likely) been fixed in the current version. Please pull develop and try again. It was a recursive call to the event collection routine, which is only triggered if you send events faster than the logger can digest them, so not many people see it.

Best,
Stefan
2352	07 Mar 2022	Marius Koeppel	Bug Report	Writing MIDAS Events via FPGAs
> This problem has (likely) been fixed in the current version. Please pull develop and try again. Was a recursive call to the event collection routine which is only triggered if you send events faster than
> the logger can digest, so not many people see it.

I just pulled the current version (d945fa9), but the problem explained in 2347 stays the same.

Best,
Marius
2353	10 Mar 2022	Gennaro Tortone	Bug Report	Python ODB watch
Hi,

I have an issue with ODB watch in the MIDAS Python library. I wrote a simple frontend that reads/writes FPGA registers through ODB keys (simplified version at the link below):

https://gist.github.com/gtortone/cd035a9ac4ea7a78ea9cd931e80e2c75

Everything works fine, but there is a boolean array in Settings (Enable ADC sampling) that I need to "toggle" (19 bits to 0, then 19 bits to 1). This operation is handled by detailed_settings_changed_func, which writes the value of the toggled bit to the FPGA.

The issue is that if I quickly toggle the boolean array with odbedit:

set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 0
set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 1

I see the following list of callbacks in the Python script:

detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1 ***
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1

It seems that the second write operation "overlaps" the first one... The same behavior is not observed using a 'watch' in odbedit.

I can overcome this problem by using the register value as the ODB key to avoid the array of booleans, but I report this issue as a "possible" bug/limitation of Python ODB watch.

Cheers,
Gennaro
2356	16 Mar 2022	Ben Smith	Bug Report	Python ODB watch
> It seems that the second write operation "overlaps" the first one...

Hi Gennaro,

In principle the same issue can happen in C++ code, but it is much less likely, as the callbacks get executed more quickly (partly due to C++ vs. Python in general, and partly because the Python code does some extra work to make the interface more user-friendly).

The C++ code at the end of this message adds a 100 ms sleep to the callback and can result in output like this when you do quick edits of "Test[0-19]" in odbedit:

Element 1 is 0
Element 2 is 0
Element 3 is 0
Element 4 is 0
Element 5 is 0
Element 6 is 0
Element 7 is 1
Element 8 is 1
Element 9 is 1
etc...

I agree that this can be a really nasty source of bugs if you need to react to every change. I'll add a warning to the Python docstrings, but I can't think of a way to make this more robust at the midas level; I think we'd need some sort of ODB "snapshot" system...

#include <cstdio>
#include <string>
#include "midas.h"

// Watch callback: read the element that changed, print it, then sleep
// 100 ms to emulate a slow callback.
void watch_fn(HNDLE hDB, HNDLE hKey, int index, void *info)
{
   DWORD data = 0;
   INT buf_size = sizeof(data);
   db_get_data_index(hDB, hKey, &data, &buf_size, index, TID_DWORD);
   printf("Element %d is %u\n", index, data);
   ss_sleep(100);
}

int main()
{
   HNDLE hDB, hClient, hTestKey;
   std::string host, expt;
   cm_get_environment(&host, &expt);
   cm_connect_experiment(host.c_str(), expt.c_str(), "test_odb", nullptr);
   cm_get_experiment_database(&hDB, &hClient);

   // create a 20-element DWORD array "/Test" and watch it
   static const DWORD numValues = 20;
   DWORD data[numValues] = {};
   db_set_value(hDB, 0, "Test", data, sizeof(DWORD) * numValues, numValues, TID_DWORD);
   db_find_key(hDB, 0, "Test", &hTestKey);
   db_watch(hDB, hTestKey, watch_fn, nullptr);

   printf("Press any key to exit loop...\n");

   while (!ss_kbhit()) {
      cm_yield(1);
   }

   db_unwatch_all();
   db_delete_key(hDB, hTestKey, FALSE);
   cm_disconnect_experiment();
   return 0;
}
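Why the output looks like this: the callback receives only the index of the element that changed and then reads whatever value is stored at that moment (as watch_fn() above does with db_get_data_index()). A toy model, not MIDAS code and with hypothetical names, that queues index-only notifications and lets a "slow" consumer fall behind a second bulk write:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Array key with a watch: each write queues only the index that changed;
// the callback later reads the value stored at dispatch time.
struct WatchedArray {
    std::vector<int> values;
    std::deque<size_t> pending;  // notifications not yet dispatched

    explicit WatchedArray(size_t n) : values(n, 0) {}

    void set_range(size_t first, size_t last, int v) {
        for (size_t i = first; i <= last; i++) {
            values[i] = v;
            pending.push_back(i);
        }
    }

    // Dispatch up to `max` queued callbacks; return the values they observed.
    std::vector<int> dispatch(size_t max) {
        std::vector<int> seen;
        while (max-- > 0 && !pending.empty()) {
            seen.push_back(values[pending.front()]);
            pending.pop_front();
        }
        return seen;
    }
};
```

If only 8 of the 19 callbacks from the "set to 0" write have run before the "set to 1" write lands, the remaining 11 notifications from the first write already see the value 1, which reproduces the pattern in Gennaro's log (index 8 onwards).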
2357	21 Mar 2022	Stefan Ritt	Bug Report	Python ODB watch
What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are two approaches to solve it:

1) Write the values one by one to the ODB, but do not trigger a watch update. In the sequencer, this can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer and the last paragraph to the right of the ODBSET command). You use notify=0 for all set commands except the last one, where you use notify=1. In the C++ API, you can use db_set_data_index1(), which has this notify flag as its last parameter.

2) Add intelligence to your front-end. If you get a watch update, do not apply it directly to the hardware, but put it into a FIFO. Once you have not received any more updates for a certain period (1 s is a good value), empty the FIFO and apply all settings immediately.

Both methods have been used successfully at PSI, although 1) is much easier to implement, especially if you use the midas sequencer.

Stefan
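Approach 2) can be sketched as a small debouncer. This is a hypothetical helper, not part of MIDAS: the class and method names are assumptions, time is passed in explicitly so the logic can be tested without sleeping, and instead of a strict FIFO it keeps only the latest value per array index (use a std::deque instead if the order of individual writes matters to the hardware):

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <utility>
#include <vector>

// Collects watch updates and releases them only after a quiet period.
class SettingsDebouncer {
public:
    using Clock = std::chrono::steady_clock;

    explicit SettingsDebouncer(Clock::duration quiet) : quiet_(quiet) {}

    // Call from the ODB watch callback: remember the latest value per index.
    void on_update(int index, int value, Clock::time_point now) {
        pending_[index] = value;
        last_update_ = now;
    }

    // Call periodically (e.g. from the frontend loop). Returns the settings to
    // apply, in index order, once no update arrived for the quiet period.
    std::vector<std::pair<int, int>> take_if_quiet(Clock::time_point now) {
        std::vector<std::pair<int, int>> out;
        if (pending_.empty() || now - last_update_ < quiet_)
            return out;
        out.assign(pending_.begin(), pending_.end());
        pending_.clear();
        return out;
    }

private:
    Clock::duration quiet_;
    Clock::time_point last_update_{};
    std::map<int, int> pending_;
};
```

In a real frontend, on_update() would be fed from the watch callback with Clock::now(), and take_if_quiet() polled from the main loop; the returned pairs are then written to the hardware in one go.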
2370	24 Mar 2022	Konstantin Olchanski	Bug Report	data missing in runXXXXXX.mid
> > It would be good to pin point there the data is lost. This is the sequence: > > > > frontend user code -> mfe.c code -> SYSTEM buffer -> mlogger -> disk > > > > To see if correct data arrives to the SYSTEM buffer, run: > > mdump -z SYSTEM > > > > To see if mlogger is receiving events from the SYSTEM buffer, run: > > mlogger -v ### mlogger should report all events, history and data > > > > To see if mlogger writes events to disk, examine the disk file (in this case, you already did, data is not there). > > > > I would guess that your data does not make it out from the frontend (mdump shows "nothing"), > > if data were to arrive into the SYSTEM buffer, it would make it to disk, unless > > mlogger is misconfigured (but you already checked that). > > > > If you have trouble with the frontend framework code, you can try to switch from the mfe.c frontend > > to the newer c++ tmfe frontend (see progs/fetest_tmfe.cxx and progs/fetest_tmfe_thread.cxx). > > > > K.O. > > Good evening > > I tried to reproduce the behavior in a very simple FE but it did not work out. > The next thing for me would be to take the FE that is producing this behavior, > replace all the device communication and data with dummies. If the problem is still > there I would start to simplify as much as possible. > > Following the inputs of KO, I pin-pointed the data loss. The system buffer still > gets the data but the mlogger does not write the data event. Then of course the data > is also not anymore present in the data file. Therefore, I checked the logger > settings again, Event ID and Trigger Mask still -1. Nothing else, at least from my point of view, > that is misconfigured. Nevertheless, if it helps I can send my ODB settings. > > When doing the tests just before I found something else that probably > can give a hint to the problem. The data is only lost if the time between > two runs is long (a few seconds). 
As an example: if I run a sequence with a loop > and after the FE stops the run the loop ends and the next run is started automatically, > then only the first run has no data, which is the one after a longer period without > data taking. When I add a "WAIT Seconds 5" after the run, before starting > the next one, no data is written to disk for any run. I also saw this > once when adding a sleep(1) at the end of the FE readout function, > but back then I did not think about it any further. > Looks like this problem fell through the covid crack. As far as I know, MIDAS does not lose any events between bm_send_event() and the shared memory buffer. It does not lose any events in the mlogger (unless the "event request" is misconfigured). (There are plenty of opportunities to lose events in complicated frontends.) If you have evidence otherwise, I would very much like to hear about it, and I want to fix all problems that cause it. In your previous report I was under the impression that you lose random events here and there, but your latest report is about mlogger not writing anything at all. Which case is it? If you can definitely say that all your events make it to the SYSTEM buffer, but mlogger sometimes misses some of them and sometimes misses all of them, we should look very closely at bm_receive_event() and mlogger itself. In the case where mlogger is not seeing any events at all (output file is empty), I would like to see, while this is happening, the output of mdump (to confirm events are written to the SYSTEM buffer with the correct event_id and trigger_mask) and the output of (say) "manalyzer_test.exe --dump run01161.mid.lz4" on your output file. If the output is very long, you can email it to me directly instead of posting it here. K.O.
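The "event request" mentioned above is a pair (event_id, trigger_mask). As a rough model, assuming the rule used by bm_match_event() in midas.cxx for non-fragmented events, a request matches an event when the IDs agree or the requested ID is -1 (EVENTID_ALL), and when the trigger masks share at least one bit or the requested mask is -1 (TRIGGER_ALL). The sketch below is that model, with hypothetical names, not the MIDAS source.

```cpp
#include <cstdint>

// -1 means "all" for both request fields, mirroring EVENTID_ALL and
// TRIGGER_ALL. Fragmented events are not modelled here.
constexpr std::int16_t kEventIdAll = -1;
constexpr std::int16_t kTriggerAll = -1;

// Only the two header fields that take part in matching.
struct EventHeaderModel {
   std::int16_t event_id;
   std::int16_t trigger_mask;
};

// True if a request (req_event_id, req_trigger_mask) would receive `ev`.
bool RequestMatches(std::int16_t req_event_id, std::int16_t req_trigger_mask,
                    const EventHeaderModel &ev) {
   bool id_ok = (req_event_id == kEventIdAll) || (req_event_id == ev.event_id);
   bool mask_ok = (req_trigger_mask == kTriggerAll) ||
                  ((req_trigger_mask & ev.trigger_mask) != 0);
   return id_ok && mask_ok;
}
```

With the logger settings reported above (Event ID and Trigger Mask both -1), every event matches under this model, which is consistent with K.O.'s point that a fully configured mlogger should not drop anything. It also shows why an event sent with trigger_mask=0 is only seen by requests whose mask is -1.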
2372	24 Mar 2022	Stefan Ritt	Bug Report	data missing in runXXXXXX.mid
One idea: we should have a look at mlogger::close_channels(). There the SYSTEM buffer is emptied through the cm_yield() call. Instrumenting this with some debugging code should tell us more. Another possible problem: if the frontend requested to be notified of a run stop AFTER the logger, the following can happen: the logger closes the file, and THEN the frontend flushes events, which end up in the SYSTEM buffer and are logged at the beginning of the next run. The mfe.cxx framework takes care of this by calling cm_register_transition(TR_STOP, tr_stop, 500); while the mlogger does cm_register_transition(TR_STOP, tr_stop, 800); and since 800 > 500, the logger is called AFTER the frontend. If one uses a framework other than mfe.cxx, however, this could be different. Stefan
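The ordering argument can be illustrated with a toy model that assumes only what is stated above: for a given transition, clients are called in order of increasing sequence number. The names are hypothetical, and this is not how cm_transition() is implemented internally.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One client's registration for a single transition (e.g. TR_STOP).
struct Registration {
   std::string client;
   int sequence;  // sequence number passed to cm_register_transition()
};

// Returns client names in call order: increasing sequence number,
// ties keeping their registration order.
std::vector<std::string> CallOrder(std::vector<Registration> regs) {
   std::stable_sort(regs.begin(), regs.end(),
                    [](const Registration &a, const Registration &b) {
                       return a.sequence < b.sequence;
                    });
   std::vector<std::string> order;
   for (const auto &r : regs)
      order.push_back(r.client);
   return order;
}
```

With the numbers from the post, a frontend at 500 and mlogger at 800 give "frontend" first. A custom frontend that registered for TR_STOP with, say, 900 would be called after the logger, which is exactly the failure mode described.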
2373	24 Mar 2022	Konstantin Olchanski	Bug Report	data missing in runXXXXXX.mid
> One idea: we should have a look at mlogger::close_channels(). > There the SYSTEM buffer is emptied through the cm_yield() call. > Instrumenting this with some debugging code will enlighten us. Right, this would show up as "last few events are lost at the end of run". But that code in the mlogger has not been touched in years; if there were a problem there, we would have seen it by now. Most experiments check that the number of events in the data file is the same as the number of triggers generated, and both numbers are shown on the midas status page. > Another possible problem: If the frontend requested to be notified for a run stop AFTER the logger, then the problem might happen: Logger closes file, and THEN the frontend flushes events ending up in the SYSTEM buffer and being logged at the beginning of the next run. The mfe.cxx framework takes care of this by calling > cm_register_transition(TR_STOP, 500); Default sequence, for both the mfe.c frontend and the c++ tmfe frontend: start of run: - mlogger first (configure history, open data file) - frontends last - (if any frontend fails, TR_STARTABORT is sent to mlogger to close the output file and "undo" the run start) end of run: - frontends first (they must not send any events after processing the TR_STOP RPC call; inside the TR_STOP handler, bm_flush_cache() takes care of the write cache) - mlogger last - (if any frontend fails, the failure is ignored and the run stops regardless) The order will be wrong only if someone changes it manually, and whatever order is set, you can see it on the midas transition page (and with mtransition -v, odbedit stop now -v, etc). K.O.
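The failure handling described above (start of run: stop at the first failing client and undo; end of run: ignore failures) can be sketched as a simplified model. This is not MIDAS code: the `Client` struct and log strings are hypothetical, the model sends TR_STARTABORT only to clients that already completed TR_START, and it assumes mlogger registers for TR_START with a lower sequence number than the frontends, so that it is called first as stated above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical participant in a run transition.
struct Client {
   std::string name;
   int sequence;        // transition sequence number (lower is called first)
   bool transition_ok;  // what this client's transition handler would return
};

// Order clients by increasing sequence number; ties keep their input order.
static std::vector<Client> SortedBySequence(std::vector<Client> clients) {
   std::stable_sort(clients.begin(), clients.end(),
                    [](const Client &a, const Client &b) { return a.sequence < b.sequence; });
   return clients;
}

// Start of run: call clients in order; on the first failure, send
// TR_STARTABORT to every client that had already started, then give up.
std::vector<std::string> StartRun(const std::vector<Client> &clients) {
   std::vector<std::string> log;
   std::vector<std::string> started;
   for (const Client &c : SortedBySequence(clients)) {
      log.push_back("TR_START " + c.name);
      if (!c.transition_ok) {
         for (const std::string &name : started)
            log.push_back("TR_STARTABORT " + name);
         log.push_back("start failed");
         return log;
      }
      started.push_back(c.name);
   }
   log.push_back("running");
   return log;
}

// End of run: call every client in order; failures are noted but ignored.
std::vector<std::string> StopRun(const std::vector<Client> &clients) {
   std::vector<std::string> log;
   for (const Client &c : SortedBySequence(clients))
      log.push_back("TR_STOP " + c.name + (c.transition_ok ? "" : " (failure ignored)"));
   log.push_back("stopped");
   return log;
}
```

For example, if the frontend fails at start of run, the model logs mlogger's TR_START, the frontend's TR_START, then TR_STARTABORT for mlogger (closing the file it just opened). At end of run, a failing frontend is only noted and mlogger is still stopped last.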
2375	25 Mar 2022	Marius Koeppel	Bug Report	Writting MIDAS Events via FPGAs
I finally found why the readout stops after a run transition. In my dummy frontend, the serial number was not reset to zero at run start. This leads to a serial number mismatch in the function receive_trigger_event (mfe.cxx:1247). As a result, the function never finds a new event in any of the ring buffers, and nothing gets read out of the buffers. Nevertheless, it would be nice if the system told the user about a serial number mismatch (by printing a warning or error, etc.). Cheers, Marius
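A hypothetical model of the stall, written only from the description above and not from the mfe.cxx source: each ring buffer is reduced to a FIFO of serial numbers, the reader resets its expected serial number at run start, and it only reads out an event whose serial number equals the expected one. The warning is the addition suggested in this post, not existing MIDAS behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <vector>

// Each "ring buffer" is modelled as a FIFO of event serial numbers.
class SerialMatcher {
public:
   explicit SerialMatcher(std::size_t n_buffers) : buffers_(n_buffers) {}

   // Reader side: expect serial number 0 again at the start of every run.
   void BeginRun() { expected_ = 0; }

   // Producer side: the frontend code decides which serial numbers it sends.
   void Push(std::size_t buffer, std::uint32_t serial) { buffers_[buffer].push_back(serial); }

   // Reads out the event with the expected serial number if any buffer holds
   // it at its head. Otherwise returns false and, if data is waiting, prints
   // the mismatch warning suggested in the post.
   bool ReadNext(std::uint32_t *serial_out) {
      for (auto &buf : buffers_) {
         if (!buf.empty() && buf.front() == expected_) {
            *serial_out = buf.front();
            buf.pop_front();
            ++expected_;
            return true;
         }
      }
      for (const auto &buf : buffers_) {
         if (!buf.empty()) {
            std::fprintf(stderr, "warning: expected serial %u but buffer head holds %u\n",
                         static_cast<unsigned>(expected_), static_cast<unsigned>(buf.front()));
            ++warnings_;
            break;
         }
      }
      return false;
   }

   int warnings() const { return warnings_; }

private:
   std::vector<std::deque<std::uint32_t>> buffers_;
   std::uint32_t expected_ = 0;
   int warnings_ = 0;
};
```

If the producer keeps counting from the previous run (for example, pushes serial 2 after `BeginRun()`), `ReadNext()` never succeeds, which matches the symptom; with the warning, the mismatch is at least visible instead of silent.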
2412	20 Jun 2022	jianrun	Bug Report	Error in "midas/src/mana.cxx"
Dear Midas developers, When we run the examples in $MIDASSYS/examples/experiment/, we run into some problems when analyzing the results: 1. When we analyze the data using the analyzer: ./analyzer -i run00001.mid -o run00001.rz, we get the following errors: " Root server listening on port 9090... Running analyzer offline. Stop with "!" [Analyzer,ERROR] [mana.cxx:1832:bor,ERROR] HBOOK support is not compiled in [Analyzer,INFO] Set run number 6 in ODB Load ODB from run 6...OK run00006.mid:2680 events, 0.00s " We think this occurs in "midas/src/mana.cxx". How can we solve this? 2. When we analyze the above data, another error occurs: [Analyzer,ERROR] [odb.cxx:847:db_validate_name,ERROR] Invalid name "/Analyzer/Tests/Always true/Rate [Hz]" passed to db_create_key_wlocked: should not contain "[" We worked around this by replacing "Rate [Hz]" with "Rate" in test_write in midas/src/mana.cxx. Could this be fixed permanently in the next version, or are we not running the code properly? Thanks!
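Instead of dropping the unit, code that builds ODB key names could rewrite them before creating keys. The sketch below is a hypothetical helper, not part of MIDAS: it only replaces "[" and "]" with parentheses (e.g. "Rate [Hz]" becomes "Rate (Hz)"), assumes parentheses are accepted in ODB key names, and does not check any other rule db_validate_name() may enforce.

```cpp
#include <string>

// Hypothetical helper: replace the square brackets rejected by the error
// above with parentheses. Other characters are left untouched; this is not
// a full implementation of the ODB key-name rules.
std::string SanitizeKeyName(std::string name) {
   for (char &c : name) {
      if (c == '[')
         c = '(';
      else if (c == ']')
         c = ')';
   }
   return name;
}
```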
2414	25 Jun 2022	Joseph McKenna	Bug Report	RPC timeout for manalyzer over network
In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10GbE Ethernet cable) after ~5 minutes of running. If I run the analyzer locally, I do not see a timeout... gdb trace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff5d35859 in __GI_abort () at abort.c:79
#2 0x00005555555a2a22 in rpc_call (routine_id=11111) at /home/alpha/packages/midas/src/midas.cxx:13866
#3 0x000055555562699d in bm_receive_event_rpc (buffer_handle=buffer_handle@entry=2, buf=buf@entry=0x0, buf_size=buf_size@entry=0x0, ppevent=ppevent@entry=0x0, pvec=pvec@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10510
#4 0x0000555555631082 in bm_receive_event_vec (buffer_handle=2, pvec=pvec@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10794
#5 0x0000555555673dbb in TMEventBuffer::ReceiveEvent (this=this@entry=0x555557388b30, e=e@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/tmfe.cxx:312
#6 0x0000555555607b56 in ReceiveEvent (b=0x555557388b30, e=0x7fffffffd6c0, timeout_msec=100) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1411
#7 0x000055555560d8dc in ProcessMidasOnlineTmfe (args=..., progname=<optimized out>, hostname=<optimized out>, exptname=<optimized out>, bufname=<optimized out>, event_id=<optimized out>, trigger_mask=<optimized out>, sampling_type_string=<optimized out>, num_analyze=0, writer=<optimized out>, multithread=<optimized out>, profiler=<optimized out>, queue_interval_check=<optimized out>) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1534
#8 0x000055555560f93b in manalyzer_main (argc=<optimized out>, argv=<optimized out>) at /usr/include/c++/9/bits/basic_string.h:2304
#9 0x00007ffff5d37083 in __libc_start_main (main=0x5555555b1130 <main(int, char**)>, argc=8, argv=0x7fffffffdda8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdd98) at ../csu/libc-start.c:308
#10 0x00005555555b184e in _start () at /usr/include/c++/9/bits/stl_vector.h:94
Any suggestions? Many thanks

ELOG V3.1.4-2e1708b5