ID |
Date |
Author |
Topic |
Subject |
2283
|
11 Oct 2021 |
Stefan Ritt | Info | Modification in the history logging system | A requested change in the history logging system has been made today. Previously, history values were
logged with a maximum frequency (usually once per second) but also with a minimum frequency, meaning
that values were logged for example every 60 seconds, even if they did not change. This causes a problem.
If a frontend is inactive or crashed which produces variables to be logged, one cannot distinguish between
a crashed or inactive frontend program or a history value which simply did not change much over time.
The history system was designed from the beginning in a way that values are only logged when they actually
change. This design pattern was broken since about spring 2021, see for example this issue:
https://bitbucket.org/tmidas/midas/issues/305/log_history_periodic-doesnt-account-for
Today I modified the history code to fix this issue. History logging is now controlled by the value of
common/Log history in the following way:
* Common/Log history = 0 means no history logging
* Common/Log history = 1 means log whenever the value changes in the ODB
* Common/Log history = N means log whenever the value changes in the ODB and
the previous write was more than N seconds ago
So most experiments should be happy with 0 or 1. Only experiments which have fluctuating values due to noisy
sensors might benefit from a value larger than 1 to limit the history logging. Anyhow this is not the preferred
way to limit history logging. This should be done by the front-end limiting the updates to the ODB. Most of the
midas slow control drivers have a “threshold” value. Only if the input changes by more then the threshold are
written to the ODB. This allows a per-channel “dead band” and not a per-event limit on history logging
as ‘log history’ would do. In addition, the threshold reduces the write accesses to the ODB, although that is
only important for very large experiments.
Stefan |
2290
|
15 Oct 2021 |
Stefan Ritt | Suggestion | Adding (or improving discoverability) of TID for odbset | > Creating an ODB key requires users to know the Type ID that are defined in
> https://bitbucket.org/tmidas/midas/src/develop/include/midas.h starting at line 320.
>
> I can't find any information on the Midas Wiki about these values or how to find
> them.
>
> Am I missing something obvious? Is there a way to improve how to find these values?
> Or is this not the best way to interact with the ODB?
Well, you found them in midas.h, so where is the problem?
If you want a more detailed description, just look in the midas documentation (RTFM):
https://midas.triumf.ca/MidasWiki/index.php/Midas_Data_Types
If you want a more modern interface to the ODB without these data types, look here:
https://midas.triumf.ca/MidasWiki/index.php/Odbxx
Best regards,
Stefan |
2292
|
22 Oct 2021 |
Stefan Ritt | Forum | mhttpd error | > Enable IPv6 y
Probably the IPv6 problem, see here elog:2269
I asked to turn off IPv6 by default, or at least mention this in the documentation,
but unfortunately nothing happened.
Stefan |
2295
|
25 Oct 2021 |
Stefan Ritt | Forum | Logger crash | The short term solution would be to increase the logger timeout in the ODB under
/Programs/Logger/Watchdog timeout
and set it to 6000 (one minute). But that is curing just the symptoms. It would be
interesting to understand the cause of this error. Probably the logger takes more than 10
seconds to start or stop the run. The reason could be that the history grow too big (what
we have right now in MEG II), or some disk problems. But that needs detailed debugging on
the logger side.
Stefan |
2301
|
10 Nov 2021 |
Stefan Ritt | Forum | Issue in data writing speed | Midas uses various buffers (in the frontend, at the server side before the SYSTEM buffer, the SYSTEM buffer itself, on the
logger before writing to disk. All these buffers are in RAM and have fast access, so you can fill them pretty quickly. When
they are full, the logger writes to disk, which is slower. So I believe at 2 Hz your disk can keep up with your writing
speed, but at 4 Hz (2x8MBx4=32 MB/sec) your disk starts slowing down the writing process. Now 32MB/s is pretty slow for
a disk, so I presume you have turned compression on which takes quite some time.
To verify this, disable logging. The disable compression and keep logging. Then report back here again.
> Dear all,
> I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera).
> I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz)
> writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back
> into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a sort time,
> but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and
> the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout
> of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you
> explain this behavior? Is there any way to improve it?
>
> Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion
> to improve it, it would be really appreciated.
>
> Thank you very much,
> Francesco
>
>
>
> const char* pSrc = (const char*)bufframe.buf;
>
> for(int y = 0; y < bufframe.height; y++ ){
>
> //Copy one row
> const unsigned short* pDst = (const unsigned short*)pSrc;
>
> //go through the row
> for(int x = 0; x < bufframe.width; x++ ){
>
> WORD tmpData = *pDst++;
>
> *pdata++ = tmpData;
>
> }
>
> pSrc += bufframe.rowbytes;
>
> }
> |
2305
|
02 Dec 2021 |
Stefan Ritt | Bug Report | Off-by-one in sequencer documentation | > The documentation for the sequencer loop says:
>
> <quote>
> LOOP [name ,] n ... ENDLOOP To execute a loop n times. For infinite loops, "infinite"
> can be specified as n. Optionally, the loop variable running from 0...(n-1) can be accessed
> inside the loop via $name.
> </quote>
>
> In fact the loop variable runs from 1...n, as can be seen by running this exciting
> sequencer code:
>
> 1 COMMENT "Figuring out MSL"
> 2
> 3 LOOP n,4
> 4 MESSAGE $n,1
> 5 ENDLOOP
Indeed you're right. The loop variable runs from 1...n. I fixed that in the documentation.
Stefan |
2306
|
02 Dec 2021 |
Stefan Ritt | Forum | Sequencer error with ODB Inc | Thanks for reporting that bug. Indeed there was a problem in the sequencer code which I fixed now. Please try the updated develop branch.
Stefan |
2319
|
26 Jan 2022 |
Stefan Ritt | Bug Report | Off-by-one in sequencer documentation | > Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.
for (i=1 ; i<=10 ; i++); ;-) |
2326
|
28 Jan 2022 |
Stefan Ritt | Bug Report | Writting MIDAS Events via FPGAs | I finally got the dummy program working. There were several issues:
- event_buffer_size was defined as 10000 * 32 MB = 320 GB, exceeding the RAM of the computer
- SERIAL number starting with 1. Actually in midas, event serial numbers always started with zero, but this was wrong in the documentation at
https://midas.triumf.ca/MidasWiki/index.php/Event_Structure, so I also fixed the documentation
- the event header time stamp must be seconds since 1.1.1970, and thus the function ss_time() should be used to set it
- calling set_equipment_status() for each event slows down the event collection considerably, since this function access the ODB each time
- dma_buf_dummy is defined inside the event loop, so it gets allocated and de-allocated on the stack for each event. Of course this might vanish
when the real FPGA buffer will be used.
- The line pdata+=sizeof(dma_buf_dummy); is wrong. pdata is pointer to uint32_t, but the sizeof() operation returns the size of the
dma_buf_dummy in bytes. Therefore, pdata gets incremented by four times the size of dma_buf_dummy
- Instead the call to std::this_thread::sleep_for(std::chrono::milliseconds(2000)); one can call the standard midas call ss_sleep(2000); which
is a bit shorter
- Finally, sending many events to the ring buffer triggered a bug in the midas ring buffer functions which were lingering there since 2007. I'm
glad that this happened and now could be fixed. Not sure if other experiments where affected in the last decade by that. This could have
manifested itself in lost events or crashing front-ends. Anyhow, now it's fixed. You need to update midas to get the fix.
I attached a working version of the dummy program for your reference. Banks a different but the principle should become clear.
Stefan |
Attachment 1: dummy_fe.cpp
|
/********************************************************************\
Name: dummy_fe.cxx
Created by: Frederik Wauters
Changed by: Marius Koeppel
Contents: Dummy frontend producing stream data
\********************************************************************/
#include <algorithm>
#include <math.h>
#include <random>
#include <stdio.h>
#include <stdlib.h>
#include <bitset>
#include <iostream>
#include <unistd.h>
#include "midas.h"
#include "msystem.h"
#include <chrono>
#include <thread>
#include "mfe.h"
using namespace std;
/*-- Globals -------------------------------------------------------*/
/* The frontend name (client name) as seen by other MIDAS clients */
const char *frontend_name = "Dummy FE SWB";
/* The frontend file name, don't change it */
const char *frontend_file_name = __FILE__;
/* frontend_loop is called periodically if this variable is TRUE */
BOOL frontend_call_loop = FALSE;
/* a frontend status page is displayed with this frequency in ms */
INT display_period = 0;
/* maximum event size produced by this frontend */
INT max_event_size = 1 * (1024 * 1024);// 32MB
/* maximum event size for fragmented events (EQ_FRAGMENTED) */
INT max_event_size_frag = 5 * 1024 * 1024;
/* buffer size to hold events */
INT event_buffer_size = 2 * max_event_size;
/*-- Function declarations -----------------------------------------*/
INT read_stream_thread(void *param);
uint32_t generate_random_pixel_hit_swb(uint32_t time_stamp);
uint32_t generate_random_beam_ref_hit(uint32_t time_stamp, uint32_t chipID);
BOOL equipment_common_overwrite = TRUE;//true is overwriting the common odb
/* DMA Buffer and related */
volatile uint32_t *dma_buf;
#define MUDAQ_DMABUF_DATA_ORDER 25 // 29, 25 for 32 MB
#define MUDAQ_DMABUF_DATA_LEN (1 << MUDAQ_DMABUF_DATA_ORDER)// in bytes
size_t dma_buf_size = MUDAQ_DMABUF_DATA_LEN;
uint32_t dma_buf_nwords = dma_buf_size / sizeof(uint32_t);
/*-- Equipment list ------------------------------------------------*/
EQUIPMENT equipment[] = {
{
"Stream SWB", /* equipment name */
{1, 0, /* event ID, trigger mask */
"SYSTEM", /* event buffer */
EQ_USER, /* equipment type */
0, /* event source */
"MIDAS", /* format */
TRUE, /* enabled */
RO_RUNNING, /* read always and update ODB */
100, /* poll for 100ms */
0, /* stop run after this event limit */
0, /* number of sub events */
0, /* log history every event */
"", "", ""},
NULL, /* readout routine */
},
{""}};
/*-- Dummy routines ------------------------------------------------*/
INT poll_event(INT source, INT count, BOOL test) {
return 1;
};
INT interrupt_configure(INT cmd, INT source, POINTER_T adr) {
return 1;
};
/*-- Frontend Init -------------------------------------------------*/
INT frontend_init() {
// create ring buffer for readout thread
create_event_rb(0);
// create readout thread
ss_thread_create(read_stream_thread, NULL);
return CM_SUCCESS;
}
/*-- Frontend Exit -------------------------------------------------*/
INT frontend_exit() {
return CM_SUCCESS;
}
/*-- Frontend Loop -------------------------------------------------*/
INT frontend_loop() {
return CM_SUCCESS;
}
/*-- Begin of Run --------------------------------------------------*/
INT begin_of_run(INT run_number, char *error) {
return CM_SUCCESS;
}
/*-- End of Run ----------------------------------------------------*/
INT end_of_run(INT run_number, char *error) {
return CM_SUCCESS;
}
/*-- Pause Run -----------------------------------------------------*/
INT pause_run(INT run_number, char *error) {
return CM_SUCCESS;
}
/*-- Resume Run ----------------------------------------------------*/
INT resume_run(INT run_number, char *error) {
return CM_SUCCESS;
}
/*------------------------------------------------------------------*/
uint32_t generate_random_pixel_hit_swb(uint32_t time_stamp) {
uint32_t tot = rand() % 32; // 0 to 31
uint32_t chipID = rand() % 3;// 0 to 2
uint32_t col = rand() % 256; // 0 to 256
uint32_t row = rand() % 250; // 0 to 250
uint32_t hit = (time_stamp << 28) | (chipID << 22) | (row << 14) | (col << 6) | (tot << 1);
// if ( print ) {
// printf("ts:%8.8x,chipID:%8.8x,row:%8.8x,col:%8.8x,tot:%8.8x\n", time_stamp,chipID,row,col,tot);
// printf("hit:%8.8x\n", hit);
// std::cout << std::bitset<32>(hit) << std::endl;
// }
if (((hit >> 22) & 0x3f) > 2)
printf("Hit %8.8x", hit);
if (chipID > 2)
printf("ChipID %8.8x", chipID);
return hit;
}
uint32_t generate_random_beam_ref_hit(uint32_t time_stamp, uint32_t chipID) {
uint32_t fastTS = rand() % 4194303 / 2;// 0 to 4194303
uint32_t hit = (time_stamp << 28) | (chipID << 22) | (fastTS << 1);
if (((hit >> 22) & 0x3f) > 4) {
printf("Hit Ref %8.8x\n", hit);
printf("Ref fast %8.8x\n", fastTS);
printf("ChipID Ref %8.8x\n", chipID);
printf("Time Ref %8.8x\n", time_stamp);
printf("Chip %8.8x\n", ((hit >> 22) & 0x3f));
}
return hit;
}
INT read_stream_thread(void *param) {
uint32_t *pdata;
// init bank structure - 64bit alignment
uint32_t SERIAL = 0;
// tell framework that we are alive
signal_readout_thread_active(0, TRUE);
// obtain ring buffer for inter-thread data exchange
int rbh = get_event_rbh(0);
int status;
int nEvents = 5000;
size_t eventSize = 32; // in 4-byte words
size_t dmaBufSize = nEvents * eventSize * sizeof(uint32_t); // buffer size in bytes
uint32_t * dma_buf_dummy = (uint32_t *) malloc(dmaBufSize);
while (is_readout_thread_enabled()) {
// don't readout events if we are not running
if (!readout_enabled()) {
// do not produce events when run is stopped
ss_sleep(10);// don't eat all CPU
continue;
}
// obtain buffer space with 10 ms timeout
status = rb_get_wp(rbh, (void **) &pdata, 10);
// just try again if buffer has no space
if (status == DB_TIMEOUT)
continue;
if (status != DB_SUCCESS) {
cout << "!DB_SUCCESS" << endl;
break;
}
for (int i = 0; i < nEvents; i++) {
// event header
dma_buf_dummy[ 0 + i * eventSize] = 0x00000001; // Trigger Mask & Event ID
dma_buf_dummy[ 1 + i * eventSize] = SERIAL++; // Serial number
dma_buf_dummy[ 2 + i * eventSize] = ss_time(); // time
dma_buf_dummy[ 3 + i * eventSize] = eventSize * 4 - 4 * 4;// event size
dma_buf_dummy[ 4 + i * eventSize] = eventSize * 4 - 6 * 4;// all bank size
dma_buf_dummy[ 5 + i * eventSize] = 0x31; // flags
// bank PCD0 first FEB
dma_buf_dummy[ 6 + i * eventSize] = 'P' << 0 | 'C' << 8 | 'D' << 16 | '0' << 24;// bank name
dma_buf_dummy[ 7 + i * eventSize] = 0x06; // bank type TID_DWORD
dma_buf_dummy[ 8 + i * eventSize] = 10 * 4; // data size
dma_buf_dummy[ 9 + i * eventSize] = 0x0; // reserved
dma_buf_dummy[10 + i * eventSize] = 0xE80000BC; // preamble
dma_buf_dummy[11 + i * eventSize] = 0x00000000; // TS0
dma_buf_dummy[12 + i * eventSize] = ss_time(); // TS1
dma_buf_dummy[13 + i * eventSize] = 0xFC000000; // sub header
dma_buf_dummy[14 + i * eventSize] = generate_random_pixel_hit_swb(ss_time()); // hit0
dma_buf_dummy[15 + i * eventSize] = generate_random_pixel_hit_swb(ss_time()); // hit1
dma_buf_dummy[16 + i * eventSize] = generate_random_beam_ref_hit(ss_time(), 3);// chip 3 beam ref bits 22:1 -> fast TS
dma_buf_dummy[17 + i * eventSize] = generate_random_beam_ref_hit(ss_time(), 4);// chip 4 sintilator bits 22:1 -> fast TS
dma_buf_dummy[18 + i * eventSize] = 0xFC00009C; // TRAILER
dma_buf_dummy[19 + i * eventSize] = 0xAFFEAFFE; // PADDING
// bank PCD1 second FEB
dma_buf_dummy[20 + i * eventSize] = 'P' << 0 | 'C' << 8 | 'D' << 16 | '1' << 24;// bank name
dma_buf_dummy[21 + i * eventSize] = 0x6; // bank type TID_DWORD
dma_buf_dummy[22 + i * eventSize] = 8 * 4; // data size
dma_buf_dummy[23 + i * eventSize] = 0x0; // reserved
dma_buf_dummy[24 + i * eventSize] = 0xE80001BC; // preamble
dma_buf_dummy[25 + i * eventSize] = 0x00000000; // TS0
dma_buf_dummy[26 + i * eventSize] = ss_time(); // TS1
dma_buf_dummy[27 + i * eventSize] = 0xFC000000; // sub header
dma_buf_dummy[28 + i * eventSize] = generate_random_pixel_hit_swb(ss_time());// hit0
dma_buf_dummy[29 + i * eventSize] = generate_random_pixel_hit_swb(ss_time());// hit1
dma_buf_dummy[30 + i * eventSize] = 0xFC00009C; // TRAILER
dma_buf_dummy[31 + i * eventSize] = 0xAFFEAFFE; // PADDING
}
memcpy(pdata, dma_buf_dummy, dmaBufSize);
// print data
if (true) {
auto *eh = (EVENT_HEADER *) (&pdata[0]);
auto *bh = (BANK_HEADER *) (&pdata[4]);
auto *ba = (BANK32A *) (&pdata[6]);
char bank_name[5];
bank_name[4] = 0;
memcpy(bank_name, (char *) (ba->name), 4);
printf("EID=%4.4x TM=%4.4x SERNO=%8.8x TS=%8.8x EDsiz=%8.8x\n", eh->event_id, eh->trigger_mask, eh->serial_number, eh->time_stamp, eh->data_size);
printf("DAsiz=%8.8x FLAG=%8.8x\n", bh->data_size, bh->flags);
printf("BAname=%s TYP=%8.8x BAsiz=%8.8x BAres=%8.8x\n", bank_name, ba->type, ba->data_size, ba->reserved);
}
rb_increment_wp(rbh, dmaBufSize);// in byte length
ss_sleep(300);// limit data rate
}
free(dma_buf_dummy);
return 0;
}
|
2335
|
10 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue | I tried following script:
ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
LOOP 10
WAIT seconds, 3
ODBINC /Equipment/ArduinoTestStation/Variables/_S_
ENDLOOP
and it worked as expected. So I conclude the problem must be in your script. Probably a typo in
the ODB path pointing to a 32-byte string instead to a 4-byte float.
Stefan |
2336
|
10 Feb 2022 |
Stefan Ritt | Bug Report | History plots deceiving users into thinking data is still logging | The problem has been fixed on commit 825935dc on Oct. 2021 and runs fine since then at PSI. If TRIUMF people
agree, we can close that issue and proceed.
Stefan |
2340
|
14 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue | Just post here a minimal script which produces the error, so that I can try myself.
... and make sure that you have the latest develop version of midas.
Stefan |
2342
|
14 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue | > I noticed that "Jacob Thorne" in the forum had the same issue as us in Novemeber last
> year. Indeed we have not installed any later versions of MIDAS since then so we will
> double check we have the latest version.
As you see from my reply to Jacob, the bug has been fixed in midas since then, so just
update.
Stefan |
2344
|
15 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue | > But the error still persists. Is there another way to update which we are missing?
The bug was definitively fixed in this modification:
https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do
a "ver" you see which git revision you are currently running. Make sure to get this output:
MIDAS version: 2.1
GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch
develop
ODB version: 3
Stefan |
2348
|
23 Feb 2022 |
Stefan Ritt | Info | Midas slow control event generation switched to 32-bit banks | The midas slow control system class drivers automatically read their equipment and generate events containing midas banks. So far these have been 16-bit banks using bk_init(). But now more and more experiments use large amount of channels, so the 16-bit address space is exceeded. Until last week, there was even no check that this happens, leading to unpredictable crashes.
Therefore I switched the bank generation in the drivers generic.cxx, hv.cxx and multi.cxx to 32-bit banks via bk_init32(). This should be in principle transparent, since the midas bank functions automatically detect the bank type during reading. But I thought I let everybody know just in case.
Stefan |
2349
|
03 Mar 2022 |
Stefan Ritt | Bug Report | Writting MIDAS Events via FPGAs | > Starting the frontend and starting a run works fine -> seeing events with mdump and also on the web GUI.
> But when I stop the run and try to start the next run the frontend is sending no events anymore.
> It get stuck at line 221 (if (status == DB_TIMEOUT)).
> I tried to reduce the nEvents to 1 which helped in terms of DB_TIMEOUT but still I don't get any events after I did a stop / start cycle -> no events in mdump and no events counting up at the web GUI.
> If I kill the frontend in the terminal (ctrl+c) and restart it, while the run is still running, it starts to send events again.
This problem has (likely) been fixed in the current version. Please pull develop and try again. Was a recursive call to the event collection routine which is only triggered if you send events faster than
the logger can digest, so not many people see it.
Best,
Stefan |
2355
|
16 Mar 2022 |
Stefan Ritt | Info | New midas sequencer version | A new version of the midas sequencer has been developed and now available in the
develop/seq_eval branch. Many thanks to Lewis Van Winkle and his TinyExpr library
(https://codeplea.com/tinyexpr), which has now been integrated into the sequencer
and allow arbitrary Math expressions. Here is a complete list of new features:
* Math is now possible in all expressions, such as "x = $i*3 + sin($y*pi)^2", or
in "ODBSET /Path/value[$i*2+1], 10"
* "SET <var>,<value>" can be written as "<var>=<value>", but the old syntax is
still possible.
* There are new functions ODBCREATE and ODBDLETE to create and delete ODB keys,
including arrays
* Variable arrays are now possible, like "a[5] = 0" and "MESSAGE $a[5]"
If the branch works for us in the next days and I don't get complaints from
others, I will merge the branch into develop next week.
Stefan |
2357
|
21 Mar 2022 |
Stefan Ritt | Bug Report | Python ODB watch | What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are
two approaches to solve it:
1) Write values one-by-one to the ODB, but do not trigger a watch update. In the sequencer, this
can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer
and the last paragraph right of the ODBSET command). You use notify=0 for all set commands except
the last one where you use notify=1. On the C++ API, you can use db_set_data_index1() which has
this notify flag as the last parameter.
2) You add intelligence to your front-end. If you get a watchdog update, you do not apply this
directly to the hardware, but put it into a FIFO. Once you do not get any more update for a certain
period (like 1s is a good value), you empty the FIFO and apply all setting immediately.
Both methods have been used at PSI successfully, although 1) is much easier to implement, especially
if you use the midas sequencer.
Stefan |
2358
|
22 Mar 2022 |
Stefan Ritt | Info | New midas sequencer version | After several days of testing in various experiments, the new sequencer has
been merged into the develop branch. One more feature was added. The path to
the ODB can now contain variables which are substituted with their values.
Instead writing
ODBSET /Equipment/XYZ/Setting/1/Switch, 1
ODBSET /Equipment/XYZ/Setting/2/Switch, 1
ODBSET /Equipment/XYZ/Setting/3/Switch, 1
one can now write
LOOP i, 3
ODBSET /Equipment/XYZ/Setting/$i/Switch, 1
ENDLOOP
Of course it is not possible for me to test any possible script. So if you
have issues with the new sequencer, please don't hesitate to report them
back to me.
Best,
Stefan |
2360
|
22 Mar 2022 |
Stefan Ritt | Bug Fix | fix for event buffer corruption in bm_flush_cache() | Thanks Konstantin for your detailed description.
I wonder why we never saw this problem at PSI. Here is the reason: In multil-threaded environments, we never call bm_send_event() directly
from all threads (since in the old days nothing was thread safe in midas). Instead, we use a collector thread which gets all events via the
rb_xxx functions from the individual readout threads. This is well integrated into the mfe.cxx framework. Look at examples/mtfe/mfte.cxx.
Each thread does (simplified):
while (true) {
do {
status = rb_get_wp(&pevent);
} while (status == DB_TIMEOUT)
bm_compose_event_threadsafe(pevent, ..., &serial_number);
bk_init32(pevent+1);
... fill event ...
bk_close(pevent)
rb_increment_wp(sizeof(EVENT_HEADER) + pevent->data_size);
}
The framework now collects all these events in receive_trigger_event() which runs in the main thread:
for (i=0 ; i<n_thread ; i++) {
rb_get_rp(i, pevent);
if (pevent->serial_number == prev_serial+1)
break;
}
prev_serial = pevent->serial_number;
rpc_send_event(pevent);
rb_increment_rp(sizeof(EVENT_HEADER) + pevent->data_size);
This code ensures that all events are in the right sequence (before the serial numbers where mixed up) and that all events are sent only
from a single thread, so the write buffer can be used effectively without complicated multi-thread locks.
This solution works nicely at PSI since many years, maybe you should put some thought to use it in your tmfe framework in Alpha-g as well
instead of struggling with all your locks.
Stefan |
|