28 Mar 2019, Konstantin Olchanski, Forum, assertion failed
|
For the record, I am stumped by this problem. We have definitely ruled out any data overflow inside the midas message code (there are no
long messages sent). My only guess is that the frontend itself is corrupting the midas message buffer, but this corruption
must be unlikely lucky to corrupt just the "_" character (and maybe what follows it) from the "MSG_" header inside the buffer.
If indeed this is memory corruption inside the frontend, to find and fix it, one would have to roll out valgrind and other malloc() debugging
tools and good luck...
K.O.
> > > [dfe01,INFO] Slow control equipment initialized
> > > dfe: src/midas.c:838: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> > > if I remove line 838 from midas.c (fixing message length) the problem disappear...
> >
> > Thank you for reporting this problem.
> >
> > It is very strange, the check is for message start "MSG_", why "M", "S" and "G" are there
> > but "_" is missing? And you remove the check for "_" and the rest of the message is also okey?
> > Very odd.
>
> if I remove the check for "_" then the first message is empty and next messages are ok...
> If I don't remove the check the frontend fails at start and I find these lines in midas.log:
>
> 14:46:29.719 2019/03/19 [dfe01,INFO] Program dfe01 on host lxaria02 started
> 14:46:29.731 2019/03/19 [dfe01,INFO] Dome FE initialized
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [system.c:4709:recv_tcp2,ERROR] unexpected connection closure
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:12814:recv_event_server,ERROR] recv_tcp2(header) returned -1
> 14:46:29.737 2019/03/19 [dfe01,ERROR] [midas.c:14699:rpc_server_receive,ERROR] recv_event_server() returned -1, abort
> 14:46:29.737 2019/03/19 [dfe01,TALK] Program 'dfe01' on host 'lxaria02' aborted
>
> > You can also add this code "assert(4+3*sizeof(int)+len < 1020)" in cm_msg_buffer() right before
> > rb_increment_wp() - it this assert fails, we definitely determine that we have a buffer overflow.
>
> I added assert you suggested in cm_msg_buffer() function before rp_increment_wp() and
> result is always the same at same line:
>
> [dfe01,INFO] Dome FE initialized
> Dome0001-rc:
> [dfe01,INFO] Slow control equipment initialized
> dfe: src/midas.c:839: cm_msg_flush_buffer: Assertion `rp[3]=='_'' failed.
> Aborted
>
> Regards,
> Gennaro |
07 Mar 2008, Randolf Pohl, Bug Report, array overflows and other bugs
|
Hi,
I have just compiled MIDAS svn 4132 on a fresh SuSE 10.3 x86_64 system and gcc
found a bunch of bugs, I guess.
> uname -a
Linux pc 2.6.22.17-0.1-default #1 SMP 2008/02/10 20:01:04 UTC x86_64 x86_64
x86_64 GNU/Linux
> gcc --version
gcc (GCC) 4.2.1 (SUSE Linux)
I see loads of warnings during compile, most of which I know from earlier
compiles:
* warning: dereferencing type-punned pointer will break strict-aliasing rules
* warning: pointer targets in passing argument 3 of 'getsockname' differ in
signedness
But then there is a new one (in fact lots of this one), and brief
inspection suggests this is a true bug with the possibility of truly
nasty consequences.
(1)=========================
src/midas.c:7398: warning: array subscript is above array bounds
Inspection of midas.c:
if (i == MAX_DEFRAG_EVENTS) {
/* no buffer available -> no first fragment received */
7398: free(defrag_buffer[i].pevent);
memset(&defrag_buffer[i].event_id, 0, sizeof(EVENT_DEFRAG_BUFFER));
cm_msg(MERROR, "bm_defragement_event",
"Received fragment without first fragment (ID %d) Ser#:%d",
pevent->event_id & 0x0FFF, pevent->serial_number);
return;
}
And midas.c line 7297:
EVENT_DEFRAG_BUFFER defrag_buffer[MAX_DEFRAG_EVENTS];
So, if(i==MAX_DEFRAG_EVENTS) free(defrag_buffer[i]).
I guess this is an off-by-one bug.
(2)==========================
src/midas.c:2958: warning: array subscript is above array bounds
for (i = 0; i < 13; i++)
2958 if (trans_name[i].transition == transition)
break;
Holy cow, hard-coded "13" in the code! Should be a #define, shouldn't it?
Now look at midas.c lines 94ff:
struct {
int transition;
char name[32];
} trans_name[] = {
{
TR_START, "START",}, {
TR_STOP, "STOP",}, {
TR_PAUSE, "PAUSE",}, {
TR_RESUME, "RESUME",}, {
TR_DEFERRED, "DEFERRED",}, {
0, "",},};
There is no trans_name[12].
The trans_name[12] shows up in line 2894 and 2790, too.
(3)=============================
mfe.c:
src/mfe.c:412: warning: array subscript is above array bounds
src/mfe.c:311: warning: array subscript is above array bounds
src/mfe.c:340: warning: array subscript is above array bounds
412: device_drv->dd(CMD_GET_DEMAND, device_drv->dd_info, i,
&device_drv->mt_buffer->channel[i].array[CMD_GET_DEMAND]);
for (cmd = CMD_GET_FIRST; cmd <= CMD_GET_LAST; cmd++) {
(..)
311: device_drv->mt_buffer->channel[current_channel].array[cmd] = value;
for (cmd = CMD_GET_FIRST; cmd <= CMD_GET_LAST; cmd++) {
(..)
340: device_drv->mt_buffer->channel[i].array[cmd] = value;
CMD_GET_DEMAND is in include/midas.h:
#define CMD_GET_DEMAND CMD_GET_DIRECT // = 20
I haven't even tried to understand mfe.c, nor did I read it.
But I suspect the thing should always be something like
....array[cmd-CMD_GET_FIRST]
so array[] is indexed from [0], not from an arbitrary number that
depends on the number of commands you insert before line 698 in
midas.h. But please could the author of this check this very carefully?
(4)=========================
src/lazylogger.c:1957: warning: array subscript is below array bounds
if ((channel < 0) && (lazyinfo[channel].hKey != 0))
That is lazyinfo[something below zero].
(5)=============================
More warnings an expert might want to have a look at:
* warning: deprecated conversion from string constant to 'char*'
* src/fal.c:106: warning: non-local variable '<anonymous struct> out_info'
uses anonymous type
* src/fal.c:3064: warning: non-local variable '<anonymous struct> eb' uses
anonymous type
I attach the full output of make.
Could someone knowledgeable please have a look at these warnings and fix them?
They make me a bit nervous when thinking about data integrity, and
there are now so many that they actually start to hide serious stuff
like the ones I presented.
Oh, I got rid of the "dereferencing type-punned pointer" thing by adding
"-fno-strict-aliasing" as a compiler flag. Was suggested on the Web. Seemed to
have worked during data taking (the data look reasonable :-). Is that a
possible fix/workaround?
Cheers,
Randolf |
07 Mar 2008, Stefan Ritt, Bug Report, array overflows and other bugs
|
> I have just compiled MIDAS svn 4132 on a fresh SuSE 10.3 x86_64 system and gcc
> found a bunch of bugs, I guess.
Ahh, great! gcc is getting more and more clever. Each time gcc is updated, it finds
a few new issues.
Indeed some are real bugs, and I will work down the list as time permits. I see
however no immediate thread (you are not using fragmented events, a transition 12
never occurs, etc.). Issue #4 from your list has to be checked by Pierre-Andre. |
10 Mar 2008, Stefan Ritt, Bug Report, array overflows and other bugs
|
There were some trivial and some non-trivial issues. Glad the compiled picked up on
this!
> I see loads of warnings during compile, most of which I know from earlier
> compiles:
> * warning: dereferencing type-punned pointer will break strict-aliasing rules
> * warning: pointer targets in passing argument 3 of 'getsockname' differ in
> signedness
I ignore these for the moment until I have a gcc 4.2 myself (we use Scientific
Linux 5 which has gcc 4.1 for the moment). As Randolph pointed out correctly you
can make gcc shut up by a proper flag there. The warnings have no influence on the
stability of midas.
> (1)=========================
> src/midas.c:7398: warning: array subscript is above array bounds
> Inspection of midas.c:
>
> if (i == MAX_DEFRAG_EVENTS) {
> /* no buffer available -> no first fragment received */
> 7398: free(defrag_buffer[i].pevent);
> memset(&defrag_buffer[i].event_id, 0, sizeof(EVENT_DEFRAG_BUFFER));
> cm_msg(MERROR, "bm_defragement_event",
> "Received fragment without first fragment (ID %d) Ser#:%d",
> pevent->event_id & 0x0FFF, pevent->serial_number);
> return;
> }
The free() was just wrong at that place, I removed it.
> (2)==========================
> src/midas.c:2958: warning: array subscript is above array bounds
>
> for (i = 0; i < 13; i++)
> 2958 if (trans_name[i].transition == transition)
> break;
Fixed that by
for (i=0 ;; i++)
if (trans_name[i].name[0] == 0 || trans_name[i].transition == transition)
break;
Since trans_name[i].name = "" indicates the end of the list.
> (3)=============================
> mfe.c:
> src/mfe.c:412: warning: array subscript is above array bounds
> src/mfe.c:311: warning: array subscript is above array bounds
> src/mfe.c:340: warning: array subscript is above array bounds
>
> 412: device_drv->dd(CMD_GET_DEMAND, device_drv->dd_info, i,
> &device_drv->mt_buffer->channel[i].array[CMD_GET_DEMAND]);
The code at 412 was wrong there, the demand value is queried later by the device
driver directly. For the other two occurences (311 and 340) I had to really
increase the array size by one. This issue can cause segfaults if you have a slow
control front-end which uses multithreading (not many people use it except me).
> (4)=========================
> src/lazylogger.c:1957: warning: array subscript is below array bounds
>
> if ((channel < 0) && (lazyinfo[channel].hKey != 0))
>
> That is lazyinfo[something below zero].
This has to be fixed by Pierre. I guess an or instead of an and would do it, but
I'm not 100% sure.
> (5)=============================
> More warnings an expert might want to have a look at:
>
> * warning: deprecated conversion from string constant to 'char*'
>
> * src/fal.c:106: warning: non-local variable '<anonymous struct> out_info'
> uses anonymous type
> * src/fal.c:3064: warning: non-local variable '<anonymous struct> eb' uses
> anonymous type
>
> I attach the full output of make.
> Could someone knowledgeable please have a look at these warnings and fix them?
Uahhh. Especially the "const char*" vs. "char*" is in principle right, but will
cause a major rework. Probably hundreds of occations have to be fixed. Many strings
must be declared const, others not. It will help the programmer to find some errors
during compile which would later show up only during runtime (like writing into a
fixed string), but I only will go through that when I have gcc 4.2 installed
myself, and have two free days to work on this ;-)
> They make me a bit nervous when thinking about data integrity, and
> there are now so many that they actually start to hide serious stuff
> like the ones I presented.
Except the slow control stuff (which only is an issue for multithreaded frontends)
none of the above things will have an influence on the data integrity. But I agree
that they should be fixed.
- Stefan |
22 May 2007, Randolf Pohl, Bug Report, analyzer_init called by odb_load
|
Hi,
I wonder why mana.c:odb_load() calls analyzer_init(). This way analyzer_init
is called TWICE or more times:
first from mana.c:mana_init(), for each invocation of the analyzer, and
second from mana.c:odb_load(), for each run to be analyzed
Isn't this a bug? It can mess up several things (like mallocs) if you don't
take the necessary precautions. Other module_init functions are correctly
called only once, before all runs are analyzed.
I have the feeling, that odb_load should NOT call analyzer_init. Or am I wrong
(probably, but please explain to me)? Do I have to live with it and make sure
that my beautiful global initialization in analyzer_init is only done once?
:-)
Cheers,
Randolf
And here is the annotated log using the ROOT example experiment
(several modules changed/added to print their respective names)
:~/midas/examples/root> ./analyzer -e exa_root -i run%05d.mid -r 1 3
analyzer_init <-- ok
Root server listening on port 9090...
adc_calib_init <-- ok
adc_summing_init <-- ok
scaler_init <-- ok
Running analyzer offline. Stop with "!"
Set run number 1 in ODB
Load ODB from run 1...
analyzer_init <-- not ok, or is it?
OK
run00001.mid:777 events, 0.00s
Set run number 2 in ODB
Load ODB from run 2...
analyzer_init <-- not ok, or is it?
OK
run00002.mid:7227 events, 0.03s
Set run number 3 in ODB
Load ODB from run 3...
analyzer_init <-- not ok, or is it?
OK
run00003.mid:13866 events, 0.06s
adc_calib_exit
adc_summing_exit
scaler_exit
analyzer_exit |
22 May 2007, Stefan Ritt, Bug Report, analyzer_init called by odb_load
|
The reason to call analyzer_init in odb_load is the following:
Assume you run the analyzer offline, analyzing many files in series. Then assume
that you have /Experiment/Run Parameters, which is actively used by the analyzer
(like beam settings etc.). In this case you do a db_open_record() to map
/Experiment/Run Parameters to the exp_param C structure. For this mapping to work,
the ODB structure and the C structure have to be exactly the same. Now assume that
you changed your run parameters over time, like you added some comment later. Now
you want to analyzer several runs, some before and some after the modification.
Both sets have a different structure in /Experiment/Run Parameters, which is a
problem, since the compiled analyzer can only have a single C structure. My "poor"
solution was to call analyzer_init after each loading of the ODB from the *.mid
file. The db_create_record() call matches the C structure to the ODB structure by
modifying the ODB structure if necessary. So if you added one parameter later, this
(modified) structure gets loaded by odb_load, but then it gets adjusted in
analyzer_init().
I understand now that this case might not happen so often, and you are more
bothered by the fact that analyzer_init gets called several time. There must
however be a hook for offline analysis that the user code can correct the ODB
structure. So I propose to add a flag to analyzer_init, such as
INT analyzer_init(BOOL bFirst)
{
}
If bFirst equals TRUE, the function got called from mana_init(), if FALSE, it got
called from odb_load. Then you can put code like
INT analyzer_init(BOOL bFirst)
{
if (bFirst) {
p = malloc()
...
}
}
If you agree, I will modify the code and commit the change.
- Stefan |
22 May 2007, Randolf Pohl, Bug Report, analyzer_init called by odb_load
|
Thanks for the quick reply, Stefan.
Please don't change anything in the code unless you find it really important. I guess
changing the analyzer_init prototype will break a lot of code out there?
In fact, I think I do understand this behavior now.
And even without your suggested fix there is a simple workaround: I add a static
variable to my analyzer_init.cxx file, and do something similar to your bFirst fix.
In conclusion, commit your fix if it does not harm others. Postpone this commit to a
future new version of midas which breaks a lot of things anyway...
A last question, for me to understand: Why not call db_open_record in
ana_begin_of_run then?
Cheers,
Randolf |
22 May 2007, Stefan Ritt, Bug Report, analyzer_init called by odb_load
|
> Thanks for the quick reply, Stefan.
>
> Please don't change anything in the code unless you find it really important.
I guess
> changing the analyzer_init prototype will break a lot of code out there?
>
> In fact, I think I do understand this behavior now.
> And even without your suggested fix there is a simple workaround: I add a static
> variable to my analyzer_init.cxx file, and do something similar to your bFirst
fix.
>
> In conclusion, commit your fix if it does not harm others. Postpone this
commit to a
> future new version of midas which breaks a lot of things anyway...
>
> A last question, for me to understand: Why not call db_open_record in
> ana_begin_of_run then?
I fully agree with you that db_open_record would better go into ana_begin_of_run
(and
analyzer_init not being called in odb_load), and I fully agree with you that
changing the
code would break many experiments. ;-)
So I guess we leave it as it is right now as you suggested. |
27 Feb 2017, William Moore, Suggestion, analyzer failing to load ODB parameters
|
Hi,
I am attempting to compile and run analysis code on a completely different,
unconnected system than the DAQ computer for the experiment. The analyzer was
developed previously and my goal is to get it running and then update it to
achieve my needs. Before compiling the analyzer, I load a backup ODB file in
odbedit, and compile experim.h. I then compile the analyzer with that experim.h
file. When I run the analyzer I get the following output:
> MIDAS version 2.1ROOT version 5.34/36Root server listening on port 9090...
> Running analyzer offline. Stop with "!"
> Configuration file "/somedir/switches.odb" loaded
> [Analyzer,INFO] Set run number 1290 in ODB
> Load ODB from run 1290...[Analyzer,INFO] cannot load value "Client Notify":
write protected
> [Analyzer,INFO] cannot load value "Prompt": write protected
.
.
.
> [Analyzer,INFO] cannot load value "LANSCE-ops": write protected
> MIDAS version 2.1ROOT version 5.34/36OK
> Configuration file "/somedir/switches.odb" loaded
> Data_Raw/run01290.mid.gz:16355 Data_Analyzed/run01290.root:15208 events, 0.43s
I have confirmed all files being used have read/write access to all users. The
analyzer does populate a .root output file with filled histograms, however not
all histograms are filled. I believe this is because histograms that relied on
an ODB paramater that failed to load did not populate. Any idea as to what I am
doing wrong or how I could resolve this issue are greatly appreciated.
Thanks,
William Moore |
04 Feb 2008, Robert Pattie, Forum, analyzer crashes at high rates
|
I'm using midas to read data from a waveform digitizer at event rates of
10-30kHz. To accomplish this the digitizer is read via Block transfers and the
raw data put into a single MIDAS event. Thus a MIDAS event could contain upto
250 physical events and at maximum 350kBytes. In the analyzer modules I had
been analyzing the first physics event contained in a MIDAS event with no
problem. Recently I tried to analyze all the physical events. At low rates,
100hz-1khz, this was no problem, 1-5 physical events in a MIDAS event. At
higher rates 10-20kHz, where there are about 40physical events per MIDAS event,
the analyzer keeps up for a few seconds then seg faults with " 'shared object
read from target memory' has disappear; keeping it symbols". Any suggestions as
to why the analyzer is crashing would be very helpful.
Thanks,
Robert |
05 Feb 2008, Stefan Ritt, Forum, analyzer crashes at high rates
|
> I'm using midas to read data from a waveform digitizer at event rates of
> 10-30kHz. To accomplish this the digitizer is read via Block transfers and the
> raw data put into a single MIDAS event. Thus a MIDAS event could contain upto
> 250 physical events and at maximum 350kBytes. In the analyzer modules I had
> been analyzing the first physics event contained in a MIDAS event with no
> problem. Recently I tried to analyze all the physical events. At low rates,
> 100hz-1khz, this was no problem, 1-5 physical events in a MIDAS event. At
> higher rates 10-20kHz, where there are about 40physical events per MIDAS event,
> the analyzer keeps up for a few seconds then seg faults with " 'shared object
> read from target memory' has disappear; keeping it symbols". Any suggestions as
> to why the analyzer is crashing would be very helpful.
I personally have never seen this error message. The analyzer is designed such that
it produces "back pressure" if the data rate is higher than the analysis rate and
you have "request all events" on. The only thing I can image are the following two
issues:
- At higher rate where you have more than 40 physical events per MIDAS event, there
is some bug in your analysis code which gets exploited only in that case. Maybe some
temporary array which is only 35 entries long or something like this.
- The back pressure mentioned above will slow down the frontend. If your computer
busy logic is not working correctly, you might get more triggers than you can
acquire. Maybe then the data gets screwed up and the analyzer chokes on it.
Finding the exact reason is not simple. For sure you have to run the analyzer inside
the debugger, to see exactly where the segfault happens. You then maybe have to
produce some dummy data in the frontend (like always sending the same event) to
disentangle some possible trigger problems from other problems.
Best regards,
Stefan |
28 Jan 2013, Robert Pattie, Forum, analyzer cannot connect to the statistics database
|
I've managed to put the analyzer into state where it cannot connect to the
statistics database. The error message suggests another analyzer is connected.
I've recompiled MIDAS and the user code, restarted the computer etc..., and the
analyzer cannot connect. If I run "odbedit -c clean", I can start the analyzer,
but get the same error when exiting or starting a run. I've commented out all the
user code in the analyzer.c and its associated analyzer module's, and read event
code in the frontend and nothing resolves this issue. Any suggestion?
The output from attempting to run the analyzer is:
Connect to experiment nnbarxwnr...[odb.c:1013:db_open_database,ERROR] Removed ODB
client 'Analyzer', index 0 because process pid 31982 does not exists
Deleted entry '/System/Clients/31982' for client 'Analyzer' because it is not
connected to ODB
OK
Root server listening on port 9090...
Loading previous online histos from ./data/last.root
ss_mutex_wait_for: pthread_mutex_lock() returned errno 22 (Invalid argument),
aborting...
When attempting to clean up the Analyzer tree in the ODB I receive the message
:"deletion of key not allowed."
It appears that running the analyzer sets the permissions of the Statistics tree of
my analyzer module into RWDE.
Adding the following lines to my start up script eliminate the above problem:
odbedit -c clean
odbedit -c "chmod 7 Analyzer/"
odbedit -c "rm /Analyzer/fADCs/Statistics"
Now when starting a run the analyzer crashes with this error:analyzer:
src/midas.c:11443: rpc_execute: Assertion `return_buffer' failed.
Aborted (core dumped)
and the messages in the odb are :
[system.c:4295:recv_tcp,ERROR] header: recv returned 0, n_received = 0, unexpected
connection closure
[midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed, routine = "rc_transition",
host = "LANL-FADC-DAQ"
[midas.c:4130:cm_transition,ERROR] Could not start a run: cm_transition() status 503,
message 'Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ'
Deleted entry '/System/Clients/1001' for client 'Analyzer' because process pid 1001
does not exists
[midas.c:8893:rpc_client_check,ERROR] Connection broken to "Analyzer" on host
LANL-FADC-DAQ
Run #180 start aborted
Error: Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ
20:05:02 [Logger,INFO] Deleting previous file "./data/run00180.mid"
20:05:02 [ODBEdit,ERROR] [system.c:4295:recv_tcp,ERROR] header: recv returned 0,
n_received = 0, unexpected connection closure
20:05:02 [ODBEdit,ERROR] [midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed,
routine = "rc_transition", host = "LANL-FADC-DAQ"
20:05:02 [ODBEdit,ERROR] [midas.c:4130:cm_transition,ERROR] Could not start a run:
cm_transition() status 503, message 'Unknown error 503 from client 'Analyzer' on host
LANL-FADC-DAQ'
20:05:02 [ODBEdit,INFO] Deleted entry '/System/Clients/1001' for client 'Analyzer'
because process pid 1001 does not exists
20:05:02 [ODBEdit,ERROR] [midas.c:8893:rpc_client_check,ERROR] Connection broken to
"Analyzer" on host LANL-FADC-DAQ
20:05:02 [ODBEdit,INFO] Run #180 start aborted
20:05:03 [mdump,INFO] Client 'Analyzer' on buffer 'SYSTEM' removed by cm_watchdog
because process pid 1001 does not exist
20:05:11 [mhttpd,INFO] Client 'Analyzer' (PID 1001) on database 'ODB' removed by
cm_watchdog (idle 10.1s,TO 10s)
Thanks,
Robert Pattie |
01 Feb 2013, Randolf Pohl, Forum, analyzer cannot connect to the statistics database
|
The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
one you specified in /etc/exptab).
This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
fresh start.
Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal
"rm .[A-Z]*.SHM" |
01 Feb 2013, Stefan Ritt, Forum, analyzer cannot connect to the statistics database
|
> The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
> one you specified in /etc/exptab).
> This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
> fresh start.
>
> Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
> odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal
> "rm .[A-Z]*.SHM"
Thanks Randolf for helping out, I was not in the office this week.
In addition of deleting the *SHM files, it's sometimes necessary to delete the shared memory. You do this with the
command line tools
ipcs -m
ipcrm -m <shmid>
/Stefan |
13 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache
|
introduction:
to remember, bm_send_event() writes an event to the write cache, bm_flush_cache()
writes the contents of the write cache into the shared memory event buffer, buffer
free space is consumed. in the usual case, mlogger is reading events from the shared
memory event buffer, buffer free space is released. there is also a read cache, not
part of this discussion.
the purpose of the write cache is to reduce contention for the shared memory
semaphore. in the case of large number of small events, semaphore is locked per
cache-flush, instead of per-event. correct tuning of write cache and event size can
reduce lock rate from >100 kHz to around 100 Hz or lower.
analysis:
for correct operation of bm_send_event() under all conditions we need to consider
all corner cases:
1) no write cache: (cache size set to 0)
- event_size > buffer_size -> reject the event (obviously)
- event_size > 0.5 * buffer_size -> only 1 event fits into the buffer, next write
will stall until mlogger reads the previous event (sequential operation, bad)
- event_size < 0.3 * buffer_size -> at least 2 events fit into the buffer (good)
decision: limit event size to 0.5 to 0.3 * buffer_size (current limit is 0.5 *
buffer_size, I think).
consequence: buffer size limit is 2 Gbytes (32-bit byte offsets, code is only 31-
bit-clean), max event size is between 1 Gbytes and 0.6 Gbytes.
2) writing to write cache:
- event_size > cache_size -> flush cache, write event to directly to buffer
- event_size > 0.5 * cache_size -> inefficient use of cache: write to cache, next
event does not fit, flush to buffer, repeat. no gain in semaphore locking (bad), one
additional memcpy() (event to cache and cache to buffer) (bad)
- event_size < 0.3 * cache_size -> multiple events fit into cache, but probably no
gain in semaphore locking
decision: events that are bigger than 0.3 to 0.1 * cache_size should not go through
the cache. (flush cache, write directly to buffer).
3) flush write cache to buffer:
- cache_size > buffer_size -> cannot flush in 1 operation, must have a loop and
flush the cache in pieces
- cache_size between 0.5 and 1.0 * buffer_size -> can flush in 1 operation, but must
wait for mlogger to fully empty the buffer (sequential operation, bad)
- cache size < 0.3 * buffer_size -> can flush in 1 operation, at least 2 "flushes"
fit inside the buffer (good)
decision: limit write cache size to 0.3 * buffer_size. (current limit is
0.25*buffer_size).
consequences:
- write cache size limit is 0.3..0.25 * 2GB = 0.6..0.5 Gbytes
- cached event size limit is 0.3..0.1 * 0.5 GBytes = 150..50 Mbytes
- minimum number of cached events: 3 to 10
- semaphore locks reduced: 3 to 10 locks become 1 lock (all events cached),
4 to 11 locks become 2 locks (big event causes cache flush).
4) complications:
- there is a periodic 1/second bm_flush_cache() that flushes the cache early and
reduces it's efficiency (but needed to avoid having data stuck in cache for long
time)
- if multiple frontends use large write cache (~ 0.3..0.5 * buffer_size), again,
sequential operation can happen (bad)
- write cache is per-frontend, not per-equipment. if different equipments request
different cache sizes, mfe.c and tmfe c++ frontends complain about this, but the
user has to sort it out.
K.O. |
16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache
|
> for correct operation of bm_send_event() under all conditions we need to ...
to continue computation from last message:
default SYSTEM buffer size: 32 MiBytes
default max event size: 4 MiBytes
hard max buffer size: 2 Gbytes (code is only 31-bit-clean)
hard max event size: 2 Gbytes (code is only 31-bit-clean)
max event size currently: 32 Mbytes (same as buffer size)
max event size per (1) in previous post: 32*0.5..0.3 = 16..9 MiBytes
number of default-max-size events buffered: 32/4 = 8.
number of per (1) max-size events buffered: 2 or 3
number of current max-size events buffered: 0 (bad, frontend is serialized with mlogger)
default write cache size: 100 kbytes
max write cache size currently: buffer size / 4 = 32/4 = 8 MiBytes
max write cache size per (3) in previous post: buffer_size / 3 = 10 Mbytes
hard max write cache size per (3): 2 Gbytes/3 = 600 Mbytes
max size of cached events:
current: 100 kbytes (size as cache size)
per (2) in previous post: 0.1..0.3 * cache size = 10..30 kbytes
per (2), 1 Mbyte cahe: 0.1..0.3 * cache size = 100..300 kbytes
hard max size: 0.1..0.3 * hard_max_cache_size = 0.1..0.3 * 600 = 60..180 Mbytes.
max data rate before event buffer semaphore locking rate exceeds 100 Hz:
1 kbyte events, no write cache: 100 kbytes/sec
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 100*1kbyte*100Hz -> 10 Mbytes/sec
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 Mbytes/sec (1gige ethernet)
N kbyte events, 1 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
100 kbyte events, 1 Mbyte cache, not cached per (2): 100kbyte*100Hz = 10 Mbytes/sec
300 kbyte events, 1 Mbyte cache, not cached per (2): 300kbyte*100Hz = 30 Mbytes/sec
N00 kbyte events: N0 Mbytes/sec (500->50, etc)
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 Mbytes/sec (10gige ethernet)
N kbyte events, 10 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
1000 kbyte events, 10 Mbyte cache, not cached per (2): 1000kbyte*100Hz = 100 Mbytes/sec
3000 kbyte events, 10 Mbyte cache, not cached per (2): 3000kbyte*100Hz = 300 Mbytes/sec
N000 kbyte events: N00 Mbytes/sec (4000->400, 5000->500, etc)
default max event size: 4 Mibytes*100Hz = 400 Mbytes/sec (exceeds 1gige ethernet)
hard max event size (divided by 10 to buffer 10 events): 200 Mbytes*100Hz -> 20 Gbytes/sec
max event rate before event buffer semaphore locking rate exceeds 100 Hz:
1 kbyte events, no write cache: 100 Hz (obviously)
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 10 kHz
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 kHz
N kbyte events, 1 Mbyte cache: 1000/N events cached, cache flush rate 100 Hz -> 100/N kHz
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 kHz
N kbyte events, 10 Mbyte cache: 10000/N events cached, cache flush rate 100 Hz -> 1000/N kHz
100 kbyte events, not cached per (2): 100 Hz (obviously)
300 kbyte events, not cached per (2): 100 Hz (obviously)
default max event size: 100 Hz (obviously)
K.O. |
16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache
|
> > for correct operation of bm_send_event() under all conditions we need to ...
> to continue computation from last message:
if I got my numbers right, for present-day hardware (1gige/10gige data rates, 100 Hz max locking rate), we should
increase the default buffer write cache size from 100 kbytes to 10 Mbytes.
this cache size will permit processing of the full mix of small/big events
at the full mix of event rates without exceeding the 100 Hz semaphore locking rate.
with the 10 Mbyte write cache, default event buffer size should be 30-40 Mbytes (current size is 33 Mbytes, so does
not need to change).
this computation is for 1 writer (1 reader, mlogger). it is a typical case for our experiments.
multiple writers can run into contention for event buffer space.
consider 10 writers want to flush their 10 Mbyte write cache all at the same time:
if buffer size is the default 33 Mbytes, the first 3 writers will have successful write cache flush,
but the other 7 will stall, there is no space in the buffer, we have to wait for mlogger to free
some (mlogger writing X Mbytes/sec will take Y milliseconds to liberate 10 Mbytes of space for the 4th writer
to successfully flush, writers 5..10 are still stalled).
but in a system with 10 writers writing at 10 Mbytes/sec (1 Hz default cache flush rate) is 100 Mbytes/sec
will likely have SYSTEM buffer size at least 200-300 Mbytes (to buffer 1-2 seconds of data against
any delays in writing to disk/network storage).
so there should be no problem in practice.
K.O. |
12 Aug 2020, Yan Liu, Suggestion, adding db_get_mode ti check access mode for keys
|
Hello,
I am wondering if there is a function that checks the access mode for a key? I
found the db_set_mode() function that allows me to set the access mode for a key,
but failed to find its counterpart get function.
Thanks in advance,
Yan |
13 Aug 2020, Stefan Ritt, Suggestion, adding db_get_mode ti check access mode for keys
|
> Hello,
>
> I am wondering if there is a function that checks the access mode for a key? I
> found the db_set_mode() function that allows me to set the access mode for a key,
> but failed to find its counterpart get function.
>
> Thanks in advance,
> Yan
KEY k;
db_get_key(hDB, handle, &k);
std::cout << k.access_mode << std::endl;
/Stefan |
13 Aug 2020, Yan Liu, Suggestion, adding db_get_mode ti check access mode for keys
|
Thank you!
Yan
> > Hello,
> >
> > I am wondering if there is a function that checks the access mode for a key? I
> > found the db_set_mode() function that allows me to set the access mode for a key,
> > but failed to find its counterpart get function.
> >
> > Thanks in advance,
> > Yan
>
>
> KEY k;
> db_get_key(hDB, handle, &k);
> std::cout << k.access_mode << std::endl;
>
> /Stefan |
|