Back Midas Rome Roody Rootana
  Midas DAQ System, Page 57 of 142  Not logged in ELOG logo
    Reply  05 Feb 2008, Stefan Ritt, Forum, analyzer crashes at high rates 
> I'm using midas to read data from a waveform digitizer at event rates of
> 10-30kHz. To accomplish this the digitizer is read via Block transfers and the
> raw data put into a single MIDAS event.  Thus a MIDAS event could contain upto
> 250 physical events and at maximum 350kBytes.  In the analyzer modules I had
> been analyzing the first physics event contained in a MIDAS event with no
> problem.  Recently I tried to analyze all the physical events.  At low rates,
> 100hz-1khz, this was no problem, 1-5 physical events in a MIDAS event.  At
> higher rates 10-20kHz, where there are about 40physical events per MIDAS event,
>  the analyzer keeps up for a  few seconds then seg faults with " 'shared object
> read from target memory' has disappear; keeping it symbols".  Any suggestions as
> to why the analyzer is crashing would be very helpful.

I personally have never seen this error message. The analyzer is designed such that
it produces "back pressure" if the data rate is higher than the analysis rate and
you have "request all events" on. The only thing I can image are the following two
issues:

- At higher rate where you have more than 40 physical events per MIDAS event, there
is some bug in your analysis code which gets exploited only in that case. Maybe some
temporary array which is only 35 entries long or something like this.

- The back pressure mentioned above will slow down the frontend. If your computer
busy logic is not working correctly, you might get more triggers than you can
acquire. Maybe then the data gets screwed up and the analyzer chokes on it.

Finding the exact reason is not simple. For sure you have to run the analyzer inside
the debugger, to see exactly where the segfault happens. You then maybe have to
produce some dummy data in the frontend (like always sending the same event) to
disentangle some possible trigger problems from other problems.

Best regards,

  Stefan
Entry  28 Jan 2013, Robert Pattie, Forum, analyzer cannot connect to the statistics database 
I've managed to put the analyzer into state where it cannot connect to the 
statistics database.  The error message suggests another analyzer is connected.  
I've recompiled MIDAS and the user code, restarted the computer etc..., and the 
analyzer cannot connect.  If I run "odbedit -c clean", I can start the analyzer, 
but get the same error when exiting or starting a run.  I've commented out all the
user code in the analyzer.c and its associated analyzer module's, and read event
code in the frontend and nothing resolves this issue.  Any suggestion?

The output from attempting to run the analyzer is:

Connect to experiment nnbarxwnr...[odb.c:1013:db_open_database,ERROR] Removed ODB
client 'Analyzer', index 0 because process pid 31982 does not exists
Deleted entry '/System/Clients/31982' for client 'Analyzer' because it is not
connected to ODB
OK
Root server listening on port 9090...
Loading previous online histos from ./data/last.root
ss_mutex_wait_for: pthread_mutex_lock() returned errno 22 (Invalid argument),
aborting...


When attempting to clean up the Analyzer tree in the ODB I receive the message
:"deletion of key not allowed."  

It appears that running the analyzer sets the permissions of the Statistics tree of
my analyzer module into RWDE.  

Adding the following lines to my start up script eliminate the above problem:
odbedit -c clean
odbedit -c "chmod 7 Analyzer/"
odbedit -c "rm /Analyzer/fADCs/Statistics"

Now when starting a run the analyzer crashes with this error:analyzer:
src/midas.c:11443: rpc_execute: Assertion `return_buffer' failed.
Aborted (core dumped)

and the messages in the odb are :

[system.c:4295:recv_tcp,ERROR] header: recv returned 0, n_received = 0, unexpected
connection closure
[midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed, routine = "rc_transition",
host = "LANL-FADC-DAQ"
[midas.c:4130:cm_transition,ERROR] Could not start a run: cm_transition() status 503,
message 'Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ'
Deleted entry '/System/Clients/1001' for client 'Analyzer' because process pid 1001
does not exists
[midas.c:8893:rpc_client_check,ERROR] Connection broken to "Analyzer" on host
LANL-FADC-DAQ
Run #180 start aborted
Error: Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ

20:05:02 [Logger,INFO] Deleting previous file "./data/run00180.mid"

20:05:02 [ODBEdit,ERROR] [system.c:4295:recv_tcp,ERROR] header: recv returned 0,
n_received = 0, unexpected connection closure

20:05:02 [ODBEdit,ERROR] [midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed,
routine = "rc_transition", host = "LANL-FADC-DAQ"

20:05:02 [ODBEdit,ERROR] [midas.c:4130:cm_transition,ERROR] Could not start a run:
cm_transition() status 503, message 'Unknown error 503 from client 'Analyzer' on host
LANL-FADC-DAQ'

20:05:02 [ODBEdit,INFO] Deleted entry '/System/Clients/1001' for client 'Analyzer'
because process pid 1001 does not exists

20:05:02 [ODBEdit,ERROR] [midas.c:8893:rpc_client_check,ERROR] Connection broken to
"Analyzer" on host LANL-FADC-DAQ

20:05:02 [ODBEdit,INFO] Run #180 start aborted
20:05:03 [mdump,INFO] Client 'Analyzer' on buffer 'SYSTEM' removed by cm_watchdog
because process pid 1001 does not exist
20:05:11 [mhttpd,INFO] Client 'Analyzer' (PID 1001) on database 'ODB' removed by
cm_watchdog (idle 10.1s,TO 10s)


Thanks,
Robert Pattie
    Reply  01 Feb 2013, Randolf Pohl, Forum, analyzer cannot connect to the statistics database 
The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
one you specified in /etc/exptab).
This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
fresh start.

Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
"rm .[A-Z]*.SHM" 
    Reply  01 Feb 2013, Stefan Ritt, Forum, analyzer cannot connect to the statistics database 
> The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
> one you specified in /etc/exptab).
> This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
> fresh start.
> 
> Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
> odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
> "rm .[A-Z]*.SHM" 

Thanks Randolf for helping out, I was not in the office this week.

In addition of deleting the *SHM files, it's sometimes necessary to delete the shared memory. You do this with the 
command line tools

ipcs -m
ipcrm -m <shmid>


/Stefan
Entry  13 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache 
introduction:

to remember, bm_send_event() writes an event to the write cache, bm_flush_cache() 
writes the contents of the write cache into the shared memory event buffer, buffer 
free space is consumed. in the usual case, mlogger is reading events from the shared 
memory event buffer, buffer free space is released. there is also a read cache, not 
part of this discussion.

the purpose of the write cache is to reduce contention for the shared memory 
semaphore. in the case of large number of small events, semaphore is locked per 
cache-flush, instead of per-event. correct tuning of write cache and event size can 
reduce lock rate from >100 kHz to around 100 Hz or lower.

analysis:

for correct operation of bm_send_event() under all conditions we need to consider 
all corner cases:

1) no write cache: (cache size set to 0)

- event_size > buffer_size -> reject the event (obviously)
- event_size > 0.5 * buffer_size -> only 1 event fits into the buffer, next write 
will stall until mlogger reads the previous event (sequential operation, bad)
- event_size < 0.3 * buffer_size -> at least 2 events fit into the buffer (good)

decision: limit event size to 0.5 to 0.3 * buffer_size (current limit is 0.5 * 
buffer_size, I think).

consequence: buffer size limit is 2 Gbytes (32-bit byte offsets, code is only 31-
bit-clean), max event size is between 1 Gbytes and 0.6 Gbytes.

2) writing to write cache:

- event_size > cache_size -> flush cache, write event to directly to buffer
- event_size > 0.5 * cache_size -> inefficient use of cache: write to cache, next 
event does not fit, flush to buffer, repeat. no gain in semaphore locking (bad), one 
additional memcpy() (event to cache and cache to buffer) (bad)
- event_size < 0.3 * cache_size -> multiple events fit into cache, but probably no 
gain in semaphore locking

decision: events that are bigger than 0.3 to 0.1 * cache_size should not go through 
the cache. (flush cache, write directly to buffer).

3) flush write cache to buffer:

- cache_size > buffer_size -> cannot flush in 1 operation, must have a loop and 
flush the cache in pieces
- cache_size between 0.5 and 1.0 * buffer_size -> can flush in 1 operation, but must 
wait for mlogger to fully empty the buffer (sequential operation, bad)
- cache size < 0.3 * buffer_size -> can flush in 1 operation, at least 2 "flushes" 
fit inside the buffer (good)

decision: limit write cache size to 0.3 * buffer_size. (current limit is 
0.25*buffer_size).

consequences:

- write cache size limit is 0.3..0.25 * 2GB = 0.6..0.5 Gbytes
- cached event size limit is 0.3..0.1 * 0.5 GBytes = 150..50 Mbytes
- minimum number of cached events: 3 to 10
- semaphore locks reduced: 3 to 10 locks become 1 lock (all events cached),
4 to 11 locks become 2 locks (big event causes cache flush).

4) complications:

- there is a periodic 1/second bm_flush_cache() that flushes the cache early and 
reduces it's efficiency (but needed to avoid having data stuck in cache for long 
time)
- if multiple frontends use large write cache (~ 0.3..0.5 * buffer_size), again, 
sequential operation can happen (bad)
- write cache is per-frontend, not per-equipment. if different equipments request 
different cache sizes, mfe.c and tmfe c++ frontends complain about this, but the 
user has to sort it out.

K.O.
    Reply  16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache 
> for correct operation of bm_send_event() under all conditions we need to ...

to continue computation from last message:

default SYSTEM buffer size: 32 MiBytes
default max event size: 4 MiBytes

hard max buffer size: 2 Gbytes (code is only 31-bit-clean)
hard max event size: 2 Gbytes (code is only 31-bit-clean)

max event size currently: 32 Mbytes (same as buffer size)
max event size per (1) in previous post: 32*0.5..0.3 = 16..9 MiBytes

number of default-max-size events buffered: 32/4 = 8.
number of per (1) max-size events buffered: 2 or 3
number of current max-size events buffered: 0 (bad, frontend is serialized with mlogger)

default write cache size: 100 kbytes

max write cache size currently: buffer size / 4 = 32/4 = 8 MiBytes
max write cache size per (3) in previous post: buffer_size / 3 = 10 Mbytes
hard max write cache size per (3): 2 Gbytes/3 = 600 Mbytes

max size of cached events:

current: 100 kbytes (size as cache size)
per (2) in previous post: 0.1..0.3 * cache size = 10..30 kbytes
per (2), 1 Mbyte cahe: 0.1..0.3 * cache size = 100..300 kbytes
hard max size: 0.1..0.3 * hard_max_cache_size = 0.1..0.3 * 600 = 60..180 Mbytes.

max data rate before event buffer semaphore locking rate exceeds 100 Hz:

1 kbyte events, no write cache: 100 kbytes/sec
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 100*1kbyte*100Hz -> 10 Mbytes/sec
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 Mbytes/sec (1gige ethernet)
N kbyte events, 1 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
100 kbyte events, 1 Mbyte cache, not cached per (2): 100kbyte*100Hz = 10 Mbytes/sec
300 kbyte events, 1 Mbyte cache, not cached per (2): 300kbyte*100Hz = 30 Mbytes/sec
N00 kbyte events: N0 Mbytes/sec (500->50, etc)
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 Mbytes/sec (10gige ethernet)
N kbyte events, 10 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
1000 kbyte events, 10 Mbyte cache, not cached per (2): 1000kbyte*100Hz = 100 Mbytes/sec
3000 kbyte events, 10 Mbyte cache, not cached per (2): 3000kbyte*100Hz = 300 Mbytes/sec
N000 kbyte events: N00 Mbytes/sec (4000->400, 5000->500, etc)
default max event size: 4 Mibytes*100Hz = 400 Mbytes/sec (exceeds 1gige ethernet)
hard max event size (divided by 10 to buffer 10 events): 200 Mbytes*100Hz -> 20 Gbytes/sec

max event rate before event buffer semaphore locking rate exceeds 100 Hz:

1 kbyte events, no write cache: 100 Hz (obviously)
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 10 kHz
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 kHz
N kbyte events, 1 Mbyte cache: 1000/N events cached, cache flush rate 100 Hz -> 100/N kHz
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 kHz
N kbyte events, 10 Mbyte cache: 10000/N events cached, cache flush rate 100 Hz -> 1000/N kHz
100 kbyte events, not cached per (2): 100 Hz (obviously)
300 kbyte events, not cached per (2): 100 Hz (obviously)
default max event size: 100 Hz (obviously)

K.O.
    Reply  16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache 
> > for correct operation of bm_send_event() under all conditions we need to ...
> to continue computation from last message:

if I got my numbers right, for present-day hardware (1gige/10gige data rates, 100 Hz max locking rate), we should 
increase the default buffer write cache size from 100 kbytes to 10 Mbytes.

this cache size will permit processing of the full mix of small/big events
at the full mix of event rates without exceeding the 100 Hz semaphore locking rate.

with the 10 Mbyte write cache, default event buffer size should be 30-40 Mbytes (current size is 33 Mbytes, so does 
not need to change).

this computation is for 1 writer (1 reader, mlogger). it is a typical case for our experiments.

multiple writers can run into contention for event buffer space.

consider 10 writers want to flush their 10 Mbyte write cache all at the same time:

if buffer size is the default 33 Mbytes, the first 3 writers will have successful write cache flush,
but the other 7 will stall, there is no space in the buffer, we have to wait for mlogger to free
some (mlogger writing X Mbytes/sec will take Y milliseconds to liberate 10 Mbytes of space for the 4th writer
to successfully flush, writers 5..10 are still stalled).

but in a system with 10 writers writing at 10 Mbytes/sec (1 Hz default cache flush rate) is 100 Mbytes/sec
will likely have SYSTEM buffer size at least 200-300 Mbytes (to buffer 1-2 seconds of data against
any delays in writing to disk/network storage).

so there should be no problem in practice.

K.O.
Entry  12 Aug 2020, Yan Liu, Suggestion, adding db_get_mode ti check access mode for keys 
Hello,

I am wondering if there is a function that checks the access mode for a key? I 
found the db_set_mode() function that allows me to set the access mode for a key, 
but failed to find its counterpart get function.

Thanks in advance,
Yan
    Reply  13 Aug 2020, Stefan Ritt, Suggestion, adding db_get_mode ti check access mode for keys 
> Hello,
> 
> I am wondering if there is a function that checks the access mode for a key? I 
> found the db_set_mode() function that allows me to set the access mode for a key, 
> but failed to find its counterpart get function.
> 
> Thanks in advance,
> Yan


  KEY k;
  db_get_key(hDB, handle, &k);
  std::cout << k.access_mode << std::endl;

/Stefan
    Reply  13 Aug 2020, Yan Liu, Suggestion, adding db_get_mode ti check access mode for keys 
Thank you!

Yan

> > Hello,
> > 
> > I am wondering if there is a function that checks the access mode for a key? I 
> > found the db_set_mode() function that allows me to set the access mode for a key, 
> > but failed to find its counterpart get function.
> > 
> > Thanks in advance,
> > Yan
> 
> 
>   KEY k;
>   db_get_key(hDB, handle, &k);
>   std::cout << k.access_mode << std::endl;
> 
> /Stefan
Entry  22 Jun 2012, Zisis Papandreou, Info, adding 2nd ADC and TDC to crate frontend.canalyzer.canalyzer.h
Hi folks:

we've been running midas-1.9.5 for a few years here at Regina.  We are now
working on a larger cosmic ray testing that requires a second ADC and second TDC
module in our Camac crate (we use the hytek1331 controller by the way).  We're
baffled as to how to set this up properly.  Specifically we have tried:

frontend.c

/* number of channels */
#define N_ADC  12 
(changed this from the old '8' to '12', and it seems to work for Lecroy 2249)

#define SLOT_ADC0   10
#define SLOT_TDC0   9
#define SLOT_ADC1   15
#define SLOT_TDC1   14

Is this the way to define the additional slots (by adding 0, 1 indices)?

Also, we were not able to get a new bank (ADC1) working, so we used a loop to
tag the second ADC values onto those of the first.

If someone has an example of how to handle multiple ADCs and TDCs and
suggestions as to where changes need to be made (header files, analyser, etc)
this would be great.

Thanks, Zisis...

P.S.  I am attaching the relevant files.
Entry  30 Apr 2022, Konstantin Olchanski, Info, added web pages for "show odb clients" and "show open records" 
for a long time, midas web pages have been missing the equivalent of odbedit 
"scl" and "sor" to display current odb clients and current odb open records.

this is now added as buttons "show open records" and "show odb clients" in the 
odb editor page.

as in odbedit, "sor" shows open records under the current subtree, i.e. if you 
are looking at /equipment, you will not see open records for /experiment. to see 
all open records, go to "/".

commit b1ab7e67ecf785744fff092708d8389f222b14a4

K.O.
    Reply  04 May 2022, Stefan Ritt, Info, added web pages for "show odb clients" and "show open records" 
Concerning the "scl" page, we are currently having a discussion. At the moment, one can 
see midas clients in three different places:

1) the main status page at the bottom, only names and hosts are there

2) the programs page, where one can also start/stop program

3) now the new page "Show ODB clients" in the ODB editor page, which shows also the 
alive status, PID and timeout

I'm thinking that three locations are two too much, so we are considering to merge the 
tree pages into one. That would mean that 1) goes away, and the "Programs" page will 
show more information. We have some rare cases that programs are removed from 
/System/Clients in the ODB but still attached to the ODB. For those "zombies" we would 
add a "hard kill" function.

I would like to hear feedback from the midas community before we proceed with the 
plans. Anybody desperately in need of the programs shown on the status page?

Best,
Stefan
Entry  01 May 2022, Konstantin Olchanski, Info, added web page for "mdump" 
added JSON RPC for bm_receive_event() and added a web page for "mdump".

the event dump is a hex dump for now.

if somebody can contribute a javascript decoder for midas bank format, it would be greatly appreciated.

otherwise, I will eventually write my own decoder library patterned on midasio.h and midasio.cxx.

as of commit 5882d55d1f5bbbdb0d9238ada639e63ac27d8825
K.O.
    Reply  01 May 2022, Konstantin Olchanski, Info, added web page for "mdump" 
> added JSON RPC for bm_receive_event()

there is a number of problems with implementing bm_receive_event() as a RPC:

1) mhttpd has only event buffer 1 read pointer for all javascript connections, if two browser tabs are 
running mdump, they will "steal" events from each other.
2) javascript connections are state-less and we cannot specify per-connection event_id and trigger_mask 
filters to bm_receive_event(). our bm_request_event() has to be for all event_id and all trigger_mask.
3) for same reason, we cannot have some requests to be GET_ALL, some to be GET_RECENT and some to be 
GET_OLD (if GET_OLD is ever implemented).

Problem (1) is hard to fix. Only solution I can see is to have mhttpd have it's own event buffer that can 
somehow track which events have been sent to which javascript connection.

The same scheme allows implementing GET_ALL and per-connection event_id and trigger_mask filters.

The difficulty is in detecting javascript connections that are no longer active and it's event request and 
events we have buffered for it can be deleted. Unlike proper rpc clients, javascript browser tabs can be 
closed without warning and without opportunity to tell rpc server that they are closed, gone.

K.O.
    Reply  01 May 2022, Konstantin Olchanski, Info, added web page for "mdump" 
> added a web page for "mdump".

missing functions:
- get a list of existing event buffers (should read event buffer names from /Experiment/Buffer sizes)
- selector box to select event buffer
- button for "get next" and "get new" (should call bm_skip_event() before bm_receive_event())
- entry fields for event_id and trigger_mask event filter
- check box for "keep getting new data" and entry field for update frequency
- (eventually) entry field for bank name filter

K.O.
    Reply  02 May 2022, Stefan Ritt, Info, added web page for "mdump" 
Here are some of my thoughts:

- I volunteer to write the JavaScript midas bank decoder. Just a couple of pure javascript functions, no 
midasio.cxx library needed.

- If different javascript connections "steal" events from each other, I would not be concerned. Actually I 
would rather like that all connections see the SAME event. So mhttpd keeps one event, serves it to all 
links, so displays are consistent. If a browser wants to see the "next" event, it send the old serial 
number and days "please send next event AFTER serial number". If the serial number is larger than the 
event in the buffer, mhttpd fetches a new event and puts it into its buffer.

- Since javascript connections are connectionless, I would rather pass event_id and trigger_mask with each 
request. Then mhttpd can retrieve events until event_id and trigger_mask match, then serve that event. 
Since reading events from a midas buffer is fast (many 10'000s of events per second), the won't be much of 
a delay.

- GET_ALL does not make sense for browsers, you don't want to slow down any frontend. If someone wants to 
do histogramming in the browser, then GET_SOME (which is kind of GET_OLD) would make sense, but most of 
the cases we have some single event display, and there a GET_RECENT is most appropriate.
    Reply  14 Feb 2024, Konstantin Olchanski, Bug Fix, added ubuntu-22 to nightly build on bitbucket, now need python! 
> Are we running these tests as part of the nightly build on bitbucket? They would be part of 
> the "make test" target. Correct python dependancies may need to be added to the bitbucket OS 
> image in bitbucket-pipelines.yml. (This is a PITA to get right).

I added ubuntu-22 to the nightly builds.

but I notice the build says "no python" and I am not sure what packages I need to install for 
midas python to work.

Ben, can you help me with this?

https://bitbucket.org/tmidas/midas/pipelines/results/1106/steps/%7B9ef2cf97-bd9f-4fd3-9ca2-9c6aa5e20828%7D

K.O.
Entry  24 May 2024, Konstantin Olchanski, Info, added ubuntu 22 arm64 cross-compilation 
Ubuntu 22 has almost everything necessary to cross-build arm64 MIDAS frontends:

# apt install g++-12-aarch64-linux-gnu gcc-12-aarch64-linux-gnu-base libstdc++-12-dev-arm64-cross
$ aarch64-linux-gnu-gcc-12 -o ttcp.aarch64 ttcp.c -static

to cross-build MIDAS:

make arm64_remoteonly -j

run programs from $MIDASSYS/linux-arm64-remoteonly/bin
link frontends to libraries in $MIDASSYS/linux-arm64-remoteonly/lib

Ubuntu 22 do not provide an arm64 libz.a, as a workaround, I build a fake one. (we do not have HAVE_ZLIB anymore...). or you 
can link to libz.a from your arm64 linux image, assuming include/zlib.h are compatible.

K.O.
Entry  11 Oct 2017, Konstantin Olchanski, Info, added support for ucLinux 
Support for building for ucLinux was added to MIDAS. I use the emcraft toolchain and userland on 
some kind of embedded ARM CPU that does not have an MMU. See the Makefile for details. The 
main difference of ucLinux is lack of fork(), which cannot be done without an MMU. Not everything 
works, but at the least I can run a frontend and connect to an experiment on a remote host 
computer (mserver connection). K.O.
ELOG V3.1.4-2e1708b5