Back Midas Rome Roody Rootana
  Midas DAQ System, Page 38 of 121  Not logged in ELOG logo
ID Date Author Topic Subjectdown
  2367   24 Mar 2022 Stefan RittBug Fixfix for event buffer corruption in bm_flush_cache()
> > ... instead of struggling with all your locks.
> 
> it is better to have midas fully thread safe. ODB has been so for a long time,
> event buffer partially (except for this bug), now fully.
> 
> without that the problem still exists, because in many frontends,
> bm_flush_buffer() is called from the main thread, and will race
> against the "bm_send_event() thread". Of course you can do
> everything on the main thread, but this opens you to RPC timeouts
> during run transitions (if you sleep in bm_wait_for_free_space()).

Just for the record: in the mfe.cxx framework both bm_send_event() and 
bm_flush_buffer() are called from the main thread, as can be seen in the 
midas/examples/mtfe/mtfe.cxx example.

But I agree that having all buffer operations thread safe is a clear benefit.

Stefan
  2369   24 Mar 2022 Konstantin OlchanskiBug Fixfix for event buffer corruption in bm_flush_cache()
> Thanks for the investigation. Back in 2020, we had some issues
> of losing data between the system buffer and the logger writing them
> to disk (https://daq00.triumf.ca/elog-midas/Midas/1966). This was polled equipment
> but we had a multithreaded FE running at the same time. Could this be related to the same problem?

I think we will have to follow up on your problem 1966 separately.

I think this bug cannot lose events. Writing events to the write cache has correct
locking, no loss here. writing write cache to shared memory has correct locking,
no loss there. the bug will cause the *next* event in the event buffer to be overwritten,
this will be detected by most programs as shared memory corruption and everybody
will quit. (mhttpd, mserver, odbedit will probably survive).

I guess there could be unlucky corruption that looks like nothing was corrupted,
but this will affect only a few events right at the shared memory read/write
pointer, it so happens that they are the oldest events in the buffer and likely
mlogger already wrote them to disk. mlogger read pointer will likely follow
the shared memory write pointer closely, well ahead of the shared memory
read pointer which always pointe to the older event and where this bug's corruption
will happen.

So no, I do not think this bug can cause event loss between frontend and mlogger.

K.O.
  258   25 May 2006 Konstantin OlchanskiBug Fixfix crash in xml odb load
There is a crash in odbedit when loading some xml odb files: a missing check for NULL pointer when 
loading an array of strings and one of the array elements is blank. This check is present when loading 
other string values. Here is the diff:

-bash-3.00$ diff odb.c odb.c-new
5621c5621,5624
<                db_set_data_index(hDB, hKey, mxml_get_value(child), size, i, tid);
---
>                if (mxml_get_value(child) == NULL)
>                   db_set_data_index(hDB, hKey, "", size, i, tid);
>                else
>                   db_set_data_index(hDB, hKey, mxml_get_value(child), size, i, tid);

K.O.
  263   08 Jun 2006 Konstantin OlchanskiBug Fixfix compilation of musbstd.h, add it back to libmidas
I fixed the compilation of musbstd.h (it required -DHAVE_LIBUSB on Linux, but
nothing knew about defining it) and put musbstd.o back into libmidas (USB
support should be part of the standard base midas library). K.O.
  267   09 Jun 2006 Stefan RittBug Fixfix compilation of musbstd.h, add it back to libmidas
> I fixed the compilation of musbstd.h (it required -DHAVE_LIBUSB on Linux, but
> nothing knew about defining it) and put musbstd.o back into libmidas (USB
> support should be part of the standard base midas library). K.O.

I'm not so sure about that. One could consider musbstd.o as a driver, and the
philosophy used for midas programs is that drivers get added explicitly when
compiling a frontend. We do not put mvmestd.c and mcstd.c into libmidas since for
different interfaces a different driver might be required. If we at some point use
an usb library different than libusb.a, we would have to compile different
libmidas for these different drivers.

I know it's convenient to have things in libmidas and not having to specify it
expliceitely for each frontend, but it is then somehow inconsistent with drivers
for vme and camac. So please reconsider this again.

- Stefan
  2424   15 Aug 2022 Zaher SalmanBug Reportfirefox hangs due to mhistory
Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console

17:21:28.821 Script terminated by timeout at:
MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
mhistory.js:2828:7

Any ideas how to resolve this??
  2425   15 Aug 2022 Stefan RittBug Reportfirefox hangs due to mhistory
> Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
> 
> 17:21:28.821 Script terminated by timeout at:
> MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
> MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
> mhistory.js:2828:7
> 
> Any ideas how to resolve this??

I have to reproduce the problem. Can you send me the full URL from your browser when you see that problem? Probably you have some "special" axis limits, so we don't see that 
problem anywhere else.

Stefan
  2426   16 Aug 2022 Zaher SalmanBug Reportfirefox hangs due to mhistory
> > Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
> > 
> > 17:21:28.821 Script terminated by timeout at:
> > MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
> > MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
> > mhistory.js:2828:7
> > 
> > Any ideas how to resolve this??
> 
> I have to reproduce the problem. Can you send me the full URL from your browser when you see that problem? Probably you have some "special" axis limits, so we don't see that 
> problem anywhere else.
> 
> Stefan

Hi Stefan and Konstantin,

The URL (reachable only within PSI) is http://lem03.psi.ch:8081/?cmd=custom&page=Mudas
Firefox is version 91.12.0esr (64-bit), but I had similar issues with chrome/chromium too.
The hangs seem to happen randomly so I have not been able to reproduce it yet. 
I have histories here  http://lem03.psi.ch:8081/?cmd=custom&page=Mudas&tab=3 (30 minutes each), but I have also histories popping up in modals though they do not cause any issues. 
I'll try to reproduce it in the coming few days and report again.

thanks,
Zaher
  2427   16 Aug 2022 Zaher SalmanBug Reportfirefox hangs due to mhistory
I found the bug. The problem is triggered by changing the firefox window. This calls a function that is supposed to change the size of the history plot and it works well when the history plots are visible but not if the history plots are hidden in a javascript tab (not another firefox tab).

Is there a clean way to resize the history plot if the parent div changes size?? The offending code is
mhist[i].mhg = new MhistoryGraph(mhist[i]);
mhist[i].mhg.initializePanel(i);
mhist[i].mhg.resize();
mhist[i].resize = function () {
   mhis.mhg.resize();
};
  2428   16 Aug 2022 Konstantin OlchanskiBug Reportfirefox hangs due to mhistory
> > > Firefox is hanging/becoming unresponsive due to javascript code.
> 
> The URL (reachable only within PSI) is http://lem03.psi.ch:8081/?cmd=custom&page=Mudas

so malfunction is not in the midas history page, but in a custom page. I could help you debug it,
but you would have to provide the complete source code (javascript and html).

> Firefox is version 91.12.0esr (64-bit), but I had similar issues with chrome/chromium too.

my firefox is 103.something. when you say google-chrome has "similar issues",
I read it as "google-chrome does not show this same bug, but shows some other
bug somewhere else". (if I misread you, you have to write better).

but this gives you a front to attack your bugs. basically all browsers should render your
custom page exactly the same (unless you use some obscure or experimental feature, which I
recommend against).

so you tweak your page to identify the source of different rendering results, and try to eliminate it,
hopefully by the time you get your page render exactly the same everywhere, all the real bugs
have gotten shaken out, too. (this is similar to debugging a c++ program by compiling
it on linux, mac, windows, vax, raspbery pi, etc and checking that you get the same result everywhere).

> The hangs seem to happen randomly so I have not been able to reproduce it yet.

I find that javascript debuggers are not setup to debug hangs. I think debugger runs partially
inside the same javascript engine you are debugging, so both hang and debugging is impossible.

(latest google-chrome has another improvement, all pages from the same computer run in the same
javascript engine, so if one midas page stops (on exception or because I debug it), all midas pages
stop and I have to run two different browsers if I want to debug (i.e.) a history page crash
and look at odb at the same time. fun).

K.O.
  1181   13 Jun 2016 Konstantin OlchanskiBug Fixexample ssl certificate removed
I removed the example ssl certificate from the midas git repository (ssl_cert.pem). Now every midas 
installation must generate their own certificate - because to have any security at all each encryption 
private key has to be unique (and it has to be secret).

The command for generating a self-signed certificate is printed by mhttpd on startup:

openssl req -new -nodes -newkey rsa:2048 -sha256 -out ssl_cert.csr -keyout ssl_cert.key; openssl 
x509 -req -days 365 -sha256 -in ssl_cert.csr -signkey ssl_cert.key -out ssl_cert.pem; cat 
ssl_cert.key >> ssl_cert.pem

K.O.
  144   17 Jun 2003 Stefan Ritt example experiment makefile for NT
I have added ROOT support to midas\examples\experiment\makefile.nt. To 
compile the example experiment under Windows, one needs

1) Installed version of ROOT
2) Having ROOTSYS environment variable defined
3) Invoke "nmake -f makefile.nt" in the midas\examples\experiment directory

Please note that in the current release 3.05 of ROOT, sockets are not yet 
working under Windows, so the histogram server built into the analyzer 
cannot be accessed. It is however possible to output the analyzed data into 
a .root file and visualize it with the root browser like

analyzer -i run00001.mid -o run00001.root
  353   27 Feb 2007 Piotr ZolnierczukForumevent builder scalability
Hi there:
I have a question if there's anybody out there running MIDAS with event builder
that assembles events from more that just a few front ends (say on the order of
0x10 or more)?
Any experiences with scalability?

Cheers
 Piotr
  354   27 Feb 2007 Stefan RittForumevent builder scalability
> Hi there:
> I have a question if there's anybody out there running MIDAS with event builder
> that assembles events from more that just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?

At the MEG experiment at PSI we run with 5 front-ends (later 8), each running at
about 10 MB/sec. This gives an overall rate of 50MB/sec without any problem. The
CPU load on the backend (2.6 GHz dual Xenon) is 30% for the event builder and 26%
for the logger. The DANCE experiment at Los Alamos runs 17 front-ends if I'm not
mistaken (John?).
  355   27 Feb 2007 John M O'DonnellForumevent builder scalability
At Los Alamos, we have 15+1 frontends - the 15 between them read about 2 or 3
TB/hour and reduce it to 1 to 5 GB/hour which is then sent to the mevb on a 17th
computer.  The 16th frontend handles deadtime issues and scalers (small data rate).

frontends are 1GHz pentium 3, and backend is 2.8GHz dual CPU with hyperthreading.
Interconnect is 100Mb ethernet from frontends to switch, and 1Gb ethernet from
switch to backend.

Our bottle neck is (a) compactPCI backplane reading data from waveform digitizers
to the frontend CPUs and (b) CPU power on the frontend CPUs to analyzer the waveforms.

John
  356   27 Feb 2007 Stefan RittForumevent builder scalability
> Our bottle neck is (a) compactPCI backplane reading data from waveform digitizers
> to the frontend CPUs and (b) CPU power on the frontend CPUs to analyzer the waveforms.

I forgot to mention that our front-ends at MEG are 2.8 GHz dual Xenon with Hyperthreading.
This gives "virtual" 4 CPU cores which are really necessary for waveform calibration and
analysis. It makes use of the new multi-threading feature in the midas front-end. I run
actually 7 threads (one VME readout, 4 calibration threads, one encoding thread and the
main thread sending data to the backend. This speeds up data taking by a factor of four
compared to a single thread. So if one plans for waveform analysis in the frontend to
reduce the data, I would recommend a box with dual quad cores.
  357   02 Mar 2007 Kevin LynchForumevent builder scalability
> Hi there:
> I have a question if there's anybody out there running MIDAS with event builder
> that assembles events from more that just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?
> 
> Cheers
>  Piotr

Mulan (which you hopefully remember with great fondness :-) is currently running
around ten frontends, six of which produce data at any rate.  If I'm remembering
correctly, the event builder handles about 30-40MB/s.  You could probably ping Tim
Gorringe or his current postdoc Volodya Tishenko (tishenko@pa.uky.edu) if you want
more details.  Volodya solved a significant number of throughput related
bottlenecks in the year leading up to our 2006 run.
  358   03 Mar 2007 Piotr ZolnierczukForumevent builder scalability
Hi all,
thank you for all responses. 

It seems that there's no problem running MIDAS with event builder assembling
data from ~10 front-ends. How about ~100? One possible solution is to have a
multi-tiered architecture. 

The reason I am asking is that we are in the process of designing an Ethernet
based DAQ system with front-ends running on embedded computers (Linux/ARM
CPU/Xilinix FPGA) and MIDAS is one of my options as a DAQ framework.
I am open for advice/suggestions.

Thanks again
  Piotr
  359   03 Mar 2007 Stefan RittForumevent builder scalability
> It seems that there's no problem running MIDAS with event builder assembling
> data from ~10 front-ends. How about ~100? One possible solution is to have a
> multi-tiered architecture. 
> 
> The reason I am asking is that we are in the process of designing an Ethernet
> based DAQ system with front-ends running on embedded computers (Linux/ARM
> CPU/Xilinix FPGA) and MIDAS is one of my options as a DAQ framework.
> I am open for advice/suggestions.

The event builder is a standalone application not part of the "midas core". It
receives data from N producers and combines the fragments into events based on
their serial number as a dedicated process. If it would become a bottleneck, it
can simply be redesigned and optimized. I made currently good experience with
multi-threaded applications running on multi-core CPUs. Implementing your
multi-tiered architecture as a multi-threaded event builder, where each of ten
threads receives data from ten front-ends, combines them and passes them to the
"collector thread" would make sense to me. Between the threads you can pass data
with many GB/sec, as compared to an ethernet-based architecture. I currently
implemented the rb_xxx functions inside midas.c which lets you pass data between
threads on a zero-copy basis.

Inside the core functions of midas there is no limitations whatsoever. All
counters etc. are 32-bit, so you can run 2^32 data consumers etc. You will first
hit the OS process limit. What I'm more concerned is your network bandwidth. If
you run 100 front-ends each with more than 1MB/sec, you would hit the 1GBit limit
of your network card. If you put more network interfaces, you will hit the disk
I/O limit which is around 100-200MB/sec even on larger RAID1 disk arrays (unless
you do data compression during event building). 

Another limit I see is the run transition. On each start/stop of a run, the
process which wants to start/stop the run has to contact all producers via a TCP
connection. Opening 100 TCP connection will take maybe 10-30 seconds, which is not
very convenient. A multi-threaded approach will help, but this is not (yet)
implemented, maybe you would have to do it yourself.

Another approach would be that you put the event building "in front of midas". All
your front-ends run a specific protocol outside of midas. They send their data to
a collecting process which acts as a single front-end to midas. So in the midas
framework you see only a single front-end, which gets it's data not from hardware,
but from 100 other nodes. This way you can optimize the protocol between your
front-end nodes and the collector process for your application. Run transitions
can be done through multicast UDP messages for example, which will even work with
1000 front-ends. But you have to implement that yourself.

I would start with the first approach: Taking the out-of-the box midas, see how
far I get. If you have access to a normal linux cluster, you can simply run ten
dummy front-ends on each of ten nodes, thus simulating 100 front-ends and see how
far you get. If the event builder is the bottle neck, do an optimization or
redesign. If the run transitions become your bottle neck, switch to method two. In
both ways you can utilize the downstream part of midas, like the logger, the
history system, etc. so you would still gain a lot compared to a design from scratch.

Best regards,

  Stefan
  1624   21 Jul 2019 Konstantin OlchanskiInfoerror handling is hard
Happy summer to everybody.

When programming in general, and when programming MIDAS, there is always a struggle
with error handling. Too much? Too little? Visually, some MIDAS functions are 90% error handling, 10% useful work, if that.

What is the right way to do this?
Where is the balance?
Would c++ exceptions help or hinder?
How to make it better?

It turns out that the Go community have been struggling with exactly this for the last year or so.

Read the (ultimately rejected) proposal for improved error handling in Go. All the problems and difficulties
and struggles facing the programmer and the language designer spread right in front of us.

https://go.googlesource.com/proposal/+/master/design/go2draft-error-handling-overview.md

(To remember, Go is this: https://www.oreilly.com/library/view/the-go-programming/9780134190570/
The language designers are Brian W. Kernighan, Rob Pike, Ken Thompson and "some other people"
(Dennis Ritchie is no longer with us). These people gave us UNIX and C (and C++, the C++ guy was
their next-door-office mate), when that crowd speaks, I listen)

That write-up refers to some blogger's vivid illustration how correct error handling is hard,
with special focus on error handling in the presence of exceptions:

https://devblogs.microsoft.com/oldnewthing/?p=39683
https://devblogs.microsoft.com/oldnewthing/?p=36693

I read all this stuff and said, "wow, this is the reader's digest version of my life in computer programming".

The clincher from this last guy? "My point isnít that exceptions are bad.
My point is that exceptions are too hard and Iím not smart
enough to handle them."

"back to writing some error handling code",
K.O.
ELOG V3.1.4-2e1708b5