Back Midas Rome Roody Rootana
  Midas DAQ System, Page 46 of 146  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topic Subjectdown
  353   27 Feb 2007 Piotr ZolnierczukForumevent builder scalability
Hi there:
I have a question if there's anybody out there running MIDAS with event builder
that assembles events from more that just a few front ends (say on the order of
0x10 or more)?
Any experiences with scalability?

Cheers
 Piotr
  354   27 Feb 2007 Stefan RittForumevent builder scalability
> Hi there:
> I have a question if there's anybody out there running MIDAS with event builder
> that assembles events from more that just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?

At the MEG experiment at PSI we run with 5 front-ends (later 8), each running at
about 10 MB/sec. This gives an overall rate of 50MB/sec without any problem. The
CPU load on the backend (2.6 GHz dual Xenon) is 30% for the event builder and 26%
for the logger. The DANCE experiment at Los Alamos runs 17 front-ends if I'm not
mistaken (John?).
  355   27 Feb 2007 John M O'DonnellForumevent builder scalability
At Los Alamos, we have 15+1 frontends - the 15 between them read about 2 or 3
TB/hour and reduce it to 1 to 5 GB/hour which is then sent to the mevb on a 17th
computer.  The 16th frontend handles deadtime issues and scalers (small data rate).

frontends are 1GHz pentium 3, and backend is 2.8GHz dual CPU with hyperthreading.
Interconnect is 100Mb ethernet from frontends to switch, and 1Gb ethernet from
switch to backend.

Our bottle neck is (a) compactPCI backplane reading data from waveform digitizers
to the frontend CPUs and (b) CPU power on the frontend CPUs to analyzer the waveforms.

John
  356   27 Feb 2007 Stefan RittForumevent builder scalability
> Our bottle neck is (a) compactPCI backplane reading data from waveform digitizers
> to the frontend CPUs and (b) CPU power on the frontend CPUs to analyzer the waveforms.

I forgot to mention that our front-ends at MEG are 2.8 GHz dual Xenon with Hyperthreading.
This gives "virtual" 4 CPU cores which are really necessary for waveform calibration and
analysis. It makes use of the new multi-threading feature in the midas front-end. I run
actually 7 threads (one VME readout, 4 calibration threads, one encoding thread and the
main thread sending data to the backend. This speeds up data taking by a factor of four
compared to a single thread. So if one plans for waveform analysis in the frontend to
reduce the data, I would recommend a box with dual quad cores.
  357   02 Mar 2007 Kevin LynchForumevent builder scalability
> Hi there:
> I have a question if there's anybody out there running MIDAS with event builder
> that assembles events from more that just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?
> 
> Cheers
>  Piotr

Mulan (which you hopefully remember with great fondness :-) is currently running
around ten frontends, six of which produce data at any rate.  If I'm remembering
correctly, the event builder handles about 30-40MB/s.  You could probably ping Tim
Gorringe or his current postdoc Volodya Tishenko (tishenko@pa.uky.edu) if you want
more details.  Volodya solved a significant number of throughput related
bottlenecks in the year leading up to our 2006 run.
  358   03 Mar 2007 Piotr ZolnierczukForumevent builder scalability
Hi all,
thank you for all responses. 

It seems that there's no problem running MIDAS with event builder assembling
data from ~10 front-ends. How about ~100? One possible solution is to have a
multi-tiered architecture. 

The reason I am asking is that we are in the process of designing an Ethernet
based DAQ system with front-ends running on embedded computers (Linux/ARM
CPU/Xilinix FPGA) and MIDAS is one of my options as a DAQ framework.
I am open for advice/suggestions.

Thanks again
  Piotr
  359   03 Mar 2007 Stefan RittForumevent builder scalability
> It seems that there's no problem running MIDAS with event builder assembling
> data from ~10 front-ends. How about ~100? One possible solution is to have a
> multi-tiered architecture. 
> 
> The reason I am asking is that we are in the process of designing an Ethernet
> based DAQ system with front-ends running on embedded computers (Linux/ARM
> CPU/Xilinix FPGA) and MIDAS is one of my options as a DAQ framework.
> I am open for advice/suggestions.

The event builder is a standalone application not part of the "midas core". It
receives data from N producers and combines the fragments into events based on
their serial number as a dedicated process. If it would become a bottleneck, it
can simply be redesigned and optimized. I made currently good experience with
multi-threaded applications running on multi-core CPUs. Implementing your
multi-tiered architecture as a multi-threaded event builder, where each of ten
threads receives data from ten front-ends, combines them and passes them to the
"collector thread" would make sense to me. Between the threads you can pass data
with many GB/sec, as compared to an ethernet-based architecture. I currently
implemented the rb_xxx functions inside midas.c which lets you pass data between
threads on a zero-copy basis.

Inside the core functions of midas there is no limitations whatsoever. All
counters etc. are 32-bit, so you can run 2^32 data consumers etc. You will first
hit the OS process limit. What I'm more concerned is your network bandwidth. If
you run 100 front-ends each with more than 1MB/sec, you would hit the 1GBit limit
of your network card. If you put more network interfaces, you will hit the disk
I/O limit which is around 100-200MB/sec even on larger RAID1 disk arrays (unless
you do data compression during event building). 

Another limit I see is the run transition. On each start/stop of a run, the
process which wants to start/stop the run has to contact all producers via a TCP
connection. Opening 100 TCP connection will take maybe 10-30 seconds, which is not
very convenient. A multi-threaded approach will help, but this is not (yet)
implemented, maybe you would have to do it yourself.

Another approach would be that you put the event building "in front of midas". All
your front-ends run a specific protocol outside of midas. They send their data to
a collecting process which acts as a single front-end to midas. So in the midas
framework you see only a single front-end, which gets it's data not from hardware,
but from 100 other nodes. This way you can optimize the protocol between your
front-end nodes and the collector process for your application. Run transitions
can be done through multicast UDP messages for example, which will even work with
1000 front-ends. But you have to implement that yourself.

I would start with the first approach: Taking the out-of-the box midas, see how
far I get. If you have access to a normal linux cluster, you can simply run ten
dummy front-ends on each of ten nodes, thus simulating 100 front-ends and see how
far you get. If the event builder is the bottle neck, do an optimization or
redesign. If the run transitions become your bottle neck, switch to method two. In
both ways you can utilize the downstream part of midas, like the logger, the
history system, etc. so you would still gain a lot compared to a design from scratch.

Best regards,

  Stefan
  1624   21 Jul 2019 Konstantin OlchanskiInfoerror handling is hard
Happy summer to everybody.

When programming in general, and when programming MIDAS, there is always a struggle
with error handling. Too much? Too little? Visually, some MIDAS functions are 90% error handling, 10% useful work, if that.

What is the right way to do this?
Where is the balance?
Would c++ exceptions help or hinder?
How to make it better?

It turns out that the Go community have been struggling with exactly this for the last year or so.

Read the (ultimately rejected) proposal for improved error handling in Go. All the problems and difficulties
and struggles facing the programmer and the language designer spread right in front of us.

https://go.googlesource.com/proposal/+/master/design/go2draft-error-handling-overview.md

(To remember, Go is this: https://www.oreilly.com/library/view/the-go-programming/9780134190570/
The language designers are Brian W. Kernighan, Rob Pike, Ken Thompson and "some other people"
(Dennis Ritchie is no longer with us). These people gave us UNIX and C (and C++, the C++ guy was
their next-door-office mate), when that crowd speaks, I listen)

That write-up refers to some blogger's vivid illustration how correct error handling is hard,
with special focus on error handling in the presence of exceptions:

https://devblogs.microsoft.com/oldnewthing/?p=39683
https://devblogs.microsoft.com/oldnewthing/?p=36693

I read all this stuff and said, "wow, this is the reader's digest version of my life in computer programming".

The clincher from this last guy? "My point isn’t that exceptions are bad.
My point is that exceptions are too hard and I’m not smart
enough to handle them."

"back to writing some error handling code",
K.O.
  1081   29 Jul 2015 Javier PraenaForumerror
Hello, I am new in the forum. We are running an experiment for a week with no
problems. Now we add a detector a we found an error. Even we come back to our
previous configuration the error continues appearing. Please, may someone help
us? You can find the error in the attachment. Thanks!
Attachment 1: sigsegv-error.jpg
sigsegv-error.jpg
  1083   29 Jul 2015 Wes GohnForumerror
SIGSEGV is a segmentation fault. Most often it means some ODB parameter is out of bounds or there is 
an invalid memcpy somewhere in your code.

> Hello, I am new in the forum. We are running an experiment for a week with no
> problems. Now we add a detector a we found an error. Even we come back to our
> previous configuration the error continues appearing. Please, may someone help
> us? You can find the error in the attachment. Thanks!
  1084   29 Jul 2015 Konstantin OlchanskiForumerror
> Hello, I am new in the forum. We are running an experiment for a week with no
> problems. Now we add a detector a we found an error. Even we come back to our
> previous configuration the error continues appearing. Please, may someone help
> us? You can find the error in the attachment. Thanks!

The error reported is SIGSEGV, which is a software fault (as opposed to a hardware fault like "printer is on fire" or "disk full").

Next step is to identify which program crashed and attach a debugger to the crashing executable or to the core dump.
You will use the debugger to generate the stack trace which will identify exactly the place where the program failed.

I recommend that one should always attach the stack trace to the problem reports on this forum. These stack traces are sometimes long and 
scary and it is a bit of an art to read them, so do not worry if you do not understand what they say.

If you use "gdb", I recommend that you post your full debugger session:

bash> gdb myprogram
gdb> run my command line arguments
*crash*
gdb> where
... stack trace
gdb> quit

(If you use threads, please generate a stack trace for each thread)

If the crash location is inside midas code, congratulations, you may have found a bug in midas.
If the crash location is in your code, you have some debugging to do.
If you do not understand what I am talking about (gdb? core dump?), please read "unix/linux software development for dummies" book first.

K.O.
  2378   30 Mar 2022 Konstantin OlchanskiBug Fixerroneous removal of odb clients, fixed
commit https://bitbucket.org/tmidas/midas/commits/b1fe21445109774be3f059c2124727b414abf835
made on 2022-02-21 fixed a serious bug in ODB.

a multithread race condition against an incorrectly updated shared variable caused removal of 
random clients from ODB with error message:

My client index %d in ODB is invalid: out of range 0..%d. Maybe this client was removed by a 
timeout, see midas.log. Cannot continue, aborting...

the race is between db_open_database() in one program (executed when any midas program starts) and 
db_get_my_client_locked() in all running midas programs.

as long as no midas programs are started (db_open_database() is not executed), this bug does not 
happen.

if i.e. odbedit is executed very often, i.e. from a script, probability of hitting this bug becomes 
quite high.

fixed now.

K.O.
  2606   19 Sep 2023 Frederik WautersBug Reportepics fe "Start Command"
The epics frontend overwrites the "start command" odb after each start:

  // set start command in ODB
   midas::odb efe("/Programs/EPICS Frontend");
   std::string p(__FILE__);
   std::string s("build/epics_fe");
   auto i = p.find("epics_fe.cxx");
   p.replace(i, s.length(), s);
   p = p.substr(0, i + s.length());
   efe["Start command"].set_string_size(p, 256);

this should be set such that it only writes when the key is not there. It causes the following issue: on a pc with multiple experiments defined, you need to start the fe's with a "-e <name>" flag. 
  2607   20 Sep 2023 Stefan RittBug Reportepics fe "Start Command"
Thanks for reporting this problem. It has been fixed today, so the start command is only written if it's emtpy.

Stefan
  1369   17 May 2018 Zaher SalmanForumembedding history in SVG
I am embedding histories into a custom page within an SVG,

<image x="21000" y="1000" width="6000" height="6000"
href="../HS/SampleCryo/SampleTemp.gif?width=230&scale=0.5h"/>

this works fine. However, I would like to update this regularly without
refreshing the full page via

<meta http-equiv="Refresh" content="60">

is there a good way to do that? By the way, the "Periodic update of parts of a
custom page" from the documentation does not seem to work here.
  2128   10 Mar 2021 Zaher SalmanSuggestionembed modbvalue in SVG
Is it possible to embed modbvalue in an SVG for use within a custom page?

thanks.
  2129   10 Mar 2021 Stefan RittSuggestionembed modbvalue in SVG
You can't really embed it, but you can overlay it. You tag the SVG with a 
"relative" position and then move the modbvalue with an "absolute" position over 
it:

<svg style="position:relative" width="400" height="100">
  <rect width="300" height="100" style="fill:rgb(255,0,0);stroke-width:3;stroke:rgb(0,0,0)" />
  <div class="modbvalue" style="position:absolute;top:50px;left:50px" data-odb-path="/Runinfo/Run number"></div>
</svg>
  2156   26 Apr 2021 Zaher SalmanSuggestionembed modbvalue in SVG
I found a way to embed modbvalue into a SVG:

<text x="100" y="100" font-size="30rem">
Run=<tspan class="modbvalue" data-odb-path="/Runinfo/Run number"></tspan>
</text>

This seems to behave better that the suggestion below.

> You can't really embed it, but you can overlay it. You tag the SVG with a 
> "relative" position and then move the modbvalue with an "absolute" position over 
> it:
> 
> <svg style="position:relative" width="400" height="100">
>   <rect width="300" height="100" style="fill:rgb(255,0,0);stroke-width:3;stroke:rgb(0,0,0)" />
>   <div class="modbvalue" style="position:absolute;top:50px;left:50px" data-odb-path="/Runinfo/Run number"></div>
> </svg>
  1172   22 Mar 2016 Konstantin OlchanskiInfoemacs web-mode.el
For those who use emacs to edit web pages - the built-in CSS and Javascript modes seem to work 
just fine for editing files.css and files.js, but the built-in html modes fall flat on modern web pages
which contain a mix of html, javascript inside <script> tags and javascript inside button "onclick" 
attributes.

So I looked at several emacs "html5 modes" and web-mode.el works well for me - html is indented 
correctly (default indent level is easy to change), javascript inside <script> tags is indented 
correctly (default indent level is easy to change), javascript inside "onclick" attributes has to be 
indented manually.

web-mode code repository and instructions are here, the author is very responsive and fixed my 
one request (permit manual intentation of javascript inside html attributes):
https://github.com/fxbois/web-mode

I now edit the html files in the MIDAS repository using these emacs settings:

8s-macbook-pro:web-mode 8ss$ more ~/.emacs
(setq-default indent-tabs-mode nil)
(setq-default tab-width 3)
(add-to-list 'load-path "~/git/web-mode")
(require 'web-mode)
(add-to-list 'auto-mode-alist '("\\.html\\'" . web-mode))
(setq web-mode-markup-indent-offset 2)
(setq web-mode-css-indent-offset 2)
(setq web-mode-code-indent-offset 3)
(setq web-mode-script-padding 0)
(setq web-mode-attr-indent-offset 2)
8s-macbook-pro:web-mode 8ss$ 

K.O.
  64   30 Mar 2004 Konstantin Olchanski elog fixes
I am about to commit the mhttpd Elog fixes we have been using in TWIST since
about October. The infamous Elog "last N days" problem is fixed, sundry
memory overruns are caught and assert()ed.

For the curious, the "last N days" problem was caused by uninitialized data
in the elog handling code. A non-zero-terminated string was read from a file
and passed to atoi(). Here is a simplifed illustration:

char str[256]; // uninitialized, filled with whatever happens on the stack
read(file,str,6); // read 6 bytes, non-zero terminated
// str now looks like this: "123456UUUUUUUUU....", "U" is uninitialized memory
int len = atoi(str); // if the first "U" happens to be a number, we lose.

The obvious fix is to add "str[6]=0" before the atoi() call.

Attached is the CVS diff for the proposed changes. Please comment.

K.O.
Attachment 1: elog-fixes.txt
Index: src/midas.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/midas.c,v
retrieving revision 1.203
diff -u -r1.203 midas.c
--- src/midas.c	19 Mar 2004 09:58:22 -0000	1.203
+++ src/midas.c	31 Mar 2004 05:11:00 -0000
@@ -14814,8 +14814,9 @@
 \********************************************************************/
 
 /********************************************************************/
-void el_decode(char *message, char *key, char *result)
+void el_decode(char *message, char *key, char *result, int size)
 {
+   char *rstart = result;
    char *pc;
 
    if (result == NULL)
@@ -14828,6 +14829,8 @@
          *result++ = *pc++;
       *result = 0;
    }
+
+   assert(strlen(rstart) < size);
 }
 
 /**dox***************************************************************/
@@ -15020,9 +15023,9 @@
          size = atoi(str + 9);
          read(fh, message, size);
 
-         el_decode(message, "Date: ", date);
-         el_decode(message, "Thread: ", thread);
-         el_decode(message, "Attachment: ", attachment);
+         el_decode(message, "Date: ", date, sizeof(date));
+         el_decode(message, "Thread: ", thread, sizeof(thread));
+         el_decode(message, "Attachment: ", attachment, sizeof(attachment));
 
          /* buffer tail of logfile */
          lseek(fh, 0, SEEK_END);
@@ -15092,7 +15095,7 @@
       sprintf(message + strlen(message), "========================================\n");
       strcat(message, text);
 
-      assert(strlen(message) < sizeof(message));        // bomb out on array overrun.
+      assert(strlen(message) < sizeof(message)); /* bomb out on array overrun. */
 
       size = 0;
       sprintf(start_str, "$Start$: %6d\n", size);
@@ -15104,6 +15107,9 @@
          sprintf(tag, "%02d%02d%02d.%d", tms->tm_year % 100, tms->tm_mon + 1,
                  tms->tm_mday, (int) TELL(fh));
 
+      /* size has to fit in 6 digits */
+      assert(size < 999999);
+
       sprintf(start_str, "$Start$: %6d\n", size);
       sprintf(end_str, "$End$:   %6d\n\f", size);
 
@@ -15339,13 +15345,20 @@
          return EL_FILE_ERROR;
       }
 
-      if (strncmp(str, "$End$: ", 7) == 0) {
-         size = atoi(str + 7);
-         lseek(*fh, -size, SEEK_CUR);
-      } else {
+      if (strncmp(str, "$End$: ", 7) != 0) {
          close(*fh);
          return EL_FILE_ERROR;
       }
+        
+      /* make sure the input string to atoi() is zero-terminated:
+       * $End$:      355garbage
+       * 01234567890123456789 */
+      str[15] = 0;
+
+      size = atoi(str + 7);
+      assert(size > 15);
+
+      lseek(*fh, -size, SEEK_CUR);
 
       /* adjust tag */
       sprintf(strchr(tag, '.') + 1, "%d", (int) TELL(*fh));
@@ -15364,14 +15377,21 @@
       }
       lseek(*fh, -15, SEEK_CUR);
 
-      if (strncmp(str, "$Start$: ", 9) == 0) {
-         size = atoi(str + 9);
-         lseek(*fh, size, SEEK_CUR);
-      } else {
+      if (strncmp(str, "$Start$: ", 9) != 0) {
          close(*fh);
          return EL_FILE_ERROR;
       }
 
+      /* make sure the input string to atoi() is zero-terminated
+       * $Start$:    606garbage
+       * 01234567890123456789 */
+      str[15] = 0;
+
+      size = atoi(str+9);
+      assert(size > 15);
+
+      lseek(*fh, size, SEEK_CUR);
+
       /* if EOF, goto next day */
       i = read(*fh, str, 15);
       if (i < 15) {
@@ -15444,7 +15464,7 @@
 
 \********************************************************************/
 {
-   int size, fh, offset, search_status;
+   int size, fh = 0, offset, search_status, rd;
    char str[256], *p;
    char message[10000], thread[256], attachment_all[256];
 
@@ -15462,10 +15482,24 @@
 
    /* extract message size */
    offset = TELL(fh);
-   read(fh, str, 16);
-   size = atoi(str + 9);
+   rd = read(fh, str, 15);
+   assert(rd == 15);
+   
+   /* make sure the input string is zero-terminated before we call atoi() */
+   str[15] = 0;
+
+   /* get size */
+   size = atoi(str+9);
+   
+   assert(strncmp(str,"$Start$:",8) == 0);
+   assert(size > 15);
+   assert(size < sizeof(message));
+   
    memset(message, 0, sizeof(message));
-   read(fh, message, size);
+
+   rd = read(fh, message, size);
+   assert(rd > 0);
+   assert((rd+15 == size)||(rd == size));
 
    close(fh);
 
@@ -15473,14 +15507,14 @@
    if (strstr(message, "Run: ") && run)
       *run = atoi(strstr(message, "Run: ") + 5);
 
-   el_decode(message, "Date: ", date);
-   el_decode(message, "Thread: ", thread);
-   el_decode(message, "Author: ", author);
-   el_decode(message, "Type: ", type);
-   el_decode(message, "System: ", system);
-   el_decode(message, "Subject: ", subject);
-   el_decode(message, "Attachment: ", attachment_all);
-   el_decode(message, "Encoding: ", encoding);
+   el_decode(message, "Date: ",     date,     80); /* size from show_elog_submit_query() */
+   el_decode(message, "Thread: ", thread, sizeof(thread));
+   el_decode(message, "Author: ",   author,   80); /* size from show_elog_submit_query() */
+   el_decode(message, "Type: ",     type,     80); /* size from show_elog_submit_query() */
+   el_decode(message, "System: ",   system,   80); /* size from show_elog_submit_query() */
+   el_decode(message, "Subject: ",  subject, 256); /* size from show_elog_submit_query() */
+   el_decode(message, "Attachment: ", attachment_all, sizeof(attachment_all));
+   el_decode(message, "Encoding: ", encoding, 80); /* size from show_elog_submit_query() */
 
    /* break apart attachements */
    if (attachment1 && attachment2 && attachment3) {
@@ -15496,6 +15530,10 @@
                strcpy(attachment3, p);
          }
       }
+
+      assert(strlen(attachment1) < 256); /* size from show_elog_submit_query() */
+      assert(strlen(attachment2) < 256); /* size from show_elog_submit_query() */
+      assert(strlen(attachment3) < 256); /* size from show_elog_submit_query() */
    }
 
    /* conver thread in reply-to and reply-from */
ELOG V3.1.4-2e1708b5