Back Midas Rome Roody Rootana
  Midas DAQ System, Page 77 of 142  Not logged in ELOG logo
Entry  05 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations? frontend.pyfrontend.cxx
I'm trying to get a sense of the rate limitations of a python frontend. I 
understand this will vary from system to system.

I adapted two frontends from the example templates, one in C++ and one in python. 
Both simply fill a midas bank with a fixed length array of zeros at a given polled 
rate. However, the C++ frontend is about 100 times faster in both data and event 
rates. This seems slow, even for an interpreted language like python. Furthermore, 
I can effectively increase the maximum rate by concurrently running a second 
python frontend (this is not the case for the C++ frontend). In short, there is 
some limitation with using python here unrelated to hardware.

In my case, poll_func appears to be called at 100Hz at best. What limits the rate 
that poll_func is called in a python frontend? Is there a more appropriate 
solution for increasing the python frontend data/event rate than simply launching 
more frontends?

I've attached my C++ and python frontend files for reference.

Thanks,
Jack
    Reply  05 Sep 2024, Ben Smith, Forum, Python frontend rate limitations? 
> What limits the rate that poll_func is called in a python frontend? 

First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"

If that's still not fast enough, then you can return a *list* of events from your readout_func. I've seen real-world cases of 25kHz+ of midas events generated in this fashion.


However in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer. In particular it takes 15ms on my machine to just pack the data into a memory buffer (see timeit command below). I am sure there must be a faster way to do this packing, especially in the case where the bank contains a numpy array rather than a python list.

I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.


Cheers,
Ben


P.S. You may have a bug in your calculations (depending on how you did your testing). In poll_func I think you should be updating the stats every time the function is called, not just the times when you return True.


P.P.S. Command I used to test how slow it is to pack the data. One-time setup of creating the buffers, then multiple tests of the pack_into function:

python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, buf, *arr)"
20 loops, best of 5: 15.3 msec per loop
    Reply  05 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations? 
Thank you, this was very helpful.

> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"

Thanks, I thought that was just for periodic triggering (or at least that's how I've used it in C++ frontends). Changing this allowed me to get past the 100Hz event rate cap I described.

> If that's still not fast enough, then you can return a *list* of events from your readout_func. I've seen real-world cases of 25kHz+ of midas events generated in this fashion.
> 
> 
> However in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer. In particular it takes 15ms on my machine to just pack the data into a memory buffer (see timeit command below). I am sure there must be a faster way to do this packing, especially in the case where the bank contains a numpy array rather than a python list.
> 
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.
> 
> 
> Cheers,
> Ben


> P.S. You may have a bug in your calculations (depending on how you did your testing). In poll_func I think you should be updating the stats every time the function is called, not just the times when you return True.
I had tested the way you described at first, then later changed

> P.P.S. Command I used to test how slow it is to pack the data. One-time setup of creating the buffers, then multiple tests of the pack_into function:
> 
> python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, buf, *arr)"
> 20 loops, best of 5: 15.3 msec per loop
    Reply  05 Sep 2024, Stefan Ritt, Forum, Python frontend rate limitations? 
> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. 
> You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"

Just for your general understanding: The "period" i the C framework works differently. It calls the poll function with a number, 
and then that number is used in the poll function like (simplified):

poll(INT count) {
   for (i=0 ; i<count ; i++)
      if (new_event())
         return TRUE;
   return FALSE;
}

This ensures that polling is done as quickly as possible, even staying in the same function (poll) rather than called from the 
framework in a loop (which would require a function call to poll each time). The "count" is determined from the framework
during startup of the framework such that the execution time of the poll() routine equals the "period". Like if the period 
is 0.1, the count might be a few millions, so that the poll routine returns immediately when a new event occurs or when
100ms have expired. During the polling the frontend is "dead" meaning it cannot react on run transitions for example. That's
why most experiments use 0.1-0.5 seconds. But this does then NOT mean that you can only have 10-2 events per second, but that
the reaction time if the frontend is at maximum 0.1-0.5 seconds which is acceptable most of the case. 

Due to this design, the C frontend is capable of producing millions of events per second. It took me some while in the early 1990's
to work out that scheme sitting in the "R" trailer at TRIUMF (old guys will remember...).

Best,
Stefan
    Reply  06 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations? 
Thanks for the responses, they were very helpful.

>First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll 
call it as often as possible.

Thanks, this solves the event rate limitation I described. I didn't think to change this because the "period" did not affect the observed rate in C (and now 
I know why thanks to Stefan).

A couple more questions:

1. 
For me, 
python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, 
buf, *arr)"
10 loops, best of 3: 43.7 msec per loop

which suggests my maximum data rate is about 1.25 MB * 1000/43.7 Hz = 23 MB/s (?). But I see data rates up to 60 MB/s with a python frontend. Am I 
misinterpreting the meaning of this result?


2. I can effectively bypass the rate limitations in python by running two concurrent frontends. For example, with one python frontend at best I can generate 
60 MB/s of data (setting "period" to 0 now); but with two frontends I can double this to 120 MB/s. This implies one python frontend is not bottlenecked by 
hardware limitations in my case.

Am I doing something wrong to artificially bottleneck my frontends? Perhaps there's a multi-threading solution I can implement to avoid needing multiple 
frontends?


Thanks,
Jack
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations? 
> I'm trying to get a sense of the rate limitations of a python frontend.

1) python is single-threaded, for ultimate performance, a MIDAS frontend (or any DAQ 
application) has to be multithreaded:
a) thread with busy loop read the data and place it into a FIFO
b) thread to read data from FIFO and send it to SYSTEM buffer shared memory or to 
mserver
c) thread to respond to begin-run, end-run, etc RPCs
d) probably a thread to recycle memory from thread (b) back to thread (a) if per-event 
malloc()/free() adds too much overhead

2) data readout. C++ AXI bus access is compiled into 1 instruction and results in 1 AXI 
bus operation. comparable for python likely has much more overhead, slows you down.

3) event bank filling. C++ for() loop is compiled into very compact machine code, 
python loop cannot because each array element can be random data type, shows you down.

bottom line, there is a reason high speed data acquisitions are written in C/C++, not 
in shell, perl, tcl/tk, or (today's favourite) python.

> The C++ frontend is about 100 times faster in both data and event rates.

This is as expected. You can probably improve python code to get closer to 10 times 
slower than C++. But consider:

a) will it be "fast enough" for the task?
b) learning C++ and optimizing python to within "2-3-10x slower than C++" may involve a 
similar amount of time and effort.

And you have not looked at the real-time properties of your frontend. You may discover 
that it's actually faster than you think, but occasionally stops for a millisecond (or 
two or hundred). some applications a notorious for running memory garbage collection 
just at the wrong time.

I am working right now on exactly this problem, I have a 1 GHz ARM CPU (Cyclone-V FPGA) 
and I need to push data out at 100 Mbytes/sec while avoiding and bad-real-time dropouts  
that cause the FPGA data FIFO to overflow. And I only have 2 CPU cores, 1 to read the 
FPGA FIFO, 1 to run the TCP/IP stack and the ethernet driver. No this can be done with 
python.

K.O.
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations? 
> 
> poll(INT count) {
>    for (i=0 ; i<count ; i++)
>       if (new_event())
>          return TRUE;
>    return FALSE;
> }

in the c++ frontend (tmfe.h) this loop usually runs in a separate thread, and I am now working on the linux magic to assign this thread maximum 
uninterruptible priority. otherwise on my Cyclone-V FPGA SoC I see 1-10 msec dropouts, I think from taking ethernet interrupts.

K.O.
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations? 
> > I'm trying to get a sense of the rate limitations of a python frontend.

forgot one more:

c++ toolchain comes with extensive profiler tools aimed to answer the question "why is my 
program so slow, where is it spending all the time?". some of these tools go all the way to 
the hardware level and report CPU cache misses, TLB flushes, context switches and any other 
hardware events that interrupt or slow down computations. programmer than uses this 
information to restructure the code to avoid the worst slow downs (i.e. avoid branch mis-
predictions, avoid cache misses, etc).

I doubt the python toolchain will ever profiler tools as good.

K.O.
    Reply  27 Sep 2024, Ben Smith, Forum, Python frontend rate limitations? 
> in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer.
> 
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.

I've now added better support for numpy arrays in the python code that encodes a `midas.event.Event` object. If you use the "correct" numpy data type then you can get vastly improved performance as numpy already stores the data in memory in the format that we need.

In your example, if you change
        self.zero_buffer = [0] * self.total_data_size
to 
        self.zero_buffer = np.ndarray(self.total_data_size, np.int16)

then the max data rate of the frontend goes from 330MB/s to 7600MB/s on my laptop (a factor 20 improvement from one line of code!) 

To ensure you're using the optimal numpy dtype for your bank, you can reference a dict called `midas.tid_np_formats`. For example `midas.tid_np_formats[midas.TID_SHORT]` is equivalent to `np.int16`. If you use an int16 array and write it as a TID_SHORT bank, then we'll use the fast path. If there is a mismatch, we'll have to do type conversions and will end up on the slow path.
Entry  10 Mar 2022, Gennaro Tortone, Bug Report, Python ODB watch 
Hi,

I have an issue with ODB watch on MIDAS Python library;

I wrote a simple frontend that read/write FPGA registers through 
ODB keys (simplified version at link below):

https://gist.github.com/gtortone/cd035a9ac4ea7a78ea9cd931e80e2c75

Everything works fine but there is a boolean array
in Settings (Enable ADC sampling) that I need to "toggle" 
(19 bit to 0 and 19 bit to 1). This operation is handled by
detailed_settings_changed_func that write the value of
toggled bit to FPGA.

The issue is that if I quickly toggle the boolean array by
odbedit:

set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 0
set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 1

I see in the Python script the following list of callbacks:

detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1    ***
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1

It seems that the second write operation "overlaps" the first one...

The same behavior is not observed using a 'watch' in odbedit...

I can overcame this problem using the value of register as ODB key to avoid 
array of boolean... but I report this issue as "possible" bug/limitation on Python ODB watch;

Cheers,
Gennaro
    Reply  16 Mar 2022, Ben Smith, Bug Report, Python ODB watch 
> It seems that the second write operation "overlaps" the first one...

Hi Gennaro,

In principle the same issue can happen in C++ code, but is much less likely as the callbacks get executed  more quickly (partly due to C++/python in general, and partly because the python code does some extra work to make the interface more user-friendly). The C++ code at the end of this message adds a 100ms sleep to the callback and can result in output like this when you do quick edits of "Test[0-19]" in odbedit.

Element 1 is 0
Element 2 is 0
Element 3 is 0
Element 4 is 0
Element 5 is 0
Element 6 is 0
Element 7 is 1
Element 8 is 1
Element 9 is 1
etc...


I agree that this can be a really nasty source of bugs if you need to react to every change. I'll add a warning to the python docstrings, but I can't think of a way to make this more robust at the midas level - I think we'd need some sort of ODB "snapshot" system...









#include "midas.h"

void watch_fn(HNDLE hDB, HNDLE hKey, int index, void *info) {
  DWORD data = 0;
  INT buf_size = sizeof(data);
  db_get_data_index(hDB, hKey, &data, &buf_size, index, TID_DWORD);
  printf("Element %d is %u\n", index, data);
  ss_sleep(100);
}

int main() {
  HNDLE hDB, hClient, hTestKey;

  std::string host, expt;
  cm_get_environment(&host, &expt);
  cm_connect_experiment(host.c_str(), expt.c_str(), "test_odb", nullptr);
  cm_get_experiment_database(&hDB, &hClient);

  static const DWORD numValues = 20;
  DWORD data[numValues] = {};
  db_set_value(hDB, 0, "Test", data, sizeof(DWORD) * numValues, numValues, TID_DWORD);
  db_find_key(hDB, 0, "Test", &hTestKey);
  db_watch(hDB, hTestKey, watch_fn, nullptr);

  printf("Press any key to exit loop...\n");

  while (!ss_kbhit()) {
    cm_yield(1);
  }

  db_unwatch_all();
  db_delete_key(hDB, hTestKey, FALSE);
  cm_disconnect_experiment();

  return 0;
}
    Reply  21 Mar 2022, Stefan Ritt, Bug Report, Python ODB watch 
What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are
two approaches to solve it:

1) Write values one-by-one to the ODB, but do not trigger a watch update. In the sequencer, this
can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer 
and the last paragraph right of the ODBSET command). You use notify=0 for all set commands except
the last one where you use notify=1. On the C++ API, you can use db_set_data_index1() which has
this notify flag as the last parameter.

2) You add intelligence to your front-end. If you get a watchdog update, you do not apply this
directly to the hardware, but put it into a FIFO. Once you do not get any more update for a certain
period (like 1s is a good value), you empty the FIFO and apply all setting immediately.

Both methods have been used at PSI successfully, although 1) is much easier to implement, especially
if you use the midas sequencer.

Stefan
Entry  03 Sep 2009, Exaos Lee, Bug Report, Prompt problem about odbedit Screenshot-10.png
I tried to use odbedit to set the "/System/Prompt" to "%h:%e:%s %p> " and got a
problem: pressing "Return" doesn't work any more. But "[%h:%e:%s]%p> " works fine.
Please see the attachment.
    Reply  03 Sep 2009, Stefan Ritt, Bug Report, Prompt problem about odbedit 
> I tried to use odbedit to set the "/System/Prompt" to "%h:%e:%s %p> " and got a
> problem: pressing "Return" doesn't work any more. But "[%h:%e:%s]%p> " works fine.
> Please see the attachment.

I fixed that problem in SVN revision 4556. It occurred when the prompt does start with a 
'%' which nobody tried before...
Entry  28 Aug 2018, Lukas Gerritzen, Forum, Problems with virtual history events 
Hi,
I am trying to set up virtual history events following
https://midas.triumf.ca/MidasWiki/index.php/History_System#Virtual_History_Event 

Trying it the first way, using the following setup:
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
Links                           DIR
    dirlink -> External/dir     KEY     1     12    >99d 0   RWD  <subdirectory>


Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
External                        DIR
    dir                         DIR
        foo                     FLOAT   1     4     16s  0   RWD  12.5


Then I get the following error message:
==================== History link "dirlink", ID 28150  =======================
[Logger,ERROR] [mlogger.cxx:4942:open_history,ERROR] History event dirlink has
no variables in ODB


Trying the second way, I set up the following:
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
Links                           DIR
    dir                         DIR
        testlink -> External/foo
                                FLOAT   1     4     8m   0   RWD  5.2

Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
External                        DIR
    foo                         FLOAT   1     4     6m   0   RWD  5.2


Starting mlogger in verbose mode yields the following error:
==================== History link "dir", ID 28150  =======================
[Logger,ERROR] [mlogger.cxx:4935:open_history,ERROR] History link
/History/Links/dir/testlink is invalid
Error in history system, aborting startup.

I'm not sure if this is a bug or just a case of PEBCAK.

Finally, to set the update period, do I need entries in /history/links periods
with the tag name? Is there a way to only write them in the history file when
they change? I want to use the virtual history events for measurements I get
from external scripts, some periodic, some manual.

Thanks
    Reply  28 Aug 2018, Konstantin Olchanski, Forum, Problems with virtual history events 
Hi, what you try should have worked. Perhaps your symlink is wrong and should say "/External/..." (with a leading slash). The "links period" would have
worked same as equipment/common/history period - as a rate limiter.

Anyhow, I suggest another way to do the same - create a fake equipment - the logger does
not care if the equipment is real or not and if you write into /eq/fake/variables from a proper frontend
or from a script. To hide the fake equipment from the status page, set /eq/fake/common/hidden to "true".

This will work for sure.

K.O.




> Hi,
> I am trying to set up virtual history events following
> https://midas.triumf.ca/MidasWiki/index.php/History_System#Virtual_History_Event 
> 
> Trying it the first way, using the following setup:
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> Links                           DIR
>     dirlink -> External/dir     KEY     1     12    >99d 0   RWD  <subdirectory>
> 
> 
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> External                        DIR
>     dir                         DIR
>         foo                     FLOAT   1     4     16s  0   RWD  12.5
> 
> 
> Then I get the following error message:
> ==================== History link "dirlink", ID 28150  =======================
> [Logger,ERROR] [mlogger.cxx:4942:open_history,ERROR] History event dirlink has
> no variables in ODB
> 
> 
> Trying the second way, I set up the following:
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> Links                           DIR
>     dir                         DIR
>         testlink -> External/foo
>                                 FLOAT   1     4     8m   0   RWD  5.2
> 
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> External                        DIR
>     foo                         FLOAT   1     4     6m   0   RWD  5.2
> 
> 
> Starting mlogger in verbose mode yields the following error:
> ==================== History link "dir", ID 28150  =======================
> [Logger,ERROR] [mlogger.cxx:4935:open_history,ERROR] History link
> /History/Links/dir/testlink is invalid
> Error in history system, aborting startup.
> 
> I'm not sure if this is a bug or just a case of PEBCAK.
> 
> Finally, to set the update period, do I need entries in /history/links periods
> with the tag name? Is there a way to only write them in the history file when
> they change? I want to use the virtual history events for measurements I get
> from external scripts, some periodic, some manual.
> 
> Thanks
Entry  28 Feb 2018, Thomas Lindner, Bug Report, Problems with start program button with new mhttpd webpages 
Pierre Gorel identified a problem with the 'start program' button on the new version of MIDAS that uses the 
mjsonrpc functions for building the webpages.  In particular, he tracked the problem down to some 
questionable std::string / char* handling.  

Interestingly, the particular 'start program' problem was seen on Pierre's Ubuntu 16.04.3 LTS machine, but 
could not be reproduced on RHEL-7 or Macos 10.13 machines.  So the manifestation of the code error 
seemed to depend on the compiler.

The problem should now be fixed in the HEAD version of MIDAS.  If you are using the newer MIDAS (since last 
summer), particularly on Ubuntu, then you may want to update your installation. 

Details of the problem are on the bitbucket issue tracker:

https://bitbucket.org/tmidas/midas/issues/132/corruption-of-char-in-mjsonrpccxx
Entry  17 Jan 2011, Andreas Suter, Bug Report, Problems with midas history SVN 4936 
I have the following problems after updating to midas SVN 4936: the history 
system (web-page via mhttpd) seems to stop working. I checked the history files 
themself and they are indeed written, except that the events ID's are not the 
same anymore (I mean the ones defined under /Equipment/XXX/Common/Event ID), 
rather the mlogger seems to choose an ID by itself.
Currently the only way to get things working again was to recompile midas with 
adding -DOLD_HISTORY to the CFLAGS which is troublesome since it is likely to be 
forgotton with the next SVN update. When looking into the SVN I have the 
impression there is something going on concerning the history system, however
I couldn't find any documentation.
What is the best practice for the future, in order not to run into any problems 
but still being able to look at the old history (also from within the web-page 
via mhttpd)?
    Reply  13 Feb 2011, Lee Pool, Bug Report, Problems with midas history SVN 4936 
> I have the following problems after updating to midas SVN 4936: the history 
> system (web-page via mhttpd) seems to stop working. I checked the history files 
> themself and they are indeed written, except that the events ID's are not the 
> same anymore (I mean the ones defined under /Equipment/XXX/Common/Event ID), 
> rather the mlogger seems to choose an ID by itself.
> Currently the only way to get things working again was to recompile midas with 
> adding -DOLD_HISTORY to the CFLAGS which is troublesome since it is likely to be 
> forgotton with the next SVN update. When looking into the SVN I have the 
> impression there is something going on concerning the history system, however
> I couldn't find any documentation.
> What is the best practice for the future, in order not to run into any problems 
> but still being able to look at the old history (also from within the web-page 
> via mhttpd)?

Hi... 

Do you mind giving little more detail? We might have the same issue, where we got
complaints that midas history stops working after a certain time.

L
    Reply  16 Feb 2011, Konstantin Olchanski, Bug Report, Problems with midas history SVN 4936 
> I have the following problems after updating to midas SVN 4936: the history 
> system (web-page via mhttpd) seems to stop working. I checked the history files 
> themself and they are indeed written, except that the events ID's are not the 
> same anymore (I mean the ones defined under /Equipment/XXX/Common/Event ID), 
> rather the mlogger seems to choose an ID by itself.

Yes, I found the problem - it was introduced around svn rev 4827 in September 2010.

It is fixed now, please do this:
1) update history_midas.c to latest svn rev 4979
1a) do NOT update any other files - update only history_midas.c
2) rebuild mlogger (it will do no harm and no good if you rebuild everything)
3) odbedit save odb.xml
4) in odb, remove /history/events and /history/tags (you can also set "/History/DisableTags" to "y")
5) restart mlogger
6) observe that odb /history/events now has event ids same as equipment ids
7) restart your frontend, observe that history file is growing
8) use mhdump to observe that history is now written with correct event id
9) go to mhttpd history plot, you should see the new data coming in. Plot history in the "1 year" scale, you 
should see the old data and you should see a gap where data was written with wrong event id
10) I should still have  an mhrewrite program sitting somewhere that can change the event ids inside midas 
history files, if you have many data files with wrong event id, let me know, I will find this program and tell you 
how to use it to repair your data files.

> Currently the only way to get things working again was to recompile midas with 
> adding -DOLD_HISTORY to the CFLAGS which is troublesome since it is likely to be 
> forgotton with the next SVN update.

Yes, I am glad you found OLD_HISTORY, I kept it just for the case some breakage like this happens. I will still 
keep it around until the dust settles.

> When looking into the SVN I have the  impression there is something going on concerning the history 
system, however I couldn't find any documentation.

Yes, you found the right stuff, and it is partially documented. mlogger uses /History/Events to map history 
event names (equipment names in your case) to history event ids. But in your case, the wrong event id has 
been assigned by mlogger so nothing worked right. As a bonus, I now see inconsistency between event_id 
code remaining in mlogger (which is not used) and event_id code in history_midas (which *is* used). I will be 
straightening this stuff over the next few days.

I hope my correction to history_midas.cxx is good enough to get you going for now.

> What is the best practice for the future, in order not to run into any problems 
> but still being able to look at the old history (also from within the web-page 
> via mhttpd)?

Personally, I think that the midas history storage into binary files is not robust enough
when facing changes to equipment and event ids, renaming and deleting of stuff, etc. There
are other limitations, as well, i.e. the 16-bit history event id, etc.

The newly implemented SQL history storage (uses ODBC layer, MySQL supported, PgSQL partially
implemented) does not have any of these problems and seems to work well enough
for T2K/ND280. Sometimes MySQL history is even faster when making history plots in mhttpd.

I am now thinking about implementing SQL history storage in SQLite files, and it will not have
any of these problems, too. Performance and robustness for database corruption remain a question, though.

K.O.
ELOG V3.1.4-2e1708b5