05 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations?
|
I'm trying to get a sense of the rate limitations of a python frontend. I
understand this will vary from system to system.
I adapted two frontends from the example templates, one in C++ and one in python.
Both simply fill a midas bank with a fixed length array of zeros at a given polled
rate. However, the C++ frontend is about 100 times faster in both data and event
rates. This seems slow, even for an interpreted language like python. Furthermore,
I can effectively increase the maximum rate by concurrently running a second
python frontend (this is not the case for the C++ frontend). In short, there is
some limitation with using python here unrelated to hardware.
In my case, poll_func appears to be called at 100Hz at best. What limits the rate
that poll_func is called in a python frontend? Is there a more appropriate
solution for increasing the python frontend data/event rate than simply launching
more frontends?
I've attached my C++ and python frontend files for reference.
Thanks,
Jack |
05 Sep 2024, Ben Smith, Forum, Python frontend rate limitations?
|
> What limits the rate that poll_func is called in a python frontend?
First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"
If that's still not fast enough, then you can return a *list* of events from your readout_func. I've seen real-world cases of 25kHz+ of midas events generated in this fashion.
However in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer. In particular it takes 15ms on my machine to just pack the data into a memory buffer (see timeit command below). I am sure there must be a faster way to do this packing, especially in the case where the bank contains a numpy array rather than a python list.
I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.
Cheers,
Ben
P.S. You may have a bug in your calculations (depending on how you did your testing). In poll_func I think you should be updating the stats every time the function is called, not just the times when you return True.
P.P.S. Command I used to test how slow it is to pack the data. One-time setup of creating the buffers, then multiple tests of the pack_into function:
python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, buf, *arr)"
20 loops, best of 5: 15.3 msec per loop |
05 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations?
|
Thank you, this was very helpful.
> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"
Thanks, I thought that was just for periodic triggering (or at least that's how I've used it in C++ frontends). Changing this allowed me to get past the 100Hz event rate cap I described.
> If that's still not fast enough, then you can return a *list* of events from your readout_func. I've seen real-world cases of 25kHz+ of midas events generated in this fashion.
>
>
> However in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer. In particular it takes 15ms on my machine to just pack the data into a memory buffer (see timeit command below). I am sure there must be a faster way to do this packing, especially in the case where the bank contains a numpy array rather than a python list.
>
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.
>
>
> Cheers,
> Ben
> P.S. You may have a bug in your calculations (depending on how you did your testing). In poll_func I think you should be updating the stats every time the function is called, not just the times when you return True.
I had tested the way you described at first, then later changed
> P.P.S. Command I used to test how slow it is to pack the data. One-time setup of creating the buffers, then multiple tests of the pack_into function:
>
> python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, buf, *arr)"
> 20 loops, best of 5: 15.3 msec per loop |
05 Sep 2024, Stefan Ritt, Forum, Python frontend rate limitations?
|
> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently.
> You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"
Just for your general understanding: The "period" i the C framework works differently. It calls the poll function with a number,
and then that number is used in the poll function like (simplified):
poll(INT count) {
for (i=0 ; i<count ; i++)
if (new_event())
return TRUE;
return FALSE;
}
This ensures that polling is done as quickly as possible, even staying in the same function (poll) rather than called from the
framework in a loop (which would require a function call to poll each time). The "count" is determined from the framework
during startup of the framework such that the execution time of the poll() routine equals the "period". Like if the period
is 0.1, the count might be a few millions, so that the poll routine returns immediately when a new event occurs or when
100ms have expired. During the polling the frontend is "dead" meaning it cannot react on run transitions for example. That's
why most experiments use 0.1-0.5 seconds. But this does then NOT mean that you can only have 10-2 events per second, but that
the reaction time if the frontend is at maximum 0.1-0.5 seconds which is acceptable most of the case.
Due to this design, the C frontend is capable of producing millions of events per second. It took me some while in the early 1990's
to work out that scheme sitting in the "R" trailer at TRIUMF (old guys will remember...).
Best,
Stefan |
06 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations?
|
Thanks for the responses, they were very helpful.
>First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll
call it as often as possible.
Thanks, this solves the event rate limitation I described. I didn't think to change this because the "period" did not affect the observed rate in C (and now
I know why thanks to Stefan).
A couple more questions:
1.
For me,
python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt,
buf, *arr)"
10 loops, best of 3: 43.7 msec per loop
which suggests my maximum data rate is about 1.25 MB * 1000/43.7 Hz = 23 MB/s (?). But I see data rates up to 60 MB/s with a python frontend. Am I
misinterpreting the meaning of this result?
2. I can effectively bypass the rate limitations in python by running two concurrent frontends. For example, with one python frontend at best I can generate
60 MB/s of data (setting "period" to 0 now); but with two frontends I can double this to 120 MB/s. This implies one python frontend is not bottlenecked by
hardware limitations in my case.
Am I doing something wrong to artificially bottleneck my frontends? Perhaps there's a multi-threading solution I can implement to avoid needing multiple
frontends?
Thanks,
Jack |
11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations?
|
> I'm trying to get a sense of the rate limitations of a python frontend.
1) python is single-threaded, for ultimate performance, a MIDAS frontend (or any DAQ
application) has to be multithreaded:
a) thread with busy loop read the data and place it into a FIFO
b) thread to read data from FIFO and send it to SYSTEM buffer shared memory or to
mserver
c) thread to respond to begin-run, end-run, etc RPCs
d) probably a thread to recycle memory from thread (b) back to thread (a) if per-event
malloc()/free() adds too much overhead
2) data readout. C++ AXI bus access is compiled into 1 instruction and results in 1 AXI
bus operation. comparable for python likely has much more overhead, slows you down.
3) event bank filling. C++ for() loop is compiled into very compact machine code,
python loop cannot because each array element can be random data type, shows you down.
bottom line, there is a reason high speed data acquisitions are written in C/C++, not
in shell, perl, tcl/tk, or (today's favourite) python.
> The C++ frontend is about 100 times faster in both data and event rates.
This is as expected. You can probably improve python code to get closer to 10 times
slower than C++. But consider:
a) will it be "fast enough" for the task?
b) learning C++ and optimizing python to within "2-3-10x slower than C++" may involve a
similar amount of time and effort.
And you have not looked at the real-time properties of your frontend. You may discover
that it's actually faster than you think, but occasionally stops for a millisecond (or
two or hundred). some applications a notorious for running memory garbage collection
just at the wrong time.
I am working right now on exactly this problem, I have a 1 GHz ARM CPU (Cyclone-V FPGA)
and I need to push data out at 100 Mbytes/sec while avoiding and bad-real-time dropouts
that cause the FPGA data FIFO to overflow. And I only have 2 CPU cores, 1 to read the
FPGA FIFO, 1 to run the TCP/IP stack and the ethernet driver. No this can be done with
python.
K.O. |
11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations?
|
>
> poll(INT count) {
> for (i=0 ; i<count ; i++)
> if (new_event())
> return TRUE;
> return FALSE;
> }
in the c++ frontend (tmfe.h) this loop usually runs in a separate thread, and I am now working on the linux magic to assign this thread maximum
uninterruptible priority. otherwise on my Cyclone-V FPGA SoC I see 1-10 msec dropouts, I think from taking ethernet interrupts.
K.O. |
11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations?
|
> > I'm trying to get a sense of the rate limitations of a python frontend.
forgot one more:
c++ toolchain comes with extensive profiler tools aimed to answer the question "why is my
program so slow, where is it spending all the time?". some of these tools go all the way to
the hardware level and report CPU cache misses, TLB flushes, context switches and any other
hardware events that interrupt or slow down computations. programmer than uses this
information to restructure the code to avoid the worst slow downs (i.e. avoid branch mis-
predictions, avoid cache misses, etc).
I doubt the python toolchain will ever profiler tools as good.
K.O. |
27 Sep 2024, Ben Smith, Forum, Python frontend rate limitations?
|
> in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer.
>
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.
I've now added better support for numpy arrays in the python code that encodes a `midas.event.Event` object. If you use the "correct" numpy data type then you can get vastly improved performance as numpy already stores the data in memory in the format that we need.
In your example, if you change
self.zero_buffer = [0] * self.total_data_size
to
self.zero_buffer = np.ndarray(self.total_data_size, np.int16)
then the max data rate of the frontend goes from 330MB/s to 7600MB/s on my laptop (a factor 20 improvement from one line of code!)
To ensure you're using the optimal numpy dtype for your bank, you can reference a dict called `midas.tid_np_formats`. For example `midas.tid_np_formats[midas.TID_SHORT]` is equivalent to `np.int16`. If you use an int16 array and write it as a TID_SHORT bank, then we'll use the fast path. If there is a mismatch, we'll have to do type conversions and will end up on the slow path. |
10 Mar 2022, Gennaro Tortone, Bug Report, Python ODB watch
|
Hi,
I have an issue with ODB watch on MIDAS Python library;
I wrote a simple frontend that read/write FPGA registers through
ODB keys (simplified version at link below):
https://gist.github.com/gtortone/cd035a9ac4ea7a78ea9cd931e80e2c75
Everything works fine but there is a boolean array
in Settings (Enable ADC sampling) that I need to "toggle"
(19 bit to 0 and 19 bit to 1). This operation is handled by
detailed_settings_changed_func that write the value of
toggled bit to FPGA.
The issue is that if I quickly toggle the boolean array by
odbedit:
set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 0
set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 1
I see in the Python script the following list of callbacks:
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1 ***
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1
It seems that the second write operation "overlaps" the first one...
The same behavior is not observed using a 'watch' in odbedit...
I can overcame this problem using the value of register as ODB key to avoid
array of boolean... but I report this issue as "possible" bug/limitation on Python ODB watch;
Cheers,
Gennaro |
16 Mar 2022, Ben Smith, Bug Report, Python ODB watch
|
> It seems that the second write operation "overlaps" the first one...
Hi Gennaro,
In principle the same issue can happen in C++ code, but is much less likely as the callbacks get executed more quickly (partly due to C++/python in general, and partly because the python code does some extra work to make the interface more user-friendly). The C++ code at the end of this message adds a 100ms sleep to the callback and can result in output like this when you do quick edits of "Test[0-19]" in odbedit.
Element 1 is 0
Element 2 is 0
Element 3 is 0
Element 4 is 0
Element 5 is 0
Element 6 is 0
Element 7 is 1
Element 8 is 1
Element 9 is 1
etc...
I agree that this can be a really nasty source of bugs if you need to react to every change. I'll add a warning to the python docstrings, but I can't think of a way to make this more robust at the midas level - I think we'd need some sort of ODB "snapshot" system...
#include "midas.h"
void watch_fn(HNDLE hDB, HNDLE hKey, int index, void *info) {
DWORD data = 0;
INT buf_size = sizeof(data);
db_get_data_index(hDB, hKey, &data, &buf_size, index, TID_DWORD);
printf("Element %d is %u\n", index, data);
ss_sleep(100);
}
int main() {
HNDLE hDB, hClient, hTestKey;
std::string host, expt;
cm_get_environment(&host, &expt);
cm_connect_experiment(host.c_str(), expt.c_str(), "test_odb", nullptr);
cm_get_experiment_database(&hDB, &hClient);
static const DWORD numValues = 20;
DWORD data[numValues] = {};
db_set_value(hDB, 0, "Test", data, sizeof(DWORD) * numValues, numValues, TID_DWORD);
db_find_key(hDB, 0, "Test", &hTestKey);
db_watch(hDB, hTestKey, watch_fn, nullptr);
printf("Press any key to exit loop...\n");
while (!ss_kbhit()) {
cm_yield(1);
}
db_unwatch_all();
db_delete_key(hDB, hTestKey, FALSE);
cm_disconnect_experiment();
return 0;
} |
21 Mar 2022, Stefan Ritt, Bug Report, Python ODB watch
|
What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are
two approaches to solve it:
1) Write values one-by-one to the ODB, but do not trigger a watch update. In the sequencer, this
can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer
and the last paragraph right of the ODBSET command). You use notify=0 for all set commands except
the last one where you use notify=1. On the C++ API, you can use db_set_data_index1() which has
this notify flag as the last parameter.
2) You add intelligence to your front-end. If you get a watchdog update, you do not apply this
directly to the hardware, but put it into a FIFO. Once you do not get any more update for a certain
period (like 1s is a good value), you empty the FIFO and apply all setting immediately.
Both methods have been used at PSI successfully, although 1) is much easier to implement, especially
if you use the midas sequencer.
Stefan |
03 Sep 2009, Exaos Lee, Bug Report, Prompt problem about odbedit
|
I tried to use odbedit to set the "/System/Prompt" to "%h:%e:%s %p> " and got a
problem: pressing "Return" doesn't work any more. But "[%h:%e:%s]%p> " works fine.
Please see the attachment. |
03 Sep 2009, Stefan Ritt, Bug Report, Prompt problem about odbedit
|
> I tried to use odbedit to set the "/System/Prompt" to "%h:%e:%s %p> " and got a
> problem: pressing "Return" doesn't work any more. But "[%h:%e:%s]%p> " works fine.
> Please see the attachment.
I fixed that problem in SVN revision 4556. It occurred when the prompt does start with a
'%' which nobody tried before... |
28 Aug 2018, Lukas Gerritzen, Forum, Problems with virtual history events
|
Hi,
I am trying to set up virtual history events following
https://midas.triumf.ca/MidasWiki/index.php/History_System#Virtual_History_Event
Trying it the first way, using the following setup:
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
Links DIR
dirlink -> External/dir KEY 1 12 >99d 0 RWD <subdirectory>
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
External DIR
dir DIR
foo FLOAT 1 4 16s 0 RWD 12.5
Then I get the following error message:
==================== History link "dirlink", ID 28150 =======================
[Logger,ERROR] [mlogger.cxx:4942:open_history,ERROR] History event dirlink has
no variables in ODB
Trying the second way, I set up the following:
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
Links DIR
dir DIR
testlink -> External/foo
FLOAT 1 4 8m 0 RWD 5.2
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
External DIR
foo FLOAT 1 4 6m 0 RWD 5.2
Starting mlogger in verbose mode yields the following error:
==================== History link "dir", ID 28150 =======================
[Logger,ERROR] [mlogger.cxx:4935:open_history,ERROR] History link
/History/Links/dir/testlink is invalid
Error in history system, aborting startup.
I'm not sure if this is a bug or just a case of PEBCAK.
Finally, to set the update period, do I need entries in /history/links periods
with the tag name? Is there a way to only write them in the history file when
they change? I want to use the virtual history events for measurements I get
from external scripts, some periodic, some manual.
Thanks |
28 Aug 2018, Konstantin Olchanski, Forum, Problems with virtual history events
|
Hi, what you try should have worked. Perhaps your symlink is wrong and should say "/External/..." (with a leading slash). The "links period" would have
worked same as equipment/common/history period - as a rate limiter.
Anyhow, I suggest another way to do the same - create a fake equipment - the logger does
not care if the equipment is real or not and if you write into /eq/fake/variables from a proper frontend
or from a script. To hide the fake equipment from the status page, set /eq/fake/common/hidden to "true".
This will work for sure.
K.O.
> Hi,
> I am trying to set up virtual history events following
> https://midas.triumf.ca/MidasWiki/index.php/History_System#Virtual_History_Event
>
> Trying it the first way, using the following setup:
> Key name Type #Val Size Last Opn Mode Value
> ---------------------------------------------------------------------------
> Links DIR
> dirlink -> External/dir KEY 1 12 >99d 0 RWD <subdirectory>
>
>
> Key name Type #Val Size Last Opn Mode Value
> ---------------------------------------------------------------------------
> External DIR
> dir DIR
> foo FLOAT 1 4 16s 0 RWD 12.5
>
>
> Then I get the following error message:
> ==================== History link "dirlink", ID 28150 =======================
> [Logger,ERROR] [mlogger.cxx:4942:open_history,ERROR] History event dirlink has
> no variables in ODB
>
>
> Trying the second way, I set up the following:
> Key name Type #Val Size Last Opn Mode Value
> ---------------------------------------------------------------------------
> Links DIR
> dir DIR
> testlink -> External/foo
> FLOAT 1 4 8m 0 RWD 5.2
>
> Key name Type #Val Size Last Opn Mode Value
> ---------------------------------------------------------------------------
> External DIR
> foo FLOAT 1 4 6m 0 RWD 5.2
>
>
> Starting mlogger in verbose mode yields the following error:
> ==================== History link "dir", ID 28150 =======================
> [Logger,ERROR] [mlogger.cxx:4935:open_history,ERROR] History link
> /History/Links/dir/testlink is invalid
> Error in history system, aborting startup.
>
> I'm not sure if this is a bug or just a case of PEBCAK.
>
> Finally, to set the update period, do I need entries in /history/links periods
> with the tag name? Is there a way to only write them in the history file when
> they change? I want to use the virtual history events for measurements I get
> from external scripts, some periodic, some manual.
>
> Thanks |
28 Feb 2018, Thomas Lindner, Bug Report, Problems with start program button with new mhttpd webpages
|
Pierre Gorel identified a problem with the 'start program' button on the new version of MIDAS that uses the
mjsonrpc functions for building the webpages. In particular, he tracked the problem down to some
questionable std::string / char* handling.
Interestingly, the particular 'start program' problem was seen on Pierre's Ubuntu 16.04.3 LTS machine, but
could not be reproduced on RHEL-7 or Macos 10.13 machines. So the manifestation of the code error
seemed to depend on the compiler.
The problem should now be fixed in the HEAD version of MIDAS. If you are using the newer MIDAS (since last
summer), particularly on Ubuntu, then you may want to update your installation.
Details of the problem are on the bitbucket issue tracker:
https://bitbucket.org/tmidas/midas/issues/132/corruption-of-char-in-mjsonrpccxx |
17 Jan 2011, Andreas Suter, Bug Report, Problems with midas history SVN 4936
|
I have the following problems after updating to midas SVN 4936: the history
system (web-page via mhttpd) seems to stop working. I checked the history files
themself and they are indeed written, except that the events ID's are not the
same anymore (I mean the ones defined under /Equipment/XXX/Common/Event ID),
rather the mlogger seems to choose an ID by itself.
Currently the only way to get things working again was to recompile midas with
adding -DOLD_HISTORY to the CFLAGS which is troublesome since it is likely to be
forgotton with the next SVN update. When looking into the SVN I have the
impression there is something going on concerning the history system, however
I couldn't find any documentation.
What is the best practice for the future, in order not to run into any problems
but still being able to look at the old history (also from within the web-page
via mhttpd)? |
13 Feb 2011, Lee Pool, Bug Report, Problems with midas history SVN 4936
|
> I have the following problems after updating to midas SVN 4936: the history
> system (web-page via mhttpd) seems to stop working. I checked the history files
> themself and they are indeed written, except that the events ID's are not the
> same anymore (I mean the ones defined under /Equipment/XXX/Common/Event ID),
> rather the mlogger seems to choose an ID by itself.
> Currently the only way to get things working again was to recompile midas with
> adding -DOLD_HISTORY to the CFLAGS which is troublesome since it is likely to be
> forgotton with the next SVN update. When looking into the SVN I have the
> impression there is something going on concerning the history system, however
> I couldn't find any documentation.
> What is the best practice for the future, in order not to run into any problems
> but still being able to look at the old history (also from within the web-page
> via mhttpd)?
Hi...
Do you mind giving little more detail? We might have the same issue, where we got
complaints that midas history stops working after a certain time.
L |
16 Feb 2011, Konstantin Olchanski, Bug Report, Problems with midas history SVN 4936
|
> I have the following problems after updating to midas SVN 4936: the history
> system (web-page via mhttpd) seems to stop working. I checked the history files
> themself and they are indeed written, except that the events ID's are not the
> same anymore (I mean the ones defined under /Equipment/XXX/Common/Event ID),
> rather the mlogger seems to choose an ID by itself.
Yes, I found the problem - it was introduced around svn rev 4827 in September 2010.
It is fixed now, please do this:
1) update history_midas.c to latest svn rev 4979
1a) do NOT update any other files - update only history_midas.c
2) rebuild mlogger (it will do no harm and no good if you rebuild everything)
3) odbedit save odb.xml
4) in odb, remove /history/events and /history/tags (you can also set "/History/DisableTags" to "y")
5) restart mlogger
6) observe that odb /history/events now has event ids same as equipment ids
7) restart your frontend, observe that history file is growing
8) use mhdump to observe that history is now written with correct event id
9) go to mhttpd history plot, you should see the new data coming in. Plot history in the "1 year" scale, you
should see the old data and you should see a gap where data was written with wrong event id
10) I should still have an mhrewrite program sitting somewhere that can change the event ids inside midas
history files, if you have many data files with wrong event id, let me know, I will find this program and tell you
how to use it to repair your data files.
> Currently the only way to get things working again was to recompile midas with
> adding -DOLD_HISTORY to the CFLAGS which is troublesome since it is likely to be
> forgotton with the next SVN update.
Yes, I am glad you found OLD_HISTORY, I kept it just for the case some breakage like this happens. I will still
keep it around until the dust settles.
> When looking into the SVN I have the impression there is something going on concerning the history
system, however I couldn't find any documentation.
Yes, you found the right stuff, and it is partially documented. mlogger uses /History/Events to map history
event names (equipment names in your case) to history event ids. But in your case, the wrong event id has
been assigned by mlogger so nothing worked right. As a bonus, I now see inconsistency between event_id
code remaining in mlogger (which is not used) and event_id code in history_midas (which *is* used). I will be
straightening this stuff over the next few days.
I hope my correction to history_midas.cxx is good enough to get you going for now.
> What is the best practice for the future, in order not to run into any problems
> but still being able to look at the old history (also from within the web-page
> via mhttpd)?
Personally, I think that the midas history storage into binary files is not robust enough
when facing changes to equipment and event ids, renaming and deleting of stuff, etc. There
are other limitations, as well, i.e. the 16-bit history event id, etc.
The newly implemented SQL history storage (uses ODBC layer, MySQL supported, PgSQL partially
implemented) does not have any of these problems and seems to work well enough
for T2K/ND280. Sometimes MySQL history is even faster when making history plots in mhttpd.
I am now thinking about implementing SQL history storage in SQLite files, and it will not have
any of these problems, too. Performance and robustness for database corruption remain a question, though.
K.O. |
|