16 Mar 2022, Ben Smith, Bug Report, Python ODB watch
|
> It seems that the second write operation "overlaps" the first one...
Hi Gennaro,
In principle the same issue can happen in C++ code, but is much less likely as the callbacks get executed more quickly (partly due to C++/python in general, and partly because the python code does some extra work to make the interface more user-friendly). The C++ code at the end of this message adds a 100ms sleep to the callback and can result in output like this when you do quick edits of "Test[0-19]" in odbedit.
Element 1 is 0
Element 2 is 0
Element 3 is 0
Element 4 is 0
Element 5 is 0
Element 6 is 0
Element 7 is 1
Element 8 is 1
Element 9 is 1
etc...
I agree that this can be a really nasty source of bugs if you need to react to every change. I'll add a warning to the python docstrings, but I can't think of a way to make this more robust at the midas level - I think we'd need some sort of ODB "snapshot" system...
#include "midas.h"
void watch_fn(HNDLE hDB, HNDLE hKey, int index, void *info) {
DWORD data = 0;
INT buf_size = sizeof(data);
db_get_data_index(hDB, hKey, &data, &buf_size, index, TID_DWORD);
printf("Element %d is %u\n", index, data);
ss_sleep(100);
}
int main() {
HNDLE hDB, hClient, hTestKey;
std::string host, expt;
cm_get_environment(&host, &expt);
cm_connect_experiment(host.c_str(), expt.c_str(), "test_odb", nullptr);
cm_get_experiment_database(&hDB, &hClient);
static const DWORD numValues = 20;
DWORD data[numValues] = {};
db_set_value(hDB, 0, "Test", data, sizeof(DWORD) * numValues, numValues, TID_DWORD);
db_find_key(hDB, 0, "Test", &hTestKey);
db_watch(hDB, hTestKey, watch_fn, nullptr);
printf("Press any key to exit loop...\n");
while (!ss_kbhit()) {
cm_yield(1);
}
db_unwatch_all();
db_delete_key(hDB, hTestKey, FALSE);
cm_disconnect_experiment();
return 0;
} |
21 Mar 2022, Stefan Ritt, Bug Report, Python ODB watch
|
What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are
two approaches to solve it:
1) Write values one-by-one to the ODB, but do not trigger a watch update. In the sequencer, this
can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer
and the last paragraph right of the ODBSET command). You use notify=0 for all set commands except
the last one where you use notify=1. On the C++ API, you can use db_set_data_index1() which has
this notify flag as the last parameter.
2) You add intelligence to your front-end. If you get a watchdog update, you do not apply this
directly to the hardware, but put it into a FIFO. Once you do not get any more update for a certain
period (like 1s is a good value), you empty the FIFO and apply all setting immediately.
Both methods have been used at PSI successfully, although 1) is much easier to implement, especially
if you use the midas sequencer.
Stefan |
24 Mar 2022, Konstantin Olchanski, Bug Report, data missing in runXXXXXX.mid
|
> > It would be good to pin point there the data is lost. This is the sequence:
> >
> > frontend user code -> mfe.c code -> SYSTEM buffer -> mlogger -> disk
> >
> > To see if correct data arrives to the SYSTEM buffer, run:
> > mdump -z SYSTEM
> >
> > To see if mlogger is receiving events from the SYSTEM buffer, run:
> > mlogger -v ### mlogger should report all events, history and data
> >
> > To see if mlogger writes events to disk, examine the disk file (in this case, you already did, data is not there).
> >
> > I would guess that your data does not make it out from the frontend (mdump shows "nothing"),
> > if data were to arrive into the SYSTEM buffer, it would make it to disk, unless
> > mlogger is misconfigured (but you already checked that).
> >
> > If you have trouble with the frontend framework code, you can try to switch from the mfe.c frontend
> > to the newer c++ tmfe frontend (see progs/fetest_tmfe.cxx and progs/fetest_tmfe_thread.cxx).
> >
> > K.O.
>
> Good evening
>
> I tried to reproduce the behavior in a very simple FE but it did not work out.
> The next thing for me would be to take the FE that is producing this behavior,
> replace all the device communication and data with dummies. If the problem is still
> there I would start to simplify as much as possible.
>
> Following the inputs of KO, I pin-pointed the data loss. The system buffer still
> gets the data but the mlogger does not write the data event. Then of course the data
> is also not anymore present in the data file. Therefore, I checked the logger
> settings again, Event ID and Trigger Mask still -1. Nothing else, at least from my point of view,
> that is misconfigured. Nevertheless, if it helps I can send my ODB settings.
>
> When doing the tests just before I found something else that probably
> can give a hint to the problem. The data is only lost if the time between
> two runs is long (a few seconds). As an example: If I run a sequence with a loop
> and after the FE stops the run the loop ends and the next run is started automatically,
> then only the first run has no data, which is the one after a longer time of
> no data taking. When I add a "WAIT Seconds 5" after the run before starting
> the next, not data is written to the disk for any run. I also found this
> once when adding a sleep(1) at the end of the FE readout function
> but back then did not think about it any further.
>
Looks like this problem fell into the covid crack.
As far as I know, MIDAS does not lose any events between bm_send_event() and the shared memory
buffer. It does not lose any events in the mlogger (unless the "event request" is misconfigured).
(there is lots of opportunity to lose events in complicated frontends).
If you have some evidence otherwise, I would very much like to hear about
it and I want to fix all problems that cause it.
In your previous report I was under the impression that you lose random events here and there,
but your latest report is about mlogger not writing anything at all.
Which case is it?
If you can definitely say that all your events make it to the SYSTEM buffer
but mlogger sometimes does not see some of them and sometimes does not see all of them,
we should look very closely at bm_receive_event() and mlogger itself.
In the case where mlogger is not seeing any events at all (output file is empty), as this is
happening, I would like to see the output of mdump (to confirm events are written to SYSTEM
buffer with correct event_id and trigger_mask) and the output of (say)
"manalyzer_test.exe --dump run01161.mid.lz4" on your output file.
If the output is very long, you can email it to me directly instead of posting it here.
K.O. |
24 Mar 2022, Stefan Ritt, Bug Report, data missing in runXXXXXX.mid
|
One idea: we should have a look at mlogger::close_channels(). There the SYSTEM buffer is emptied through the cm_yield() call. Instrumenting this with some debugging code will enlighten us.
Another possible problem: If the frontend requested to be notified for a run stop AFTER the logger, then the problem might happen: Logger closes file, and THEN the frontend flushes events ending up in the SYSTEM buffer and being logged at the beginning of the next run. The mfe.cxx framework takes care of this by calling
cm_register_transition(TR_SOP, 500);
while the mlogger does
cm_register_transition(TR_STOP, tr_stop, 800);
and since 800 > 500 the logger will be called AFTER the frontend. If one use a framework different from mfe.cxx, this could however be different.
Stefan |
24 Mar 2022, Konstantin Olchanski, Bug Report, data missing in runXXXXXX.mid
|
> One idea: we should have a look at mlogger::close_channels().
> There the SYSTEM buffer is emptied through the cm_yield() call.
> Instrumenting this with some debugging code will enlighten us.
right. this will "last few events are lost at the end of run".
but that code in the mlogger was not touched in years, if there is a problem there,
we would have seen it by now, most experiments check that the number
of events in the data file is same as number of triggers generated, both
numbers are shown on the midas status page.
> Another possible problem: If the frontend requested to be notified for a run stop AFTER the logger, then the problem might happen: Logger closes file, and THEN the frontend flushes events ending up in the SYSTEM buffer and being logged at the beginning of the next run. The mfe.cxx framework takes care of this by calling
> cm_register_transition(TR_STOP, 500);
default sequence, both mfe.c frontend and c++ tmfe frontend:
start of run:
- mlogger first (configure history, open data file)
- frontends last
- (if any frontend fails, TR_STARTABORT is sent to mlogger to close the output file and "undo" the run start)
end of run:
- frontends first (must not send any events after after processing the TR_STOP RPC call, inside the TR_STOP handler, bm_flush_cache() takes care of the write cache)
- mlogger last
- (if any frontend fails, failure is ignored, run stops regardless)
wrong order will be only if they manually change it, and whatever order
they set, you see it on the midas transition page (and mtransition -v and odbedit stop now -v, etc).
K.O. |
25 Mar 2022, Marius Koeppel, Bug Report, Writting MIDAS Events via FPGAs
|
I finally found the problem why the readout stops after a run transition.
In my dummy frontend the serial number was not reset to zero at run start.
This leads to a mismatch of the serial number in the function receive_trigger_event of mfe.cxx:1247.
Which is than resulting in the problem that the function founds never a new event in all ring buffers and nothing get read out of the buffer.
Nevertheless, it would be nice that the system would tell the user that there is a mismatch in the serial number (printing a warning / error etc.).
Cheers,
Marius |
20 Jun 2022, jianrun, Bug Report, Error in "midas/src/mana.cxx"
|
Dear Midas developers,
When we are running the examples in $MIDASSYS/examples/experiment/, we meet some
problems when analyzing the results:
1. When we analyze the data using the analyzer: ./analyzer -i run00001.mid -o
run00001.rz , we find some bugs:
"
Root server listening on port 9090...
Running analyzer offline. Stop with "!"
[Analyzer,ERROR] [mana.cxx:1832:bor,ERROR] HBOOK support is not compiled in
[Analyzer,INFO] Set run number 6 in ODB
Load ODB from run 6...OK
run00006.mid:2680 events, 0.00s
"
We think this occurs in the "midas/src/mana.cxx ". How can we solve this?
2. When we analyze the above data, an error also occurs:
[Analyzer,ERROR] [odb.cxx:847:db_validate_name,ERROR] Invalid name
"/Analyzer/Tests/Always true/Rate [Hz]" passed to db_create_key_wlocked: should
not contain "["
We simply fixed that just by replacing the "Rate [Hz]" with "Rate" in the
test_write in midas/src/mana.cxx
We are curious whether you can fix the problem permanently in the next version,
or we are not running the code properly. Thanks! |
25 Jun 2022, Joseph McKenna, Bug Report, RPC timeout for manalyzer over network
|
In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10Gbe Ethernet cable) after ~5 minutes of running. If I run the analyser locally, I dont not see a timeout...
gdb trace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff5d35859 in __GI_abort () at abort.c:79
#2 0x00005555555a2a22 in rpc_call (routine_id=11111) at /home/alpha/packages/midas/src/midas.cxx:13866
#3 0x000055555562699d in bm_receive_event_rpc (buffer_handle=buffer_handle@entry=2, buf=buf@entry=0x0, buf_size=buf_size@entry=0x0, ppevent=ppevent@entry=0x0, pvec=pvec@entry=0x7fffffffd700,
timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10510
#4 0x0000555555631082 in bm_receive_event_vec (buffer_handle=2, pvec=pvec@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10794
#5 0x0000555555673dbb in TMEventBuffer::ReceiveEvent (this=this@entry=0x555557388b30, e=e@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/tmfe.cxx:312
#6 0x0000555555607b56 in ReceiveEvent (b=0x555557388b30, e=0x7fffffffd6c0, timeout_msec=100) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1411
#7 0x000055555560d8dc in ProcessMidasOnlineTmfe (args=..., progname=<optimized out>, hostname=<optimized out>, exptname=<optimized out>, bufname=<optimized out>, event_id=<optimized out>,
trigger_mask=<optimized out>, sampling_type_string=<optimized out>, num_analyze=0, writer=<optimized out>, multithread=<optimized out>, profiler=<optimized out>,
queue_interval_check=<optimized out>) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1534
#8 0x000055555560f93b in manalyzer_main (argc=<optimized out>, argv=<optimized out>) at /usr/include/c++/9/bits/basic_string.h:2304
#9 0x00007ffff5d37083 in __libc_start_main (main=0x5555555b1130 <main(int, char**)>, argc=8, argv=0x7fffffffdda8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffdd98) at ../csu/libc-start.c:308
#10 0x00005555555b184e in _start () at /usr/include/c++/9/bits/stl_vector.h:94
Any suggestions? Many thanks |
18 Jul 2022, Konstantin Olchanski, Bug Report, RPC timeout for manalyzer over network
|
> In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10Gbe Ethernet cable) after ~5 minutes of running. If I run the analyser locally, I dont not see a timeout...
there is a subtle bug in the mserver. under rare conditions, ss_suspend() will recurse in an unexpected way
and mserver will go to sleep waiting for data from a udp socket (that will never arrive, so sleep forever).
remote client will see it as an rpc timeout. in my tests (and in ALPHA-g at CERN, as reported by Joseph),
I see this rare condition to happen about every 5 minutes. in normal use, this is the first time we become
aware of this problem, the best I can tell this bug was in the mserver since day one.
commit https://bitbucket.org/tmidas/midas/commits/fbd06ad9d665b1341bd58b0e28d6625877f3cbd0
to develop and
to release/midas-2022-05
The stack trace that shows the mserver hang/crash (sleep() is the stand-in for the sleep-forever socket read).
(gdb) bt
#0 0x00007f922c53f9e0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x00007f922c53f894 in sleep () from /lib64/libc.so.6
#2 0x0000000000451922 in ss_suspend (millisec=millisec@entry=100, msg=msg@entry=1) at /home/agmini/packages/midas/src/system.cxx:4433
#3 0x0000000000411d53 in bm_wait_for_more_events_locked (pbuf_guard=..., pc=pc@entry=0x7f920639b93c, timeout_msec=timeout_msec@entry=100,
unlock_read_cache=unlock_read_cache@entry=1) at /home/agmini/packages/midas/src/midas.cxx:9429
#4 0x00000000004238c3 in bm_fill_read_cache_locked (timeout_msec=100, pbuf_guard=...) at /home/agmini/packages/midas/src/midas.cxx:9003
#5 bm_read_buffer (pbuf=pbuf@entry=0xdf8b50, buffer_handle=buffer_handle@entry=2, bufptr=bufptr@entry=0x0, buf=buf@entry=0x7f9203d75020,
buf_size=buf_size@entry=0x7f920639aa20, vecptr=vecptr@entry=0x0, timeout_msec=timeout_msec@entry=100, convert_flags=0,
dispatch=dispatch@entry=0) at /home/agmini/packages/midas/src/midas.cxx:10279
#6 0x0000000000424161 in bm_receive_event (buffer_handle=2, destination=0x7f9203d75020, buf_size=0x7f920639aa20, timeout_msec=100)
at /home/agmini/packages/midas/src/midas.cxx:10649
#7 0x0000000000406ae4 in rpc_server_dispatch (index=11111, prpc_param=0x7ffcad70b7a0) at /home/agmini/packages/midas/progs/mserver.cxx:575
#8 0x000000000041ce9c in rpc_execute (sock=10, buffer=buffer@entry=0xe11570 "g+", convert_flags=0)
at /home/agmini/packages/midas/src/midas.cxx:15003
#9 0x000000000041d7a5 in rpc_server_receive_rpc (idx=idx@entry=0, sa=0xde6ba0) at /home/agmini/packages/midas/src/midas.cxx:15958
#10 0x0000000000451455 in ss_suspend (millisec=millisec@entry=1000, msg=msg@entry=0) at /home/agmini/packages/midas/src/system.cxx:4575
#11 0x000000000041deb2 in rpc_server_loop () at /home/agmini/packages/midas/src/midas.cxx:15907
#12 0x0000000000405266 in main (argc=9, argv=<optimized out>) at /home/agmini/packages/midas/progs/mserver.cxx:390
(gdb)
K.O. |
15 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
17:21:28.821 Script terminated by timeout at:
MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
mhistory.js:2828:7
Any ideas how to resolve this?? |
15 Aug 2022, Stefan Ritt, Bug Report, firefox hangs due to mhistory
|
> Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
>
> 17:21:28.821 Script terminated by timeout at:
> MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
> MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
> mhistory.js:2828:7
>
> Any ideas how to resolve this??
I have to reproduce the problem. Can you send me the full URL from your browser when you see that problem? Probably you have some "special" axis limits, so we don't see that
problem anywhere else.
Stefan |
16 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
> > Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
> >
> > 17:21:28.821 Script terminated by timeout at:
> > MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
> > MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
> > mhistory.js:2828:7
> >
> > Any ideas how to resolve this??
>
> I have to reproduce the problem. Can you send me the full URL from your browser when you see that problem? Probably you have some "special" axis limits, so we don't see that
> problem anywhere else.
>
> Stefan
Hi Stefan and Konstantin,
The URL (reachable only within PSI) is http://lem03.psi.ch:8081/?cmd=custom&page=Mudas
Firefox is version 91.12.0esr (64-bit), but I had similar issues with chrome/chromium too.
The hangs seem to happen randomly so I have not been able to reproduce it yet.
I have histories here http://lem03.psi.ch:8081/?cmd=custom&page=Mudas&tab=3 (30 minutes each), but I have also histories popping up in modals though they do not cause any issues.
I'll try to reproduce it in the coming few days and report again.
thanks,
Zaher |
16 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
I found the bug. The problem is triggered by changing the firefox window. This calls a function that is supposed to change the size of the history plot and it works well when the history plots are visible but not if the history plots are hidden in a javascript tab (not another firefox tab).
Is there a clean way to resize the history plot if the parent div changes size?? The offending code is
mhist[i].mhg = new MhistoryGraph(mhist[i]);
mhist[i].mhg.initializePanel(i);
mhist[i].mhg.resize();
mhist[i].resize = function () {
mhis.mhg.resize();
}; |
16 Aug 2022, Konstantin Olchanski, Bug Report, firefox hangs due to mhistory
|
> > > Firefox is hanging/becoming unresponsive due to javascript code.
>
> The URL (reachable only within PSI) is http://lem03.psi.ch:8081/?cmd=custom&page=Mudas
so malfunction is not in the midas history page, but in a custom page. I could help you debug it,
but you would have to provide the complete source code (javascript and html).
> Firefox is version 91.12.0esr (64-bit), but I had similar issues with chrome/chromium too.
my firefox is 103.something. when you say google-chrome has "similar issues",
I read it as "google-chrome does not show this same bug, but shows some other
bug somewhere else". (if I misread you, you have to write better).
but this gives you a front to attack your bugs. basically all browsers should render your
custom page exactly the same (unless you use some obscure or experimental feature, which I
recommend against).
so you tweak your page to identify the source of different rendering results, and try to eliminate it,
hopefully by the time you get your page render exactly the same everywhere, all the real bugs
have gotten shaken out, too. (this is similar to debugging a c++ program by compiling
it on linux, mac, windows, vax, raspbery pi, etc and checking that you get the same result everywhere).
> The hangs seem to happen randomly so I have not been able to reproduce it yet.
I find that javascript debuggers are not setup to debug hangs. I think debugger runs partially
inside the same javascript engine you are debugging, so both hang and debugging is impossible.
(latest google-chrome has another improvement, all pages from the same computer run in the same
javascript engine, so if one midas page stops (on exception or because I debug it), all midas pages
stop and I have to run two different browsers if I want to debug (i.e.) a history page crash
and look at odb at the same time. fun).
K.O. |
17 Aug 2022, Stefan Ritt, Bug Report, firefox hangs due to mhistory
|
The problem lies in your function mhistory_init_one() in Mudas.js:1965. You can only call "new MhistoryGraph(e)" with an element "e" which is something like
<div class="mjshistory" data-group="..." data-panel="..." data-base-u-r-l="https://host.psi.ch/?cmd=history" title="">
Please note the "data-base-u-r-l". This gets automatically added by the function mhistory_init() in mhistory.js:48. The URL is necessary sot that the upper right button in a history graph works which goes to a history page only showing the current graph.
In you function mhistory_init_one() you forgot the call
mhist.dataset.baseURL = baseURL;
where baseURL has to come from the current address bar like
let baseURL = window.location.href;
if (baseURL.indexOf("?cmd") > 0)
baseURL = baseURL.substr(0, baseURL.indexOf("?cmd"));
baseURL += "?cmd=history";
If you duplicate some functionality from mhistory.js, please make sure to duplicate it completely.
Best,
Stefan |
17 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
> The problem lies in your function mhistory_init_one() in Mudas.js:1965. You can only call "new MhistoryGraph(e)" with an element "e" which is something like
>
> <div class="mjshistory" data-group="..." data-panel="..." data-base-u-r-l="https://host.psi.ch/?cmd=history" title="">
>
> Please note the "data-base-u-r-l". This gets automatically added by the function mhistory_init() in mhistory.js:48. The URL is necessary sot that the upper right button in a history graph works which goes to a history page only showing the current graph.
>
> In you function mhistory_init_one() you forgot the call
>
> mhist.dataset.baseURL = baseURL;
>
> where baseURL has to come from the current address bar like
>
> let baseURL = window.location.href;
> if (baseURL.indexOf("?cmd") > 0)
> baseURL = baseURL.substr(0, baseURL.indexOf("?cmd"));
> baseURL += "?cmd=history";
>
> If you duplicate some functionality from mhistory.js, please make sure to duplicate it completely.
>
Thanks Stefan, but this was not the problem since I am setting the baseURL. You may have looked at the code during my debugging.
Some of my histories are placed in an IFrame object. I eventually realized that my code fails when it tries to resize a history which is placed in an invisible IFrame. I resolved the issue by making sure that I am resizing plots only if they are in a visible IFrame.
|
17 Aug 2022, Stefan Ritt, Bug Report, firefox hangs due to mhistory
|
> Some of my histories are placed in an IFrame object. I eventually realized that my code fails
> when it tries to resize a history which is placed in an invisible IFrame. I resolved the issue
> by making sure that I am resizing plots only if they are in a visible IFrame.
Just to be clear: You could resolve everything on your side, or do you need to change anything in mhistory.js?
Just a tip: IFrames are not good to put anything in. I recommend just to dynamically crate a <div> element,
append it to the document body, make it floating and initially invisible. Then put all inside that div. Have
a look how control.js do it. T |
17 Aug 2022, Stefan Ritt, Bug Report, firefox hangs due to mhistory
|
> Some of my histories are placed in an IFrame object. I eventually realized that my code fails
> when it tries to resize a history which is placed in an invisible IFrame. I resolved the issue
> by making sure that I am resizing plots only if they are in a visible IFrame.
Just to be clear: You could resolve everything on your side, or do you need to change anything in mhistory.js?
Just a tip: IFrames are not good to put anything in. I recommend just to dynamically crate a <div> element,
append it to the document body, make it floating and initially invisible. Then put all inside that div. Have
a look how control.js do it. This takes less resources than a complete IFrame and is much easier to handle.
Stefan |
01 May 2023, Giovanni Mazzitelli, Bug Report, python issue with mathplot lib vs odb query
|
Ciao,
we have a very strange issue with python lib with client.odb_get("/") function
when running as midas process and matplotlib is used.
we are developing a remote console by means of sending via kafka producer the odb,
camera image and pmt waveforms, in the INFN cloud where grafana make available
data for non expert shifters, as well as sending midas events for online
reconstruction to the htcondr queue on cloud. The process work perfectly and allow
use to parallelise to standard midas pipeline for file production, ecc the online
monitoring and data processing where we have computing resources (our DAQ is
underground at LNGS). Part of the work will be presented next weak at CHEP
the full code is available at https://github.com/CYGNUS-
RD/middleware/blob/master/dev/event_producer_s3.py
but to get the strange behaviour I report here a test script:
----
def main(verbose=False):
from matplotlib import pyplot as plt
import time
import midas
import midas.client
client = midas.client.MidasClient("middleware")
buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
request_id = client.register_event_request(buffer_handle, sampling_type = 2)
fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
while True:
#
odb = client.odb_get("/")
if verbose:
print(odb)
start1 = time.time()
client.communicate(10)
time.sleep(1)
client.deregister_event_request(buffer_handle, request_id)
client.disconnect()
----
if I run it as cli interactivity including or not matplotlib the everything si ok.
As I run it as midas "program" I get:
-----
Traceback (most recent call last):
File "/home/standard/daq/middleware/dev/test_midas_error.py", line 48, in
<module>
main(verbose=options.verbose)
File "/home/standard/daq/middleware/dev/test_midas_error.py", line 29, in main
odb = client.odb_get("/")
File "/home/standard/packages/midas/python/midas/client.py", line 354, in
odb_get
retval = midas.safe_to_json(buf.value, use_ordered_dict=True)
File "/home/standard/packages/midas/python/midas/__init__.py", line 552, in
safe_to_json
return json.loads(decoded, strict=False,
object_pairs_hook=collections.OrderedDict)
File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
return cls(**kw).decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes:
line 300 column 26 (char 17535)
----
if I comment out the import of matplotlib every think works perfectly again also
as midas program.
it seams that there is a difference between the to way of use the code, and that
is sufficient the call to matplotlib to corrupt in some way the odb. any ideas? |
01 May 2023, Ben Smith, Bug Report, python issue with mathplot lib vs odb query
|
> it seams that there is a difference between the to way of use the code, and that
> is sufficient the call to matplotlib to corrupt in some way the odb. any ideas?
I can't reproduce this on my machines, so this is going to be fun to debug!
Can you try running the program below please? It takes the important bits from odb_get() but prints out the string before we try to parse it as JSON. Feel free to send me the output via email (bsmith@triumf.ca) if you don't want to post your entire ODB dump in the elog.
import sys
import os
import time
import midas
import midas.client
import ctypes
def debug_get(client):
c_path = ctypes.create_string_buffer(b"/")
hKey = ctypes.c_int()
client.lib.c_db_find_key(client.hDB, 0, c_path, ctypes.byref(hKey))
buf = ctypes.c_char_p()
bufsize = ctypes.c_int()
bufend = ctypes.c_int()
client.lib.c_db_copy_json_save(client.hDB, hKey, ctypes.byref(buf), ctypes.byref(bufsize), ctypes.byref(bufend))
print("-" * 80)
print("FULL DUMP")
print("-" * 80)
print(buf.value)
print("-" * 80)
print("Chars 17000-18000")
print("-" * 80)
print(buf.value[17000:18000])
print("-" * 80)
as_dict = midas.safe_to_json(buf.value, use_ordered_dict=True)
client.lib.c_free(buf)
return as_dict
def main(verbose=False):
client = midas.client.MidasClient("middleware")
buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
request_id = client.register_event_request(buffer_handle, sampling_type = 2)
fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
while True:
# odb = client.odb_get("/")
odb = debug_get(client)
if verbose:
print(odb)
start1 = time.time()
client.communicate(10)
time.sleep(1)
client.deregister_event_request(buffer_handle, request_id)
client.disconnect()
if __name__ == "__main__":
main() |
|