ID |
Date |
Author |
Topic |
Subject |
2504
|
08 May 2023 |
Stefan Ritt | Forum | Scrript in sequencer | > I tried different ways to pass parameters to bash script, but there are seems to
> be empty, what could be the problem?
Indeed there was a bug in the sequencer with parameter passing to scripts. I fixed it
and committed the changes to the develop branch.
Stefan |
2503
|
08 May 2023 |
Alexey Kalinin | Forum | Scrript in sequencer | Hello,
I tried different ways to pass parameters to bash script, but there are seems to
be empty, what could be the problem?
We have seuqencer like
ODBGET "/Runinfo/runnumber", firstrun
LOOP n,10
#changing HV
TRANSITION start
WAIT seconds,300
TRANSITION stop
ENDLOOP
ODBGET "/Runinfo/runnumber", lastrun
SCRIPT /.../script.sh ,$firstrun ,$lastrun
and script.sh like
firstrun=$1
lastrun=$2
Thanks. Alexey. |
2502
|
02 May 2023 |
Stefan Ritt | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option |
> Shall I create a pull request for the additional RPC argument or will you just fix this on the fly?
Just fix it in the fly yourself. It’s an obvious bug, so please commit to develop.
Stefan |
2501
|
02 May 2023 |
Niklaus Berger | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | And now we also fixed the client segfault, odb.cxx L8992 also needs to know about the header:
if (rpc_is_remote())
return rpc_call(RPC_DB_COPY_XML, hDB, hKey, buffer, buffer_size, header);
(last argument was missing before). |
2500
|
02 May 2023 |
Niklaus Berger | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | Thanks for all the helpful hints. When finally managing to evade all timeouts and attach the debugger in just the right moment, we find that we get a segfault in mserver at L827:
case RPC_DB_COPY_XML:
status = db_copy_xml(CHNDLE(0), CHNDLE(1), CSTRING(2), CPINT(3), CBOOL(4));
Some printf debugging then pointed us to the fact that the culprit is the pointer de-referencing in CBOOL(4). This in turn can be traced back to mrpc.cxx L282 ff, where the line with the arrow was missing:
{RPC_DB_COPY_XML, "db_copy_xml",
{{TID_INT32, RPC_IN},
{TID_INT32, RPC_IN},
{TID_ARRAY, RPC_OUT | RPC_VARARRAY},
{TID_INT32, RPC_IN | RPC_OUT},
-> {TID_BOOL, RPC_IN},
{0}}},
If we put that in, the mserver process completes peacfully and we get a segfault in the client ("Wrong key type in XML file") which we will attempt to debug next. Shall I create a pull request for the additional RPC argument or will you just fix this on the fly? |
2499
|
01 May 2023 |
Giovanni Mazzitelli | Bug Report | python issue with mathplot lib vs odb query | > > Looks like a localisation issue. Your floats are formatted as "6,6584e+01", whereas the JSON decoder expects "6.6584e+01".
>
> This should be fixed in the latest commit to the midas develop branch. The JSON specification requires a dot for the decimal separator, so we must ignore the user's locale when formatting floats/doubles for JSON.
>
> I've tested the fix on my machine by manually changing the locale, and also added an automated test in the python directory.
Thanks very macth Ben,
so if I understand correctly we have to update MIDAS to latest develop branch available? can you sand me the link to be sure of install the right update.
can you also tell me how you fix manually? we are restarting and then well be difficult install and makes updete.
thank you again, regards, Giovanni |
2498
|
01 May 2023 |
Ben Smith | Bug Report | python issue with mathplot lib vs odb query | > Looks like a localisation issue. Your floats are formatted as "6,6584e+01", whereas the JSON decoder expects "6.6584e+01".
This should be fixed in the latest commit to the midas develop branch. The JSON specification requires a dot for the decimal separator, so we must ignore the user's locale when formatting floats/doubles for JSON.
I've tested the fix on my machine by manually changing the locale, and also added an automated test in the python directory. |
2497
|
01 May 2023 |
Ben Smith | Bug Report | python issue with mathplot lib vs odb query | Looks like a localisation issue. Your floats are formatted as "6,6584e+01", whereas the JSON decoder expects "6.6584e+01".
Can you run the following few lines please? Then I'll be able to write a test using the same setup as you:
import locale
print(locale.getlocale())
from matplotlib import pyplot as plt
print(locale.getlocale())
|
2496
|
01 May 2023 |
Giovanni Mazzitelli | Bug Report | python issue with mathplot lib vs odb query | > > it seams that there is a difference between the to way of use the code, and that
> > is sufficient the call to matplotlib to corrupt in some way the odb. any ideas?
>
> I can't reproduce this on my machines, so this is going to be fun to debug!
>
> Can you try running the program below please? It takes the important bits from odb_get() but prints out the string before we try to parse it as JSON. Feel free to send me the output via email (bsmith@triumf.ca) if you don't want to post your entire ODB dump in the elog.
Thank you!
if I added the matplotlib as follow:
#!/usr/bin/env python3
import sys
import os
import time
import midas
import midas.client
import ctypes
from matplotlib import pyplot as plt
def debug_get(client):
c_path = ctypes.create_string_buffer(b"/")
hKey = ctypes.c_int()
client.lib.c_db_find_key(client.hDB, 0, c_path, ctypes.byref(hKey))
buf = ctypes.c_char_p()
bufsize = ctypes.c_int()
bufend = ctypes.c_int()
client.lib.c_db_copy_json_save(client.hDB, hKey, ctypes.byref(buf), ctypes.byref(bufsize), ctypes.byref(bufend))
print("-" * 80)
print("FULL DUMP")
print("-" * 80)
print(buf.value)
print("-" * 80)
print("Chars 17000-18000")
print("-" * 80)
print(buf.value[17000:18000])
print("-" * 80)
as_dict = midas.safe_to_json(buf.value, use_ordered_dict=True)
client.lib.c_free(buf)
return as_dict
def main(verbose=False):
client = midas.client.MidasClient("middleware")
buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
request_id = client.register_event_request(buffer_handle, sampling_type = 2)
fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
while True:
# odb = client.odb_get("/")
odb = debug_get(client)
if verbose:
print(odb)
start1 = time.time()
client.communicate(10)
time.sleep(1)
client.deregister_event_request(buffer_handle, request_id)
client.disconnect()
if __name__ == "__main__":
from optparse import OptionParser
parser = OptionParser(usage='usage: %prog\t ')
parser.add_option('-v','--verbose', dest='verbose', action="store_true", default=False, help='verbose output;');
(options, args) = parser.parse_args()
main(verbose=options.verbose)
then tested the code in interactive mode without any error. as soon as I submit as midas "Program" I get the attached output.
thank you again, Giovanni |
Draft
|
01 May 2023 |
Giovanni Mazzitelli | Bug Report | python issue with mathplot lib vs odb query | > > it seams that there is a difference between the to way of use the code, and that
> > is sufficient the call to matplotlib to corrupt in some way the odb. any ideas?
>
> I can't reproduce this on my machines, so this is going to be fun to debug!
>
> Can you try running the program below please? It takes the important bits from odb_get() but prints out the string before we try to parse it as JSON. Feel free to send me the output via email (bsmith@triumf.ca) if you don't want to post your entire ODB dump in the elog.
>
>
>
>
> import sys
> import os
> import time
> import midas
> import midas.client
> import ctypes
>
> def debug_get(client):
> c_path = ctypes.create_string_buffer(b"/")
> hKey = ctypes.c_int()
> client.lib.c_db_find_key(client.hDB, 0, c_path, ctypes.byref(hKey))
>
> buf = ctypes.c_char_p()
> bufsize = ctypes.c_int()
> bufend = ctypes.c_int()
>
> client.lib.c_db_copy_json_save(client.hDB, hKey, ctypes.byref(buf), ctypes.byref(bufsize), ctypes.byref(bufend))
>
> print("-" * 80)
> print("FULL DUMP")
> print("-" * 80)
> print(buf.value)
> print("-" * 80)
> print("Chars 17000-18000")
> print("-" * 80)
> print(buf.value[17000:18000])
> print("-" * 80)
>
> as_dict = midas.safe_to_json(buf.value, use_ordered_dict=True)
>
> client.lib.c_free(buf)
>
> return as_dict
>
> def main(verbose=False):
> client = midas.client.MidasClient("middleware")
> buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
> request_id = client.register_event_request(buffer_handle, sampling_type = 2)
>
> fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
>
> while True:
> # odb = client.odb_get("/")
> odb = debug_get(client)
>
> if verbose:
> print(odb)
> start1 = time.time()
>
> client.communicate(10)
> time.sleep(1)
>
>
> client.deregister_event_request(buffer_handle, request_id)
>
> client.disconnect()
>
> if __name__ == "__main__":
> main()
Thank you!
if I added the mat |
2494
|
01 May 2023 |
Ben Smith | Bug Report | python issue with mathplot lib vs odb query | > it seams that there is a difference between the to way of use the code, and that
> is sufficient the call to matplotlib to corrupt in some way the odb. any ideas?
I can't reproduce this on my machines, so this is going to be fun to debug!
Can you try running the program below please? It takes the important bits from odb_get() but prints out the string before we try to parse it as JSON. Feel free to send me the output via email (bsmith@triumf.ca) if you don't want to post your entire ODB dump in the elog.
import sys
import os
import time
import midas
import midas.client
import ctypes
def debug_get(client):
c_path = ctypes.create_string_buffer(b"/")
hKey = ctypes.c_int()
client.lib.c_db_find_key(client.hDB, 0, c_path, ctypes.byref(hKey))
buf = ctypes.c_char_p()
bufsize = ctypes.c_int()
bufend = ctypes.c_int()
client.lib.c_db_copy_json_save(client.hDB, hKey, ctypes.byref(buf), ctypes.byref(bufsize), ctypes.byref(bufend))
print("-" * 80)
print("FULL DUMP")
print("-" * 80)
print(buf.value)
print("-" * 80)
print("Chars 17000-18000")
print("-" * 80)
print(buf.value[17000:18000])
print("-" * 80)
as_dict = midas.safe_to_json(buf.value, use_ordered_dict=True)
client.lib.c_free(buf)
return as_dict
def main(verbose=False):
client = midas.client.MidasClient("middleware")
buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
request_id = client.register_event_request(buffer_handle, sampling_type = 2)
fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
while True:
# odb = client.odb_get("/")
odb = debug_get(client)
if verbose:
print(odb)
start1 = time.time()
client.communicate(10)
time.sleep(1)
client.deregister_event_request(buffer_handle, request_id)
client.disconnect()
if __name__ == "__main__":
main() |
2493
|
01 May 2023 |
Giovanni Mazzitelli | Bug Report | python issue with mathplot lib vs odb query | Ciao,
we have a very strange issue with python lib with client.odb_get("/") function
when running as midas process and matplotlib is used.
we are developing a remote console by means of sending via kafka producer the odb,
camera image and pmt waveforms, in the INFN cloud where grafana make available
data for non expert shifters, as well as sending midas events for online
reconstruction to the htcondr queue on cloud. The process work perfectly and allow
use to parallelise to standard midas pipeline for file production, ecc the online
monitoring and data processing where we have computing resources (our DAQ is
underground at LNGS). Part of the work will be presented next weak at CHEP
the full code is available at https://github.com/CYGNUS-
RD/middleware/blob/master/dev/event_producer_s3.py
but to get the strange behaviour I report here a test script:
----
def main(verbose=False):
from matplotlib import pyplot as plt
import time
import midas
import midas.client
client = midas.client.MidasClient("middleware")
buffer_handle = client.open_event_buffer("SYSTEM",None,1000000000)
request_id = client.register_event_request(buffer_handle, sampling_type = 2)
fpath = os.path.dirname(os.path.realpath(sys.argv[0]))
while True:
#
odb = client.odb_get("/")
if verbose:
print(odb)
start1 = time.time()
client.communicate(10)
time.sleep(1)
client.deregister_event_request(buffer_handle, request_id)
client.disconnect()
----
if I run it as cli interactivity including or not matplotlib the everything si ok.
As I run it as midas "program" I get:
-----
Traceback (most recent call last):
File "/home/standard/daq/middleware/dev/test_midas_error.py", line 48, in
<module>
main(verbose=options.verbose)
File "/home/standard/daq/middleware/dev/test_midas_error.py", line 29, in main
odb = client.odb_get("/")
File "/home/standard/packages/midas/python/midas/client.py", line 354, in
odb_get
retval = midas.safe_to_json(buf.value, use_ordered_dict=True)
File "/home/standard/packages/midas/python/midas/__init__.py", line 552, in
safe_to_json
return json.loads(decoded, strict=False,
object_pairs_hook=collections.OrderedDict)
File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
return cls(**kw).decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes:
line 300 column 26 (char 17535)
----
if I comment out the import of matplotlib every think works perfectly again also
as midas program.
it seams that there is a difference between the to way of use the code, and that
is sufficient the call to matplotlib to corrupt in some way the odb. any ideas? |
2492
|
28 Apr 2023 |
Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | > > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp
> > ok, cool. looks like we crashed the mserver.
> Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines.
> It does not matter how much data odbxx is asking for.
> midas commit 9d2ef471 the code from above runs fine
so, a regression. ouch.
if core dumps are turned off, you will not "see" the mserver crash, because the main mserver is still running. it's the mserver forked to
serve your RPC connection that crashes.
> int main() {
> cm_connect_experiment("localhost", "Mu3e", "test", NULL);
> midas::odb o = {
> {"Int32 Key", 42}
> };
> o.connect("/Test/Settings");
> cm_disconnect_experiment();
> return 1;
> }
to debug this, after cm_connect_experiment() one has to put ::sleep(1000000000); (not that big, obviously),
then while it is sleeping do "ps -efw | grep mserver", this will show the mserver for the test program,
connect to it with gdb, wait for ::sleep() to finish and o.connect() to crash, with luck gdb will show
the crash stack trace in the mserver.
so easy to debug? this is why back in the 1970-ies clever people invented core dumps, only to have
even more clever people in the 2020-ies turn them off and generally make debugging more difficult (attaching
gdb to a running program is also disabled-by-default in some recent linuxes).
rant-off.
to check if core dumps work, to "killall -7 mserver". to enable core dumps on ubuntu, see here:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu
last known-working point is:
commit 9d2ef471c4e4a5a325413e972862424549fa1ed5
Author: Ben Smith <bsmith@triumf.ca>
Date: Wed Jul 13 14:45:28 2022 -0700
Allow odbxx to handle connecting to "/" (avoid trying to read subkeys as "//Equipment" etc.
K.O. |
2491
|
28 Apr 2023 |
Martin Mueller | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp (with cm_connect_experiment changed to "localhost")
> > [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> > [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> > [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
>
> ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
> the correct stack trace should be rooted in the handler for db_copy_xml.
>
> but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.
>
> what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).
>
> K.O.
Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines.
It does not matter how much data odbxx is asking for.
Something as simple as this reproduces the error, asking for a single integer:
int main() {
cm_connect_experiment("localhost", "Mu3e", "test", NULL);
midas::odb o = {
{"Int32 Key", 42}
};
o.connect("/Test/Settings");
cm_disconnect_experiment();
return 1;
}
at the same time this runs fine:
int main() {
cm_connect_experiment(NULL, NULL, "test", NULL);
midas::odb o = {
{"Int32 Key", 42}
};
o.connect("/Test/Settings");
cm_disconnect_experiment();
return 1;
}
in both cases mserver does not crash. I do not have a stack trace. There is also no error produced by mserver.
Last year we did not have these problems with the same midas frontends (For example in midas commit 9d2ef471 the code from above runs
fine). I am trying to pinpoint the exact commit where this stopped working now. |
2490
|
28 Apr 2023 |
Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp (with cm_connect_experiment changed to "localhost")
> [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
the correct stack trace should be rooted in the handler for db_copy_xml.
but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.
what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).
K.O. |
2489
|
28 Apr 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | > > Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?
>
> At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
> This does NOT depend on the ODB size, but on the ODB *content*.
>
Yes and no. They must be storing more than 100 Mbytes of stuff in ODB, if they are asking to bump ODB size from 100 Mbyte to 2 GByte ODB.
On the MIDAS, side, though, we have to plan for the worst case, if max ODB size 1.9 GB and it is full of data,
and mlogger (and odbedit save and load) take 10-30 seconds, then at least all timeouts (watchdog timeout, RPC timeout, etc)
must be increased accordingly.
K.O. |
2488
|
28 Apr 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | > > Congratulations. created != "it works".
>
> Two other tings to consider:
>
> 1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finished and read if the first client starts, to get it persistent.
> So this could slow down starting and stopping, but only the first client, so I guess it's not an issue.
>
typical disk writing speed is 100-1000 Mbytes/sec, so writing 1 GB .ODB.SHM will take 1-10 seconds. NFS over 1gige network is 100 Mbytes/sec, so 10 seconds to
write .ODB.SHM. embedded ARM write speed to SD flash can be as low as 10 Mbytes/sec, so up to 100 seconds.
>
> 2) Traditionally, the ODB gets dumped to the .mid file at the beginning and end of every run, so that one know the exact configuration of the experiment
> for offline analysis. This can be turned off of course, but most experiments use it. If the ODB is dumped in any ASCII format, this can take quite long.
> Assume it takes 10 seconds at the beginning of each run, and we take a run every five minutes. Then we loose 48 mins of precious beam time every day.
>
new default is to save as JSON, (as of my last measurement) JSON encoder is faster than the XML (and ODB?) encoder, by default result is compressed by GZIP-1 (66
Mbytes/sec is my old benchmark, should remeasure on new DDR5 machines), compressed JSON is written .mid.gz file at disk speed (as above). Alternatively, use LZ4
compression, runs roughly at memcpy() speed, less compression, written to .mid.lz4 at disk speed.
if data storage is ZFS, ZFS built-in LZ4 compression is now enabled by default, so result writing uncompressed .mid file (no compression of ODB dump), should be
roughly same as when using MIDAS LZ4 compression and writing .mid.lz4.
bottom line, I need to remeasure gzip and lz4 compression speeds on new computers (DDR4 AMD 5000 series and DDR5 AMD 7000 series).
K.O. |
2487
|
28 Apr 2023 |
Marius Koeppel | Suggestion | Maximum ODB size | > At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
> This does NOT depend on the ODB size, but on the ODB *content*. Every key in the ODB takes time to convert. So if your ODB as 1.5 GB
> but only a few keys, this is still fast. Only if you have 200 million keys int he ODB, then mlogger takes lots of time to convert
> 200 million values to XML or JSON strings.
This was also my assumption. Is this the same for odbedit -c save FILE?
Because this is what I tested with the script and there one can see in the plot that the time increases to write the file if the ODB size increases.
The content of the ODB is always the same - one STRING key in the directory Test.
Best,
Marius |
2486
|
28 Apr 2023 |
Stefan Ritt | Suggestion | Maximum ODB size | > Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?
At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
This does NOT depend on the ODB size, but on the ODB *content*. Every key in the ODB takes time to convert. So if your ODB as 1.5 GB
but only a few keys, this is still fast. Only if you have 200 million keys int he ODB, then mlogger takes lots of time to convert
200 million values to XML or JSON strings.
Stefan |
2485
|
28 Apr 2023 |
Marius Koeppel | Suggestion | Maximum ODB size | > my vote is to bump the ODB size limit to 1999*1000*1000 (not quite 2GB). but this needs to be tested. especially save and restore from ODB, XML and JSON files, including how long it takes to save and load a 1.9GB ODB. K.O.
I had some fun with python and created a test script which can be executed in the MIDASSYS/online folder (test_odb.py). I did not really normalize the time so it will be different at different systems but I guess the trend is important (see create_time.pdf).
What is surprising to me is that even that I only write one STRING key to the time increases. Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?
Second thing is that also the creation / storing and load time is increasing. Should this be or is there a bug in the code I use or again is this related to the previous point?
The test of comparing the ODB after store / load / store already fails for the json format. I know I only test if the dicts are the same, so for timestamps this already fails.
But what is strange here is that sometimes the test works sometimes not and its different from run to run.
I will try to improve the test a bit more but for a short update this is how it looks so fare.
Best,
Marius |
|