12 Feb 2025, Mark Grimes, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
Hi,
I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
Midas during online running. At the end of each run it saves out custom data in the
TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is
never saved. Is this intentional? I understand that there is no point for the HandleBinaryRpc method offline,
but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose. Or is it a conscious
choice to ignore all of TMFeRpcHandlerInterface when offline?
Thanks,
Mark. |
25 Mar 2025, Mark Grimes, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
Hi,
The question was about the TMFeRpcHandlerInterface, not the TARunObject interface. Derived classes of TARunObject do indeed work as expected in our
environment. We have worked around the issue by using an implementation of TARunObject as well as the (separate) implementation of
TMFeRpcHandlerInterface.
Thanks,
Mark.
> > I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
> > Midas during online running. At the end of each run it saves out custom data in the
> > TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
> > However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is
> > never saved. Is this intentional? I understand that there is no point for the HandleBinaryRpc method offline,
> > but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose. Or is it a conscious
> > choice to ignore all of TMFeRpcHandlerInterface when offline?
>
> apologies for delayed response.
>
> I saw the question, completely did not understand it, only now got around to figure out what is going on.
>
> according to manalyzer/README.md, section "manalyzer module and object life time", BeginRun() and EndRun() is called
> always. Offline and online. What you see would be a bug that we do not see in our environment. I confirm this by
> running manalyzer in a demo mode: ./bin/manalyzer_test.exe --demo -e10 -t
>
> no, wait, you say you use HandleBeginRun() and HandleEndRun(). this is not right, they are not part of the manalyzer
> API and they indeed are only used when running online.
>
> correct solution would be to use BeginRun() and EndRun() instead of HandleBeginRun() and HandleEndRun().
>
> you could also save your data in the module destructor (although good programming recommendation is to use
> destructor only for unavoidable things, like freeing memory, etc).
>
> K.O. |
26 Mar 2025, Mark Grimes, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
This was exactly the question: should I expect it to run? There's no point in the HandleBinaryRpc method offline, but there's an argument that the HandleBeginRun/HandleEndRun methods have a use.
I have the answer and we have a workaround, thanks.
> then I do not understand the question. TMFeRpcHandlerInterface stuff is only used when running online and connected to MIDAS. How does it come into the
> picture when you analyze a data file offline? ProcessMidasOnlineTmfe() does not run, the RpcHandler object is not constructed.
>
> maybe if you point me to your source code, I can see what you are doing?
>
> K.O. |
04 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code
|
Hi,
During an evening of running we noticed that the memory usage of mhttpd grew to close to 100 GB. We think we've traced this to the following issue when making RPC calls.
- The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449.
- It then makes the call at src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
- It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
- When the `MJsonNode` is destructed at mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.
Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
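To illustrate the pattern, here is a stripped-down sketch with my own stand-in type (not the actual mjson/mjsonrpc code):

#include <cstddef>
#include <cstdlib>

// Stand-in for MJsonNode's array-buffer ownership (simplified, not the real mjson code).
struct Node {
   char       *buf;
   std::size_t size;
   ~Node() {
      if (size > 0)                        // the bug: a zero-size buffer is never freed
         std::free(buf);
   }
};

int main()
{
   char *buf = (char *)std::malloc(1024);  // response buffer, as allocated in brpc()
   std::size_t buf_length = 0;             // the RPC call returned an empty response
   {
      Node n{buf, buf_length};             // ownership handed over, as with MakeArrayBuffer()
   }                                       // destructor skips free() because size == 0
   return 0;                               // the 1024 bytes are leaked
}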
I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?
Thanks,
Mark. |
07 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code
|
Hi,
We applied an intermediate fix for this locally and it seems to have fixed our issue. The attached plot shows the percentage memory use on our machine with 128 GB of memory, as a rough proxy for mhttpd memory use. After applying our fix, mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.
Our fix to mjson was:
diff --git a/mjson.cxx b/mjson.cxx
index 17ee268..2443510 100644
--- a/mjson.cxx
+++ b/mjson.cxx
@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor
delete subnodes[i];
subnodes.clear();
- if (arraybuffer_size > 0) {
- assert(arraybuffer_ptr != NULL);
+ if (arraybuffer_ptr != NULL) {
free(arraybuffer_ptr);
arraybuffer_size = 0;
arraybuffer_ptr = NULL;
We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:
diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
status = cm_connect_client(name.c_str(), &hconn);
if (status != RPC_SUCCESS) {
+ free(buf);
return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
}
I hope this is useful to someone. As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.
Thanks,
Mark. |
15 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code
|
Many thanks for the fix. We've applied it and see better memory performance. We still have to kill and restart
mhttpd after a few days, however. I think the official fix is missing this part:
diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
status = cm_connect_client(name.c_str(), &hconn);
if (status != RPC_SUCCESS) {
+ free(buf);
return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
}
When the other process returns a failure the memory block is also currently leaked. I originally stated "...although I
don't think it contributed to the leak we were seeing" but it seems this was false.
Thanks,
Mark.
> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers,
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
>
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
>
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty
> file, return of "new char[0]" is never freed).
>
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not
> affected by this bug.
>
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
>
> Please post bug reports a plain-text so I can quote from them.
>
> K.O. |
04 Jul 2025, Mark Grimes, Bug Report, Memory leaks in mhttpd
|
Something changed in our system and we started seeing memory leaks in mhttpd again. I guess someone
updated some front end or custom page code that interacted with mhttpd differently.
I found a few memory leaks in some (presumably) rarely seen corner cases and we now see steady
memory usage. The branch is fix/memory_leaks
(https://bitbucket.org/tmidas/midas/branch/fix/memory_leaks) and I opened pull request #55
(https://bitbucket.org/tmidas/midas/pull-requests/55). I couldn't find a BitBucket account for you
Konstantin to add as a reviewer, so it currently has none.
Thanks,
Mark. |
16 Sep 2024, Marius Köppel, Bug Report, Crash using ODB watch
|
Hi all,
last week I was running MIDAS with commit 3ad98c5. Today I updated MIDAS and now all my watch functions are crashing. Attached is a minimal example frontend showing the problem.
In our software we have two functions: one which sets up the ODB values of the frontend, and another which sets up all watch functions. So overall we connect to the ODB twice during frontend_init(): once to create the values and once to create the watch. The example code shows a simple version of this setup:
INT frontend_init() {

   cm_msg(MINFO, "frontend_init() setup", "Test FE");

   odb settings = {
      {"Test", 123},
      {"sub", {}}
   };
   settings.connect_and_fix_structure("/Equipment/Test FE/Settings");
   // settings.watch(watch);   <-- this works without segmentation fault

   odb new_settings("/Equipment/Test FE/Settings");
   new_settings.watch(watch);  // <-- here I am getting a segmentation fault

   return CM_SUCCESS;
}
When I set the watch directly, everything runs fine; however, when I create a new ODB object and use that one to set a watch, I get the following segmentation fault:
Process 18474 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x34)
frame #0: 0x000000010004fa38 test_fe`midas::odb::watch_callback(hDB=<unavailable>, hKey=<unavailable>, index=0, info=0x00006000002001c0) at odbxx.cxx:96:25 [opt]
93 if (po->m_data == nullptr)
94 mthrow("Callback received for a midas::odb object which went out of scope");
95 midas::odb *poh = search_hkey(po, hKey);
-> 96 poh->m_last_index = index;
97 po->m_watch_callback(*poh);
98 poh->m_last_index = -1;
99 }
Best,
Marius |
12 Jun 2019, Marius Koeppel, Forum, Strange JS array creation
|
Hello everybody,
I am seeing some strange JS behavior. In one of my frontends I create a key in the ODB with:
db_create_key(hDB, 0, "Equipment/Switching/Variables/DATA_WRITE", TID_INT);
In my custom page I have a JS function which loops over an array and sets the
value of this key with:
for (i = 0; i < lines.length; i++) {
   modbset("/Equipment/Switching/Variables/DATA_WRITE[" + String(i) + "]",
           parseInt(lines[i]));
}
After calling this function I now have an array in the ODB. To my understanding, accessing an INT like an array shouldn't be possible. So is this dangerous to do?
Best regards,
Marius |
24 Jun 2019, Marius Koeppel, Forum, Strange JS array creation
|
> > for (i = 0; i < lines.length; i++) {
> > modbset("/Equipment/Switching/Variables/DATA_WRITE[" + String(i) + "]", parseInt(lines[i]));
> > }
>
> this is wrong.
>
> a) you are programming javascript as if it were C/C++. You think this code wrote lines.length() values
> to ODB, when what the code actually did is queued lines.length() RPC requests for later execution.
> Eventually some time later, each RPC request will open a connection to mhttpd, send a request, wait
> for mhtttpd to process it, etc. Where do you wait for the completion of all these RPCs before
> proceeding as if all the data has been successfully written to ODB? (answer: you cannot, javascript
> cannot "wait for things", instead you have to make chains of event handlers. javascript != C/C++.
> They are completely different).
--> Following your discussion about async. functions I will change this part of the code and make chains of
event handlers.
> b) you should write the whole array in one operation instead of looping over each element. see
> mjsonrpc_db_paste() and example.html.
--> In the midas back-end I never created an array. I created an INT in the ODB with db_create_key(hDB, 0,
"Equipment/Switching/Variables/DATA_WRITE", TID_INT). By using modbset in javascript and passing the string
"/Equipment/Switching/Variables/DATA_WRITE[" + String(i) + "]" I address it like an array, and it shows up as an
array in the ODB. To explain a bit better how the value changes in the ODB, take this pseudo-code
example:
// midas part //
> int a = 1; // this is more or less what I think db_create_key is doing in the ODB
// midas part //
// ODB //
> print(a) // this prints me 1 and this is also the value what I see in the ODB
// ODB //
// javascript part //
> for int i in [1,2,3,4] do
> modbset(a[i], i) // for simplification I don't use event handlers here
> end for
// javascript part //
// ODB //
> print(a) // now I see [1,2,3,4]
// ODB //
This example violates type safety. I know that JavaScript is not type safe, but given this, I would like
to know whether this behavior is intended, and why there is no bounds checking.
> I do not understand your question about "calling an INT like an array".
--> Here I mean that I address the variable in the ODB via a string, the way I would address a variable which is
an array. I am not talking about function calls.
> parseInt() (defined where?)
--> This is a global JavaScript function (https://www.w3schools.com/jsref/jsref_obj_global.asp)
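Following the advice in b), I will try writing the whole array in a single RPC, roughly like this (untested sketch; it assumes the mjsonrpc_db_paste() wrapper from midas.js accepts an array value for a single path):
// Write the whole array with one db_paste RPC instead of one modbset() per element.
let values = lines.map(s => parseInt(s, 10));

mjsonrpc_db_paste(["/Equipment/Switching/Variables/DATA_WRITE"], [values])
   .then(function (rpc) {
      console.log("db_paste status:", rpc.result.status[0]);
   })
   .catch(function (error) {
      mjsonrpc_error_alert(error);
   });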
Cheers,
Marius |
12 Feb 2020, Marius Koeppel, Forum, Difference between "Event Data Size" and "All Bank Size"
|
Dear all,
we are trying to build Midas events on FPGAs and send them directly to the midas
ring buffer via copy_n. According to the wiki
https://midas.triumf.ca/MidasWiki/index.php/Event_Structure Event Data Size is:
"The event data size contains the size of the event in bytes excluding the
header." and All Bank Size is: "Size in bytes of the following data plus the
size of the bank header". So are they actually the same, or does the "header" in the first sentence also
include the bank header?
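To make the question concrete, this is my current understanding of the two sizes, based on the structs in midas.h (a sketch only; please correct me if my reading is wrong):

#include <cstdint>
#include <cstdio>

struct EventHeader {          // EVENT_HEADER, as I read it from midas.h
   int16_t  event_id;
   int16_t  trigger_mask;
   uint32_t serial_number;
   uint32_t time_stamp;
   uint32_t data_size;        // "Event Data Size": all bytes after this header
};

struct BankHeader {           // BANK_HEADER, the first thing inside the event data
   uint32_t data_size;        // "All Bank Size": all banks incl. their bank headers
   uint32_t flags;
};

int main()
{
   // one BANK32 (4-byte name + type + data_size = 12 bytes) carrying 12 bytes of data
   uint32_t bank_bytes = 12 + 12;
   BankHeader  bh{bank_bytes, 0x11};
   EventHeader eh{1, 0, 1, 0, (uint32_t)sizeof(BankHeader) + bank_bytes};
   // If this is right, the two are not the same: they differ by sizeof(BANK_HEADER) = 8 bytes.
   std::printf("event data size = %u, all bank size = %u\n", eh.data_size, bh.data_size);
   return 0;
}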
Cheers,
Marius
|
13 Feb 2020, Marius Koeppel, Forum, Writing Midas Events via FPGAs
|
Dear all,
we create MIDAS events directly inside an FPGA and send them off via DMA into the PC RAM. For reading out this RAM via MIDAS, the FPGA sends a pointer to where it has written the last 4 kB of data. We use this pointer to tell the MIDAS ring buffer where the new events are. The buffer looks something like:
// event 1
dma_buf[0] = 0x00000001; // Trigger and Event ID
dma_buf[1] = 0x00000001; // Serial number
dma_buf[2] = TIME; // time
dma_buf[3] = 18*4-4*4; // event size
dma_buf[4] = 18*4-6*4; // all bank size
dma_buf[5] = 0x11; // flags
// bank 0
dma_buf[6] = 0x46454230; // bank name
dma_buf[7] = 0x6; // bank type TID_DWORD
dma_buf[8] = 0x3*4; // data size
dma_buf[9] = 0xAFFEAFFE; // data
dma_buf[10] = 0xAFFEAFFE; // data
dma_buf[11] = 0xAFFEAFFE; // data
// bank 1
dma_buf[12] = 0x46454231; // bank name
dma_buf[13] = 0x6; // bank type TID_DWORD
dma_buf[14] = 0x3*4; // data size
dma_buf[15] = 0xAFFEAFFE; // data
dma_buf[16] = 0xAFFEAFFE; // data
dma_buf[17] = 0xAFFEAFFE; // data
// event 2
.....
dma_buf[fpga_pointer] = 0xXXXXXXXX;
And we do something like:
while (true) {
   // obtain buffer space
   status = rb_get_wp(rbh, (void **)&pdata, 10);
   fpga_pointer = fpga.read_last_data_add();
   wlen = fpga_pointer - last_fpga_pointer;      // in 32-bit words
   copy_n(&dma_buf[last_fpga_pointer], wlen, pdata);
   rb_status = rb_increment_wp(rbh, wlen * 4);   // in bytes
   last_fpga_pointer = fpga_pointer;
}
Leaving out the case where dma_buf wraps around, this works fine for a small data rate. But if we increase the rate, fpga_pointer also increases really fast and wlen gets quite big. It eventually gets bigger than max_event_size, which is checked in rb_increment_wp, leading to an error.
The problem is that a single event is actually not too big; rather, we have multiple events in the buffer which are read by MIDAS in one step. So we think that in this case rb_increment_wp is comparing the wrong thing. Increasing max_event_size does not help either.
Remark: dma_buf is volatile, so memcpy is not possible here.
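One idea we are considering (an untested sketch; it assumes every event in dma_buf starts with a standard EVENT_HEADER, ignores the wrap-around case, and uses the rb_* signatures as I recall them from midas.h) is to walk the DMA buffer event by event, so rb_increment_wp() only ever sees single events:

#include <algorithm>   // std::copy_n
#include <cstdint>
#include "midas.h"     // rb_get_wp(), rb_increment_wp(), DB_SUCCESS

// Copy events from the DMA buffer one at a time, so rb_increment_wp() never sees
// more than one event. EVENT_HEADER is 4 x 32-bit words; word 3 is the data size in bytes.
void copy_events(int rbh, volatile uint32_t *dma_buf,
                 uint32_t &last_fpga_pointer, uint32_t fpga_pointer)
{
   uint32_t p = last_fpga_pointer;             // positions in 32-bit words
   while (p != fpga_pointer) {
      uint32_t data_size   = dma_buf[p + 3];   // bytes following the event header
      uint32_t event_words = 4 + data_size / 4;

      char *pdata = nullptr;
      int status = rb_get_wp(rbh, (void **)&pdata, 10);
      if (status != DB_SUCCESS)                // e.g. timeout: try again later
         break;

      std::copy_n(&dma_buf[p], event_words, (uint32_t *)pdata);
      rb_increment_wp(rbh, event_words * 4);   // one event at a time, in bytes

      p += event_words;                        // advance to the next event
   }
   last_fpga_pointer = p;
}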
Cheers,
Marius |
20 Feb 2020, Marius Koeppel, Forum, Writing Midas Events via FPGAs
|
We also agree and found the problem now. Since we build everything (MIDAS Event Header, Bank Header, Banks etc.) in the FPGA we had some struggle with the MIDAS data format (http://lmu.web.psi.ch/docu/manuals/bulk_manuals/software/midas195/html/AppendixA.html). We thought that only the MIDAS Event needs to be aligned to 64 bit but as it turned out also the bank data (Stefan updated the wiki page already) needs to be aligned. Since we are using the BANK32 it was a bit unclear for us since the bank header is not 64 bit aligned. But we managed this now by adding empty data and the system is running now.
Our setup looks like this:
- mfe.cxx multithread equipment
- mfe readout thread grabs pointer from dma ring buffer
- since the dma buffer is volatile we do copy_n for transforming the data to MIDAS
- the data is already in the MIDAS format so done from our side :)
- mfe readout thread increments the ring buffer
- mfe main thread grabs events from ring buffer, sends them to the mserver
From the firmware side we have an Arria 10 development board.
> But now I am curious, which DMA controller do you use? The Altera or Xilinx PCIe block with the vendor-supplied DMA driver? Or do you do DMA on an ARM SoC FPGA? (no PCI/PCIe,
> different DMA controller, different DMA driver).
> I am curious because we will be implementing pretty much what you do on ARM SoC FPGAs pretty soon, so it is good to know
> if there is trouble to expect.
> But I will probably use the tmfe.h c++ frontend and a "pure c++" ring buffer instead of mfe.cxx and the midas "rb" ring buffer.
> (I did not look at your code at all, there could be a bug right there, this ring buffer stuff is tricky. With luck there is no bug
> in your dma driver. The dma drivers for our vme bridges did have bugs.)
> K.O. |
20 Feb 2020, Marius Koeppel, Forum, Writing Midas Events via FPGAs
|
We also agree and found the problem now. Since we build everything (MIDAS Event Header, Bank Header, Banks etc.) in the FPGA we had some struggle with the MIDAS data format (http://lmu.web.psi.ch/docu/manuals/bulk_manuals/software/midas195/html/AppendixA.html). We thought that only the MIDAS Event needs to be aligned to 64 bit but as it turned out also the bank data (Stefan updated the wiki page already) needs to be aligned. Since we are using the BANK32 it was a bit unclear for us since the bank header is not 64 bit aligned. But we managed this now by adding empty data and the system is running now.
Our setup looks like this:
Software:
- mfe.cxx multithread equipment
- mfe readout thread grabs pointer from dma ring buffer
- since the dma buffer is volatile we do copy_n for transforming the data to MIDAS
- the data is already in the MIDAS format so done from our side :)
- mfe readout thread increments the ring buffer
- mfe main thread grabs events from ring buffer, sends them to the mserver
Firmware:
- Arria 10 development board
- Altera PCIe block
- Own DMA engine since we are doing burst writing DMA with PCIe 3.0.
- Own device driver
- no interrupts
If you have more questions feel free to ask. |
23 Feb 2020, Marius Koeppel, Forum, Writing Midas Events via FPGAs
|
> > We also agree and found the problem now.
>
> Good. what was wrong?
>
> > - Own DMA engine since we are doing burst writing DMA with PCIe 3.0.
> > - Own device driver
>
> Scary stuff.
>
> > - no interrupts
>
> Right. Best I can tell, interrupts no longer useful in Linux - interrupt handler cannot do any real work, has to hand off to a kernel thread, resulting
> in so much latency and overhead that one might as well poll for the data... And for DMA data transfers, the data rate is well known,
> so easy to predict how long the DMA will run for and sleep for that amount of time instead of waiting for an interrupt.
>
> K.O.
So the problem was that we assumed that the bank (with the header) needs to be 64-bit aligned. Even more, we aligned the whole MIDAS event to 256 bit in the FPGA, since we have a 250 MHz x 256-bit interface for PCIe. But then we saw that you align the bank data to 64 bit -> crash of mdump etc. For now we generate the data on the FPGA in the "old" MIDAS format. So having a flag for changing to a different alignment would actually be really nice.
Cheers,
Marius |
28 May 2020, Marius Koeppel, Suggestion, ODB++ API - documentation updates and odb view after key creation
|
Hello everybody,
I really appreciate the development of the new odb++ API, so I directly started to rewrite the code for the Mu3e DAQ system.
I have a few questions / suggestions which came up during my work so far:
1. The documentation seems to be quite new, so some variables are named incorrectly and there are small typos. I would like to fix them. Should I request an account, or what else is needed to change them?
2. When I create an ODB structure with the new API I do for example:
midas::odb stream_settings = {
   {"Test_odb_api", {
      {"Divider", 1000},   // int
      {"Enable", false},   // bool
   }},
};
stream_settings.connect("/Equipment/Test/Settings", true);
and with
midas::odb datagen("/Equipment/Test/Settings/Test_odb_api");
std::cout << "Datagenerator Enable is " << datagen["Enable"] << std::endl;
I am getting back false, which looks nice, but when I look into the ODB via the browser the value is actually "y", meaning true, which is strange. I attached my frontend, where I removed all functions except frontend_init(), in which I create this key. It's a CUDA program, but since I cleaned out everything, no CUDA function is called anymore.
Thank you again for the nice development!
Cheers,
Marius |
04 Jun 2020, Marius Koeppel, Suggestion, ODB++ API - documentation updates and odb view after key creation
|
Hi Stefan,
your test program only worked for me after I changed the following lines inside odbxx.cxx:
diff --git a/src/odbxx.cxx b/src/odbxx.cxx
index 24b5a135..48edfd15 100644
--- a/src/odbxx.cxx
+++ b/src/odbxx.cxx
@@ -753,7 +753,12 @@ namespace midas {
}
} else {
u_odb u = m_data[index];
- status = db_set_data_index(m_hDB, m_hKey, &u, rpc_tid_size(m_tid), index, m_tid);
+ if (m_tid == TID_BOOL) {
+ BOOL ss = bool(u);
+ status = db_set_data_index(m_hDB, m_hKey, &ss, rpc_tid_size(m_tid), index, m_tid);
+ } else {
+ status = db_set_data_index(m_hDB, m_hKey, &u, rpc_tid_size(m_tid), index, m_tid);
+ }
if (m_debug) {
std::string s;
u.get(s);
Likely not the best fix, but otherwise I always got the following after running the test program:
[ODBEdit,INFO] Program ODBEdit on host localhost started
[local:Default:S]/>cd Equipment/Test/Settings/Test_odb_api/
key not found
makoeppe@office ~/mu3e/online/online (git)-[odb++_api] % test_connect
Created ODB key /Equipment/Test/Settings
Created ODB key /Equipment/Test/Settings/Test_odb_api
Created ODB key /Equipment/Test/Settings/Test_odb_api/Divider
Set ODB key "/Equipment/Test/Settings/Test_odb_api/Divider" = 1000
Created ODB key /Equipment/Test/Settings/Test_odb_api/Enable
Set ODB key "/Equipment/Test/Settings/Test_odb_api/Enable" = false
Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api"
Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Divider"
Get ODB key "/Equipment/Test/Settings/Test_odb_api/Divider": 1000
Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Enable"
Get ODB key "/Equipment/Test/Settings/Test_odb_api/Enable": false
Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Divider"
Get ODB key "/Equipment/Test/Settings/Test_odb_api/Divider": 1000
Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Enable"
Get ODB key "/Equipment/Test/Settings/Test_odb_api/Enable": false
Datagenerator Enable is Get ODB key "/Equipment/Test/Settings/Test_odb_api/Enable": false
false
makoeppe@office ~/mu3e/online/online (git)-[odb++_api] % odbedit
[ODBEdit,INFO] Program ODBEdit on host localhost started
[local:Default:S]/>cd Equipment/Test/Settings/Test_odb_api/
[local:Default:S]Test_odb_api>ls
Divider 1000
Enable y
> > I am getting back false. Which looks nice but when I look into the odb via the browser the value is actually "y" meaning true which is stange.
> > I added my frontend where I cleaned all function leaving only the frontend_init() one where I create this key. Its a cuda program but since
> > I clean everything no cuda function is called anymore. |
08 Jun 2020, Marius Koeppel, Suggestion, ODB++ API - documentation updates and odb view after key creation
|
Hi Stefan,
I agree with your explanation about the size of BOOL and bool.
I also checked the program on my Raspberry Pi, and there the old code works like on your Mac. I don't really understand
why the behavior is different on my system. The initialization of the union should also work on my system.
At the moment I am using:
Arch Linux
Linux office 5.4.42-1-lts #1 SMP Wed, 20 May 2020 20:42:53 +0000 x86_64 GNU/Linux
gcc version 10.1.0 (GCC)
One thing which makes me a bit suspicious is that if I do:
u_odb u = m_data[index];
char dest[rpc_tid_size(m_tid)];
memcpy(dest, &u, rpc_tid_size(m_tid));
CLion tells me "Clang-Tidy: Undefined behavior, source object type 'midas::u_odb' is not TriviallyCopyable".
I am not sure if this is the problem, since I am not so familiar with TriviallyCopyable. I need to investigate this further.
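To make this concrete, here is a stripped-down illustration of what I think happens (my own toy union, not the actual u_odb code): only the first byte of the union is initialized by the bool member, so copying rpc_tid_size(TID_BOOL) = 4 bytes picks up three indeterminate bytes.

#include <cstdint>
#include <cstdio>
#include <cstring>

// Toy stand-in for u_odb: a 1-byte bool sharing storage with an 8-byte pointer.
union U {
   bool        m_bool;
   const char *m_string;
};

int main()
{
   U u;
   u.m_bool = false;                 // only the first byte of the union is written

   uint32_t as_bool4 = 0;
   std::memcpy(&as_bool4, &u, 4);    // what copying a 4-byte BOOL out of the union does

   // Bytes 1-3 are whatever happened to be in that memory, so the 4-byte BOOL that
   // ends up in the ODB can be non-zero even though m_bool == false (output varies).
   std::printf("4-byte BOOL value: 0x%08x\n", as_bool4);
   return 0;
}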
So far the update from my side.
Cheers,
Marius
> Hi Marius,
>
> your fix is good. Thanks for digging out this deep-lying issue, which would have haunted us if we would not fix it.
> The problem is that in midas, the "BOOL" type is 4 Bytes long, actually modelled after MS Windows. Now I realized
> that in c++, the "bool" type is only 1 Byte wide. So if we do the memcopy from a "c++ bool" to a "MIDAS BOOL", we
> always copy four bytes, meaning that we copy three Bytes beyond the one-byte value of the c++ bool. So your fix
> is absolutely correct, and I added it in one more space where we deal with bool arrays, where we need the same.
>
> What I don't understand however is the fact why this fails for you. The ODB values are stored in the C union under
>
> union {
> ...
> bool m_bool;
> double m_double;
> std::string *m_string;
> ...
> }
>
> Now the C compiler puts all values at the lowest address, so m_bool is at offset zero, and the string pointer reaches
> over all eight bytes (we are on 64-bit OS).
>
> Now when I initialize this union in odbxx.h:66, I zero the string pointer which is the widest object:
>
> u_odb() : m_string{} {};
>
> which (at least on my Mac) sets all eight bytes to zero. If I then use the wrong code to set the bool value to the ODB
> in odbxx.cxx:756, I do
>
> db_set_data_index(... &u, rpc_tid_size(m_tid), ...);
>
> so it copies four bytes (=rpc_tid_size(TID_BOOL)) to the ODB. The first byte should be the c++ bool value (0 or 1),
> and the other three bytes should be zero from the initialization above. Apparently on your system, this is not
> the case, and I would like you to double check it. Maybe there is another underlying problem which I don't understand
> at the moment but which we better fix.
>
> Otherwise the change is committed and your code should work. But we should not stop here! I really want to understand
> why this is not working for you, maybe I miss something.
>
> Best,
> Stefan
>
> > Hi Stefan,
> >
> > your test program was only working for me after I changed the following lines inside the odbxx.cpp
> >
> > diff --git a/src/odbxx.cxx b/src/odbxx.cxx
> > index 24b5a135..48edfd15 100644
> > --- a/src/odbxx.cxx
> > +++ b/src/odbxx.cxx
> > @@ -753,7 +753,12 @@ namespace midas {
> > }
> > } else {
> > u_odb u = m_data[index];
> > - status = db_set_data_index(m_hDB, m_hKey, &u, rpc_tid_size(m_tid), index, m_tid);
> > + if (m_tid == TID_BOOL) {
> > + BOOL ss = bool(u);
> > + status = db_set_data_index(m_hDB, m_hKey, &ss, rpc_tid_size(m_tid), index, m_tid);
> > + } else {
> > + status = db_set_data_index(m_hDB, m_hKey, &u, rpc_tid_size(m_tid), index, m_tid);
> > + }
> > if (m_debug) {
> > std::string s;
> > u.get(s);
> >
> > Likely not the best fix but otherwise I was always getting after running the test program:
> >
> > [ODBEdit,INFO] Program ODBEdit on host localhost started
> > [local:Default:S]/>cd Equipment/Test/Settings/Test_odb_api/
> > key not found
> > makoeppe@office ~/mu3e/online/online (git)-[odb++_api] % test_connect
> > Created ODB key /Equipment/Test/Settings
> > Created ODB key /Equipment/Test/Settings/Test_odb_api
> > Created ODB key /Equipment/Test/Settings/Test_odb_api/Divider
> > Set ODB key "/Equipment/Test/Settings/Test_odb_api/Divider" = 1000
> > Created ODB key /Equipment/Test/Settings/Test_odb_api/Enable
> > Set ODB key "/Equipment/Test/Settings/Test_odb_api/Enable" = false
> > Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api"
> > Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Divider"
> > Get ODB key "/Equipment/Test/Settings/Test_odb_api/Divider": 1000
> > Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Enable"
> > Get ODB key "/Equipment/Test/Settings/Test_odb_api/Enable": false
> > Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Divider"
> > Get ODB key "/Equipment/Test/Settings/Test_odb_api/Divider": 1000
> > Get definition for ODB key "/Equipment/Test/Settings/Test_odb_api/Enable"
> > Get ODB key "/Equipment/Test/Settings/Test_odb_api/Enable": false
> > Datagenerator Enable is Get ODB key "/Equipment/Test/Settings/Test_odb_api/Enable": false
> > false
> > makoeppe@office ~/mu3e/online/online (git)-[odb++_api] % odbedit
> > [ODBEdit,INFO] Program ODBEdit on host localhost started
> > [local:Default:S]/>cd Equipment/Test/Settings/Test_odb_api/
> > [local:Default:S]Test_odb_api>ls
> > Divider 1000
> > Enable y
> >
> > > > I am getting back false. Which looks nice but when I look into the odb via the browser the value is actually "y" meaning true which is stange.
> > > > I added my frontend where I cleaned all function leaving only the frontend_init() one where I create this key. Its a cuda program but since
> > > > I clean everything no cuda function is called anymore. |
16 Jun 2020, Marius Koeppel, Suggestion, ODB++ API - documentation updates and odb view after key creation
|
Hi Stefan,
I played around with the code a bit more and I found out that if I do:
midas::odb test_settings = {{"Enable", false}};
test_settings.connect("/Equipment/Test/Test", true);
The correct value ends up in the ODB. In this case a u_odb instance is created
with a clean m_string. But if I run the other code, an odb instance is created and
the values of m_data are set in
odbxx.h:
odb(std::initializer_list<std::pair<const char *, midas::odb>> list) : odb() {...
These values are coming from u_odb instances, since the code does:
odbxx.h:
auto o = new midas::odb(element.second);
and then
odbxx.h:
odb(T v):odb() {
   m_num_values = 1;
   m_data = new u_odb[1]{v};
   m_tid = m_data[0].get_tid();
   m_data[0].set_parent(this);
}
and looking at
odbxx.h:
u_odb(bool v) : m_bool{v}, m_tid{TID_BOOL}, m_parent_odb{nullptr} {};
only m_bool is set for this instance, meaning that only the first byte gets a value
(bool still being only 1 byte in C++). If I check m_string inside the u_odb::get function
of this instance, I get for a bool (I set false) something like 0x7f6633f67a00, and for an int
(I set the int to 1000) 0x7f66000003e8. Since the size of BOOL is larger, I get the
wrong value. I also checked this on openSUSE, with the same behavior.
Like you I am not getting this problem on my Mac. What compiler flags do you use on your Mac?
Cheers,
Marius |
24 Jun 2020, Marius Koeppel, Suggestion, ODB++ API - documentation updates and odb view after key creation
|
Hi Stefan,
now everything works well (Tested on: OpenSuse and Arch Linux) :)
Thank you for the fix.
Cheers,
Marius
> Hi Marius,
>
> thanks for your help, you identified the problematic location. I changed that to
>
> u_odb(bool v) : m_tid{TID_BOOL}, m_parent_odb{nullptr} {m_string = nullptr; m_bool = v;};
>
> which should initialize the full 8 bytes of the u_odb union. I committed to develop. Can you
> please give it a try?
>
> Best,
> Stefan
>
>
> > and looking at
> >
> > odbxx.h:
> > u_odb(bool v) : m_bool{v}, m_tid{TID_BOOL}, m_parent_odb{nullptr} {};
> >
> > only m_bool is set for this instance meaning that only the first byte gets a value
> > (still having only 1 byte for bool in c++). If I check m_string inside the u_odb::get function
> > of this instance I am getting for a bool (I set false) stuff like 0x7f6633f67a00 and for an int
> > (I set the int to 1000) 0x7f66000003e8. Since the size of BOOL is larger I am getting the
> > wrong value. I checked this also on openSUSE having the same behavior. |
|