05 Dec 2018, Konstantin Olchanski, Info, Partial refactoring of ODB code
|
The current ODB code has several structural problems and I think I have now figured out how to straighten them out.
Here are the problems:
a) nested (recursive) odb locks
b) no clear separation between read-only access and read-write access
c) no clear separation between odb validation and repair functions
d) cm_msg() is called while holding a database lock
Discussion:
a) odb locks are nested because most functions lock the database, then call other functions that lock the database again. Most locking primitives - SystemV
semaphores, POSIX semaphores and mutexes - usually do not permit nested (recursive) locking.
For locking the odb shared memory we use a SystemV semaphore with recursion implemented "by hand" in ss_semaphore_wait_for(). This works ok.
For making odb thread-safe, we use POSIX mutexes, and we rely on an optional feature (PTHREAD_MUTEX_RECURSIVE) which seems to work on most OSes, but
is not required to exist and work by any standard. For example, recursive mutexes do not work in uclinux (linux for machines without an MMU).
I looked at implementing recursive mutexes "by hand", same as we have the recursive semaphores, and realized that it is quite complicated and computationally
expensive (read: inefficient). (Also I think nested and recursive locks are "not mainstream" and should rather be avoided). As an example, you can see the full
complexity of a nested lock in the recent implementation in ROOT (good luck finding it).
A solution for this problem is well known. All functions are separated into "unlocked" user-callable functions and "locked" internal functions. Nested locking is
naturally eliminated.
Call sequences:
db_get_key() -> db_find_key() // odb is locked twice
become
db_get_key() -> db_get_key_locked() -> db_find_key_locked() // odb is locked once
Actual implementation of this scheme turns out to be a very clean and mechanical refactoring (moving the code without changing what it does).
As a try, I refactored db_find_key() and db_get_key() and I like the result. Locking is now obvious - obscure error paths with hidden "unlock before return" - are all
gone. Extra conversions between hDB and pheader are gone.
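The split described above can be sketched roughly as follows. This is only an illustration, not the actual midas code: the Database struct, the std::map storage and the function names are hypothetical stand-ins, and std::mutex is used in place of the ODB semaphore.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the ODB; names and layout are illustrative only.
struct Database {
    std::mutex lock;                    // plain, non-recursive lock
    std::map<std::string, int> keys;    // path -> value
};

// "Locked" internal functions: the caller must already hold db.lock.
// They never take the lock themselves, so they can call each other freely.
static const int* find_key_locked(const Database& db, const std::string& path) {
    auto it = db.keys.find(path);
    return it == db.keys.end() ? nullptr : &it->second;
}

static bool get_key_locked(const Database& db, const std::string& path, int* value) {
    assert(value != nullptr);
    const int* p = find_key_locked(db, path);
    if (!p) return false;
    *value = *p;
    return true;
}

// "Unlocked" user-callable functions: lock exactly once, delegate, and let
// the guard unlock on every return path.
bool find_key(Database& db, const std::string& path) {
    std::lock_guard<std::mutex> guard(db.lock);
    return find_key_locked(db, path) != nullptr;
}

bool get_key(Database& db, const std::string& path, int* value) {
    std::lock_guard<std::mutex> guard(db.lock);
    return get_key_locked(db, path, value);
}
```

With this layout get_key() takes the lock once and never re-enters it, so a non-recursive mutex is sufficient.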
b) in this refactoring, functions that do not (should not) modify odb become easy to identify - the pheader argument is tagged "const".
This simplifies the implementation of "write-protected" odb - instead of ad-hoc db_allow_write_locked() sprinkled everywhere, one can have obvious calls to
"db_lock_read_only()" and "db_lock_read_write()".
Separation of locks into "read" and "write" locks, in turn, improves locking behaviour - helps against problems like lock starvation - which we did see with MIDAS -
as "read" locks are much more efficient - all readers can read the data at the same time, locking is only done when somebody needs to "write".
c) some db_validate() functions also try to do repair. this cannot work if validation is called from "read-only" functions like db_find_key(). I now think the "repair"
functions should be separate from "validate" functions. validate functions should detect problems, repair functions would repair them. The question remains -
when is good time to run a full repair. (probably at the time when we connect to the database - this way, simply starting "odbedit" will force a database check and
repair).
d) calls to cm_msg() while odb is locked have been a problem for a long time. Because cm_msg() itself calls odb and because it also calls event buffer code
(SYSMSG buffer) which in turn calls odb functions, there was trouble with deadlocks between ODB and event buffer semaphores, trouble with recursive use of
ODB, etc.
Right now we have all this partially papered over by having cm_msg() put messages into a memory buffer that we periodically flush, but I was never super happy
with that solution. For example, if we crash before the message buffer is flushed, all error messages are lost, they do not go into midas.log, they are not printed on
the screen, they are not accessible in the core dump.
To resolve this problem, I have all "locked" functions call db_msg() instead of cm_msg(). db_msg() saves the messages in a linked list which is flushed into
cm_msg() immediately after we unlock odb.
If we crash after generating an error message but before it is flushed to cm_msg(), we can still access it through the linked list inside the core dump. This is an
improvement over what we have now. Ideally, all messages should be printed to the terminal and saved to midas.log and pushed into SYSMSG, but most of this is
impractical at a moment when odb is locked - as we already know it leads to deadlocks and other trouble...
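The deferred-message scheme can be sketched like this. It is an assumption-laden illustration, not the db_msg() implementation: MsgDatabase, queue_msg_locked() and store_value() are hypothetical names, a std::vector stands in for the linked list, and the "sink" vector stands in for cm_msg().

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical database with a list of messages queued while it is locked.
struct MsgDatabase {
    std::mutex lock;
    std::vector<std::string> pending;
};

// Called only while db.lock is held: queue the text and do nothing else,
// so no other subsystem is entered under the lock.
static void queue_msg_locked(MsgDatabase& db, const std::string& text) {
    assert(!text.empty());
    db.pending.push_back(text);
}

// User-callable function: do the work under the lock, then deliver the
// queued messages only after the lock has been released.
bool store_value(MsgDatabase& db, int value, std::vector<std::string>& sink) {
    std::vector<std::string> to_flush;
    bool ok = true;
    {
        std::lock_guard<std::mutex> guard(db.lock);
        if (value < 0) {
            queue_msg_locked(db, "negative value rejected");
            ok = false;
        }
        to_flush.swap(db.pending);   // take ownership while still locked
    }
    for (const std::string& m : to_flush)
        sink.push_back(m);           // real code would call cm_msg() here
    return ok;
}
```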
Bottom line, I now have a path to improve the odb code and to resolve some of the long standing structural problems.
K.O. |
11 Dec 2018, Stefan Ritt, Info, Partial refactoring of ODB code
|
All makes sense to me. I agree to proceed with the refactoring.
One additional comment: In the 90's when I developed this code, locking was expensive. On a decent computer you could do a couple of thousand lock operations per second before you hit the 100%
CPU limit. Therefore I tried to reduce the number of lock operations as much as possible. For example, db_find_key locks the ODB once and then goes through all keys before it unlocks again. If I had locked for
every key with an ODB of tens of thousands of keys, that would have taken very long in the old days.
Now the world has changed, we can do almost a million locks a second. So a db_get_record() does not have to obtain a whole directory in one go, but can get each value separately, and if necessary lock
the ODB on each key access. This would be slower, but only a negligible amount these days. So in the spirit of making midas more robust, we can even go a step beyond simple refactoring and change the
locking scheme if it becomes more transparent and stable.
Best,
Stefan |
26 Dec 2018, Konstantin Olchanski, Info, Partial refactoring of ODB code
|
> One additional comment: In the 90's when I developed this code, locking was expensive.
> Now the world has changed, we can do almost a million locks a second.
I am not sure this is quite true. The CPU can execute 3000 million operations per second (3GHz CPU, assuming 1 op/Hz),
so 1 lock operation is worth 3000 normal operations. Of course cache misses and branch mispredictions mess up
this simple arithmetic...
But I think cost of mutex lock/unlock can be easily measured. (hmm... now I am curious).
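A rough way to measure it, as a sketch only: this times uncontended lock/unlock pairs on an in-process std::mutex, which is not the same thing as the SystemV semaphore or the write-protected ODB path. The function name is made up.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Returns the measured number of uncontended lock/unlock pairs per second,
// or 0 if the clock reported no elapsed time.
double measure_locks_per_second(long iterations) {
    assert(iterations > 0);
    std::mutex m;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; i++) {
        m.lock();
        m.unlock();
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    if (elapsed.count() <= 0.0) return 0.0;
    return iterations / elapsed.count();
}
```

Dividing the CPU clock rate by the returned number gives the "normal operations per lock" figure used in the arithmetic above. The result depends strongly on the machine, the OS and the compiler flags.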
Bigger question is architectural, nested/recursive locks are definitely a bad thing to do (not just my opinion).
But closer to home, as I implemented "write protected" ODB, lock/unlock suddenly has to do MMU operations
(map unmap memory) and this is *very* expensive.
Also as we start doing more multithreading, lock contention is becoming a problem, and the standard solution
is to implement read-locks and write-locks. (everybody holding a read-lock can read ODB at the same time
without waiting).
So, moving towards separate read and write locks and write-protected (and/or read-protected) ODB shared memory,
everything points towards reworking the ODB locks to remove the need for nested/recursive locks.
I think Stefan and I are in agreement here.
K.O. |
27 Dec 2018, Stefan Ritt, Info, Partial refactoring of ODB code
|
> I am not sure this is quite true. The CPU can execute 3000 million operations per second (3GHz CPU, assuming 1 op/Hz),
> so 1 lock operation is worth 3000 normal operations. Of course cache misses and branch mispredictions mess up
> this simple arithmetic...
You can try that with "t1" in odbedit. This measures the number of db_get_data() calls midas can do per second. On my MacBook Pro I get 470'000
accesses per second. |
24 Sep 2018, Lars Martin, Suggestion, Self-resetting alarm class
|
I was planning to use the alarm system to display an information banner when a
certain valve is open, but I would like it to go away again when the valve is
closed.
Is there a way to achieve that? Maybe reset the alarm from an alarm script?
(Seems like a hack...)
Maybe this could be a useful feature, to be able to define an alarm class that
resets itself once the condition is no longer met? |
24 Sep 2018, Lukas Gerritzen, Suggestion, Self-resetting alarm class
|
If you run an external script anyway, you can also call "odbedit -c alarm" to
reset all alarms. Or you could try to set the "Triggered" entry of that certain
alarm to 0 (again, with odbedit), that could also work. |
25 Sep 2018, Stefan Ritt, Suggestion, Self-resetting alarm class
|
> If you run an external script anyway, you can also call "odbedit -c alarm" to
> reset all alarms. Or you could try to set the "Triggered" entry of that certain
> alarm to 0 (again, with odbedit), that could also work.
That would not really help, because you cannot trigger a script AFTER an alarm occurred. Having
"self-resetting" alarms is actually not a bad idea. I could add a flag "Auto reset" which is false by
default and can be set to true for this functionality. Will keep that in mind for the next
development cycle.
Stefan |
26 Dec 2018, Konstantin Olchanski, Suggestion, Self-resetting alarm class
|
> > If you run an external script anyway, you can also call "odbedit -c alarm" to
> > reset all alarms. Or you could try to set the "Triggered" entry of that certain
> > alarm to 0 (again, with odbedit), that could also work.
>
> That would not really help, because you cannot trigger a script AFTER an alarm occurred. Having
> "self-resetting" alarms is actually not a bad idea. I could add a flag "Auto reset" which is false by
> default and can be set to true for this functionality. Will keep that in mind for the next
> development cycle.
>
I second, this is a good idea. Sometimes I want "sticky" alarms that stay on to indicate that a bad thing happened in the
past, sometimes I want self-resetting alarms that go away when a bad thing turns back into a good thing.
When I do this in a frontend, I manually trigger the alarm and manually clear the alarm, e.g. you can see this
done in ~addaq/online/src/fectrl.cxx
Use al_trigger_alarm() and al_reset_alarm().
This can also be done through the json-rpc interface - both calls are available as rpc commands - and so easy to use
from javascript. (but there is no simple unix command line tool to issue json-rpc requests. ouch. must write one now.)
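The manual trigger/clear logic can be sketched as a small edge detector. This is a hypothetical helper, not code from fectrl.cxx: the two callbacks are placeholders that in a real frontend would wrap al_trigger_alarm() and al_reset_alarm() (check midas.h for their exact arguments).

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Calls `trigger` once when the condition becomes true and `reset` once
// when it becomes false again. Both callbacks are hypothetical hooks.
class SelfResettingAlarm {
public:
    SelfResettingAlarm(std::function<void()> trigger, std::function<void()> reset)
        : trigger_(std::move(trigger)), reset_(std::move(reset)) {
        assert(trigger_ && reset_);
    }

    // Call this periodically with the current state, e.g. "valve is open".
    void update(bool condition) {
        if (condition && !active_) {
            trigger_();
            active_ = true;
        } else if (!condition && active_) {
            reset_();
            active_ = false;
        }
    }

    bool active() const { return active_; }

private:
    std::function<void()> trigger_;
    std::function<void()> reset_;
    bool active_ = false;
};
```

Calling update() from the frontend's periodic loop raises the alarm only on the open transition and clears it on the close transition, so repeated readings do not re-trigger it.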
K.O. |
25 Sep 2018, Stefan Ritt, Suggestion, Self-resetting alarm class
|
> I was planning to use the alarm system to display an information banner when a
> certain valve is open, but I would like it to go away again when the valve is
> closed.
> Is there a way to achieve that? Maybe reset the alarm from an alarm script?
> (Seems like a hack...)
> Maybe this could be a useful feature, to be able to define an alarm class that
> resets itself once the condition is no longer met?
Actually you can implement such a thing already now pretty quickly using custom javascript on
the status page. Just read the valve state regularly from the ODB and dynamically modify the
status page to show or hide a banner. Look how custom pages work in midas and try to apply
this to the status page status.html which you find in the resources directory.
Stefan |
24 Sep 2018, Devin Burke, Forum, Implementing MIDAS on a Satellite
|
Hello Everybody,
I am a member of a satellite team with a scientific payload and I am considering
coordinating the payload using MIDAS. This looks to be challenging since MIDAS
would be implemented on an Xilinx Spartan 6 FPGA with minimal hardware
resources. The idea would be to install a soft processor on the Spartan 6 and
run MIDAS through UCLinux either on the FPGA or boot it from SPI Flash. Does
anybody have any comments on how feasible this would be or perhaps have
experience implementing a similar system?
-Devin |
25 Sep 2018, Stefan Ritt, Forum, Implementing MIDAS on a Satellite
|
> Hello Everybody,
>
> I am a member of a satellite team with a scientific payload and I am considering
> coordinating the payload using MIDAS. This looks to be challenging since MIDAS
> would be implemented on an Xilinx Spartan 6 FPGA with minimal hardware
> resources. The idea would be to install a soft processor on the Spartan 6 and
> run MIDAS through UCLinux either on the FPGA or boot it from SPI Flash. Does
> anybody have any comments on how feasible this would be or perhaps have
> experience implementing a similar system?
>
> -Devin
While some people successfully implemented a midas *client* in an FPGA softcore, the full midas
backend would probably not fit into a Spartan 6. Having done some FPGA programming and
working on satellites, I doubt that midas would be well suited for such an environment. It's
probably some kind of overkill. The complete GUI is likely useless since you want to minimize your
communication load on the satellite link.
Stefan |
25 Sep 2018, Devin Burke, Forum, Implementing MIDAS on a Satellite
|
> > Hello Everybody,
> >
> > I am a member of a satellite team with a scientific payload and I am considering
> > coordinating the payload using MIDAS. This looks to be challenging since MIDAS
> > would be implemented on an Xilinx Spartan 6 FPGA with minimal hardware
> > resources. The idea would be to install a soft processor on the Spartan 6 and
> > run MIDAS through UCLinux either on the FPGA or boot it from SPI Flash. Does
> > anybody have any comments on how feasible this would be or perhaps have
> > experience implementing a similar system?
> >
> > -Devin
>
> While some people successfully implemented a midas *client* in an FPGA softcore, the full midas
> backend would probably not fit into a Spartan 6. Having done some FPGA programming and
> working on satellites, I doubt that midas would be well suited for such an environment. It's
> probably some kind of overkill. The complete GUI is likely useless since you want to minimize your
> communication load on the satellite link.
>
> Stefan
Thank you for your comment Stefan. We do have some hardware resources on the board such as RAM, ROM and
Flash storage so we wouldn't necessarily have to virtualize everything. Ideally we would like a
completed and compressed file to be produced on board and regularly sent back to ground without
requiring remote access. MIDAS is appealing to us because its easily automated but we wouldn't
necessarily need functions like a GUI or web interface. Part of the discussion now is whether or not a
microblaze processor would be sufficient or if we need a dedicted ARM processor.
Devin |
26 Dec 2018, Konstantin Olchanski, Forum, Implementing MIDAS on a Satellite
|
>
> Thank you for your comment Stefan. We do have some hardware resources on the board such as RAM, ROM and
> Flash storage so we wouldn't necessarily have to virtualize everything. Ideally we would like a
> completed and compressed file to be produced on board and regularly sent back to ground without
> requiring remote access. MIDAS is appealing to us because its easily automated but we wouldn't
> necessarily need functions like a GUI or web interface. Part of the discussion now is whether or not a
> microblaze processor would be sufficient or if we need a dedicted ARM processor.
>
Hi, just recently I got a midas frontend to build and run on uclinux on a microblaze arm CPU (GRIFFIN CDM VME board).
It worked, but uncovered many problems inside midas - uclinux has no mmu, no multithreading, no recursive mutexes, and lacks
some of the other features midas assumes are always available.
The worst problem I ran into was with uclinux giving us a very small stack, so code like "int main() { char buf[10*1024]; }"
crashes right away, and there is a lot of code like this in midas.
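One common way around the small stack, shown here only as a sketch, is to move large buffers to the heap. The function name and the 10 kB size are just for illustration, and this assumes the target has a usable heap of that size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Instead of `char buf[10*1024];` on the stack, take the buffer from the heap.
// Only the small vector object itself lives on the stack.
std::size_t copy_into_heap_buffer(const char* text) {
    assert(text != nullptr);
    std::vector<char> buf(10 * 1024);                 // zero-initialized heap storage
    std::strncpy(buf.data(), text, buf.size() - 1);   // last byte stays '\0'
    return std::strlen(buf.data());
}
```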
My feeling about the xilinx soft-core CPU, if you can run uclinux, you can also run a midas frontend. We do not require
memory beyond that needed to store one or two of your data events.
By design, the midas library can be built in a "minimal" configuration that only supports a frontend connected
to the mserver (no local ODB, no local event buffers, no local mhttpd/mlogger, etc).
As you have seen in the Makefile, there are provisions for cross-compilation and I cross-compile midas things quite often.
On the other hand, if you have a xilinx FPGA with a built-in PowerPC CPU, most definitely you can run full linux
and you can run full midas on it, we have done this for the T2K/ND280 experiment in Japan.
K.O. |
24 Oct 2018, Ryu Sawada, Info, bm_receive_event timeout in ROME
|
Hi all
There is a bug report in the ROME repository which says bm_receive_event times out.
https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after
Does anybody have any idea what could be causing the problem?
Ryu |
26 Dec 2018, Konstantin Olchanski, Info, bm_receive_event timeout in ROME
|
> There is a bug report in the ROME repository which says bm_receive_event timeouts.
> https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after
> Does anybody have any ideas what could causing the problem ?
There could be a problem with causing bm_receive_event() to wait for an event for a time longer than
the rpc timeout. This rings a very small bell for me. But I do not remember the details.
As I now go through the midas event buffer code, I will check that bm_receive_event() connected
through the mserver has correctly working timeouts.
Thank you for reminding me about this difficulty.
K.O. |
18 Dec 2018, Konstantin Olchanski, Info, mxml update
|
the mxml library was updated to make it thread-safe.
https://bitbucket.org/tmidas/mxml/src/master/
I also take the opportunity to remind everyone to update your copy to the latest version
as I just stumbled on an old bug that I fixed 1 year ago (crash of mlogger)
but forgot to update each and every one of my copies of mxml.
I also looked at the xml encoder and I see that it has several places where it may
truncate the data, but none of these places can cause truncation of ODB data
because the fixed-size internal buffers are big enough to hold the longest
values sent by the odb xml encoder.
K.O. |
30 Oct 2018, Joseph McKenna, Bug Report, Side panel auto-expands when history page updates
|
One can collapse the side panel when looking at history pages with the button in
the top left, great! We want to see many pages, so screen real estate is important.
The issue we face is that when the page refreshes, the side panel expands. Can
we make the panel state more 'sticky'?
Many thanks
Joseph (ALPHA)
Version: 2.1
Revision: Mon Mar 19 18:15:51 2018 -0700 - midas-2017-07-c-197-g61fbcd43-dirty
on branch feature/midas-2017-10 |
31 Oct 2018, Stefan Ritt, Bug Report, Side panel auto-expands when history page updates
|
>
>
> One can collapse the side panel when looking at history pages with the button in
> the top left, great! We want to see many pages so screen real estate is important
>
> The issue we face is that when the page refreshes, the side panel expands. Can
> we make the panel state more 'sticky'?
>
> Many thanks
> Joseph (ALPHA)
>
> Version: 2.1
> Revision: Mon Mar 19 18:15:51 2018 -0700 - midas-2017-07-c-197-g61fbcd43-dirty
> on branch feature/midas-2017-10
Hi Joseph,
In principle a page refresh should no longer be necessary, since pages should automatically reload
only the contents that change. If a custom page needs a reload, it is not well designed. If necessary, I
can explain the details.
Anyhow I implemented your "stickiness" of the side panel in the last commit to the develop branch.
Best regards,
Stefan |
31 Oct 2018, Joseph McKenna, Bug Report, Side panel auto-expands when history page updates
|
> >
> >
> > One can collapse the side panel when looking at history pages with the button in
> > the top left, great! We want to see many pages so screen real estate is important
> >
> > The issue we face is that when the page refreshes, the side panel expands. Can
> > we make the panel state more 'sticky'?
> >
> > Many thanks
> > Joseph (ALPHA)
> >
> > Version: 2.1
> > Revision: Mon Mar 19 18:15:51 2018 -0700 - midas-2017-07-c-197-g61fbcd43-dirty
> > on branch feature/midas-2017-10
>
> Hi Joseph,
>
> In principle a page refresh should now not be necessary, since pages should reload automatically
> the contents which changes. If a custom page needs a reload, it is not well designed. If necessary, I
> can explain the details.
>
> Anyhow I implemented your "stickyness" of the side panel in the last commit to the develop branch.
>
> Best regards,
> Stefan
Hi Stefan,
I apologise for misusing the word refresh. The re-appearing sidebar was also seen with the automatic
reload. I have implemented your fix here and it now works great!
Thank you very much!
Joseph |
02 Nov 2018, Stefan Ritt, Bug Report, Side panel auto-expands when history page updates
|
> I apologise for miss using the word refresh. The re-appearing sidebar was also seen with the automatic
> reload, I have implemented your fix here and it now works great!
Still did not get your point. Why is there "automatic reload"? The status page should not "completely reload" any more.
Instead, all data is fetched in the background using AJAX calls, and only the data on the page is updated once per second. If
there is a "complete reload", something is wrong.
Stefan |
02 Nov 2018, Thomas Lindner, Bug Report, Side panel auto-expands when history page updates
|
> > I apologise for miss using the word refresh. The re-appearing sidebar was also seen with the automatic
> > reload, I have implemented your fix here and it now works great!
>
> Still did not get your point. Why is there "automatic reload"? The status page should not "completely reload" any more.
> Instead, all data is fetched in the background using AJAX calls, and only the data on the page is updated once per second.
> If there is a "complete reload", something is wrong.
Joseph's original message says that the problem is with the standard MIDAS history page, which currently uses a complete reload
when refreshing. Of course we are planning to update the history pages to only grab what they need (as well as changing the
plotting to use newer HTML plotting). But until that upgrade happens your fix is helpful for the history page. |
02 Nov 2018, Stefan Ritt, Bug Report, Side panel auto-expands when history page updates
|
> Joseph's original message says that the problem is with the standard MIDAS history page, which currently use a complete reload
> when refreshing. Of course we are planning to update this history pages to only grab what it needs (as well as changing the
> plotting to use newer HTML plotting). But until that upgrade happens your fix is helpful for the history page.
Ok, now I understand, and of course I agree with you.
Stefan |
11 Sep 2018, Francesco Renga, Forum, Launching an executable script from the sequencer
|
Dear experts,
is there any way to launch an executable script on the host computer from the MIDAS
sequencer? If not, is there any interest to develop such a feature?
Thank you,
Francesco |
11 Sep 2018, Pierre Gorel, Forum, Launching an executable script from the sequencer
|
> Dear experts,
> is there any way to launch an executable script on the host computer from the MIDAS
> sequencer? If not, is there any interest to develop such a feature?
>
> Thank you,
> Francesco
The SCRIPT command will do that (on the machine running MIDAS). I know it works with either python or
bash scripts. I tried without success to pass the parameters and I went around by setting ODB entries
prior to running the script and then accessing them within the script. |
11 Sep 2018, Stefan Ritt, Forum, Launching an executable script from the sequencer
|
> > Dear experts,
> > is there any way to launch an executable script on the host computer from the MIDAS
> > sequencer? If not, is there any interest to develop such a feature?
> >
> > Thank you,
> > Francesco
>
> The SCRIPT command will do that (on the machine running MIDAS). I know it works with either python or
> bash scripts. I tried without success to pass the parameters and I went around by setting ODB entries
> prior to running the script and then access to them within the script.
Passing parameters should work. If it's confirmed to be broken, I'm willing to fix it.
Stefan |
28 Aug 2018, Lukas Gerritzen, Bug Report, Deleting Links in ODB via mhttpd
|
Assume you have a variable foo and a link bar -> foo. When you go to the ODB in
mhttpd, click "Delete" and select bar, it actually deletes foo. bar stays,
stating "<cannot resolve link>". Trying the same in odbedit with rm gives the
expected result (bar is gone, foo is still there).
I'm on the develop branch. |
28 Aug 2018, Konstantin Olchanski, Bug Report, Deleting Links in ODB via mhttpd
|
> Asume you have a variable foo and a link bar -> foo. When you go to the ODB in
> mhttpd, click "Delete" and select bar, it actually deletes foo. bar stays,
> stating "<cannot resolve link>". Trying the same in odbedit with rm gives the
> expected result (bar is gone, foo is still there).
>
> I'm on the develop branch.
I think I can confirm this. Created a bug report on bitbucket:
https://bitbucket.org/tmidas/midas/issues/148/mhttpd-odb-editor-deletes-wrong-symlink
K.O. |
29 Aug 2018, Stefan Ritt, Bug Report, Deleting Links in ODB via mhttpd
|
> > Asume you have a variable foo and a link bar -> foo. When you go to the ODB in
> > mhttpd, click "Delete" and select bar, it actually deletes foo. bar stays,
> > stating "<cannot resolve link>". Trying the same in odbedit with rm gives the
> > expected result (bar is gone, foo is still there).
> >
> > I'm on the develop branch.
>
> I think I can confirm this. Created a bug report on bitbucket:
>
> https://bitbucket.org/tmidas/midas/issues/148/mhttpd-odb-editor-deletes-wrong-symlink
>
> K.O.
I fixed this and committed the change. Took me a while since it was in KO's code.
Stefan |
29 Aug 2018, Konstantin Olchanski, Forum, midas forum mail relay changed to smtp.triumf.ca
|
Per changes at TRIUMF, the MIDAS forum mail relay was changed from trmail.triumf.ca to
smtp.triumf.ca. K.O. |
21 Aug 2018, Wes Gohn, Bug Report, mserver problem
|
Hi. We've just updated our midas installation to the newest version, and we now see repeated errors from the
mserver in messages. Mostly we see
11:17:02.994 2018/08/21 [ODBEdit,TALK] Program mserver restarted
which happens 2-3 times per minute.
We have also been seeing occasional dropped rpc connections to our frontends, which could be related.
The version we were running with previously was ~1 year old, and we have just updated to the newest version
on bitbucket.
Thanks,
Wes |
28 Aug 2018, Konstantin Olchanski, Bug Report, mserver problem
|
> Hi. We've just updated our midas installation to the newest version, and we now see repeated errors from the
> mserver in messages. Mostly we see
>
> 11:17:02.994 2018/08/21 [ODBEdit,TALK] Program mserver restarted
>
> which happens 2-3 times per minute.
>
> We have also been seeing occasional dropped rpc connections to our frontends, which could be related.
>
> The version we were running with previously was ~1 year old, and we have just updated to the newest version
> on bitbucket.
Hmm... usually mserver will not restart automatically, maybe you have set it to autorestart on ODB (/programs/mserver/auto_restart
set to "y").
It would be unusual for the main mserver program to crash. To debug it, you will need to run it in a terminal
and see if there are any error messages. Even better to run it in a terminal inside "gdb" and capture the stack trace
when it crashes.
Anyhow, a crash of the main mserver will not cause "dropped rpc connections" to clients - this would require their
individual mserver subprocesses to crash. Such crashes would be highly unusual and are harder to debug.
Perhaps for the crashes you see there are some error messages in midas.log?
K.O. |
24 Aug 2018, Lukas Gerritzen, Forum, Int64 datatype
|
I would like to store the address of 1-Wire temperature sensors in a device
driver. However, the supported data types (as defined around
include/midas.h:311) do not foresee a type large enough.
Is there a good reason against this?
I know that other experiments use this kind of sensor, how do you store the
addresses? I've noticed that most of the address is just zeroes, but I wouldn't
like to store just half the address, assuming that half the address is always
zeroes. |
25 Aug 2018, Stefan Ritt, Forum, Int64 datatype
|
> I would like to store the address of 1-Wire temperature sensors in a device
> driver. However, the supportet data types (as definded around
> include/midas.h:311) do not foresee a type large enough.
>
> Is there a good reason against this?
>
> I know that other experiments use this kind of sensor, how do you store the
> addresses? I've noticed that most of the address is just zeroes, but I wouldn't
> like to store just half the address, assuming that half the address is always
> zeroes.
Well, when this code was written, computers had 640kB and operating systems had 16 bit. What
you can do for your 1-wire sensor is to store the address in two values, one 32-bit LSB and one
32-bit MSB. Or store it in a string with hex representation.
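Both options can be sketched in a few lines. The helper names are made up for illustration and nothing here touches the ODB itself; the two 32-bit halves or the string would then be stored with the usual ODB calls.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Split a 64-bit address into a low and a high 32-bit word.
void split_address(uint64_t addr, uint32_t* lsb, uint32_t* msb) {
    assert(lsb != nullptr && msb != nullptr);
    *lsb = static_cast<uint32_t>(addr & 0xFFFFFFFFu);
    *msb = static_cast<uint32_t>(addr >> 32);
}

// Rebuild the 64-bit address from its two halves.
uint64_t join_address(uint32_t lsb, uint32_t msb) {
    return (static_cast<uint64_t>(msb) << 32) | lsb;
}

// Hex text representation, 16 digits with a "0x" prefix.
std::string address_to_hex(uint64_t addr) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "0x%016" PRIx64, addr);
    return buf;
}
```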
Stefan |
28 Aug 2018, Konstantin Olchanski, Forum, Int64 datatype
|
> I would like to store the address of 1-Wire temperature sensors in a device
> driver. However, the supportet data types (as definded around
> include/midas.h:311) do not foresee a type large enough.
>
Hmm... you do not say what sensor you use and how many bits you actually need.
For up to 32 bits you can use TID_DWORD (uint32_t) (obviously)
For up to 53 bits, you can use TID_DOUBLE (double) (weird, but IEEE754 double precision variables represent integers exactly up to 2^53).
For more, I would use arrays of TID_DWORD (64 bits, store low 32 bits into a[0], high bits into a[1]).
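The double trick has a hard limit that is easy to check. A small sketch with made-up function names: an IEEE754 double has a 53-bit significand, so every integer up to 2^53 survives the round trip and larger ones may not.

```cpp
#include <cassert>
#include <cstdint>

// True when v and every smaller integer are exactly representable in a double.
bool fits_exactly_in_double(uint64_t v) {
    const uint64_t limit = 1ULL << 53;   // 53-bit significand
    return v <= limit;
}

// Convert to double and back; only meaningful for v below 2^63.
uint64_t through_double(uint64_t v) {
    assert(v < (1ULL << 63));
    double d = static_cast<double>(v);
    return static_cast<uint64_t>(d);
}
```

For anything wider, the two-element TID_DWORD array avoids the question entirely.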
>
> Is there a good reason against this?
>
We had requests for implementing uint64_t 64-bit data types in MIDAS before. There are two problems:
a) in the MIDAS data banks, there is a problem with the bank header definition which only has 3 DWORDs and so causes
each alternating data bank to be 64-bit misaligned. And misaligned 64-bit data is very bad.
b) in ODB, 64-bit data support will need to be added from scratch and again it is not clear without doing it
if there will be any alignment problems. If one were to implement ODB from scratch, one would have everything
aligned to 64-bits or maybe even 128-bits, with uint64_t fully supported.
It is unlikely this kind of work will ever be done on ODB, but who knows.
> I know that other experiments use this kind of sensor, how do you store the
> addresses? I've noticed that most of the address is just zeroes, but I wouldn't
> like to store just half the address, assuming that half the address is always
> zeroes.
Cannot answer without knowing what sensor you use, but certainly you can use an array of bytes
or an array of integers to store arbitrarily long addresses. You can also use a TID_STRING
and store the address as a text string "0xabcdabcdabcdabcd" of arbitrary length.
K.O. |
28 Aug 2018, Lukas Gerritzen, Forum, Problems with virtual history events
|
Hi,
I am trying to set up virtual history events following
https://midas.triumf.ca/MidasWiki/index.php/History_System#Virtual_History_Event
Trying it the first way, using the following setup:
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
Links DIR
dirlink -> External/dir KEY 1 12 >99d 0 RWD <subdirectory>
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
External DIR
dir DIR
foo FLOAT 1 4 16s 0 RWD 12.5
Then I get the following error message:
==================== History link "dirlink", ID 28150 =======================
[Logger,ERROR] [mlogger.cxx:4942:open_history,ERROR] History event dirlink has
no variables in ODB
Trying the second way, I set up the following:
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
Links                           DIR
    dir                         DIR
        testlink -> External/foo
                                FLOAT   1     4     8m   0   RWD  5.2

Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
External                        DIR
    foo                         FLOAT   1     4     6m   0   RWD  5.2
Starting mlogger in verbose mode yields the following error:
==================== History link "dir", ID 28150 =======================
[Logger,ERROR] [mlogger.cxx:4935:open_history,ERROR] History link
/History/Links/dir/testlink is invalid
Error in history system, aborting startup.
I'm not sure if this is a bug or just a case of PEBCAK.
Finally, to set the update period, do I need entries in /history/links periods
with the tag name? Is there a way to only write them in the history file when
they change? I want to use the virtual history events for measurements I get
from external scripts, some periodic, some manual.
Thanks |
28 Aug 2018, Konstantin Olchanski, Forum, Problems with virtual history events
|
Hi, what you tried should have worked. Perhaps your symlink is wrong and should say "/External/..." (with a leading slash). The "links period" would
work the same as the equipment/common/history period - as a rate limiter.
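To illustrate what "rate limiter" means here: a value is written only if at least one period has passed since the last accepted write. This is a minimal sketch, not the logger's actual code. The type and function names are hypothetical, and treating a period of 0 or less as "disabled" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-event rate limiter state. */
typedef struct {
    time_t period;       /* minimum number of seconds between writes; <= 0 disables writing (assumption) */
    time_t last_write;   /* time of the last accepted write */
    int    has_written;  /* 0 until the first write has been accepted */
} rate_limiter_t;

/* Return 1 if a write at time `now` should go through, 0 if it should be dropped.
   An accepted write updates the stored time of the last write. */
static int rate_limiter_accept(rate_limiter_t *rl, time_t now)
{
    assert(rl != NULL);
    if (rl->period <= 0)
        return 0;
    if (rl->has_written && now - rl->last_write < rl->period)
        return 0;
    rl->last_write = now;
    rl->has_written = 1;
    return 1;
}
```

With a period of 10 seconds, updates arriving at t = 100, 105 and 110 would be written, dropped and written respectively.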
Anyhow, I suggest another way to do the same - create a fake equipment - the logger does
not care if the equipment is real or not and if you write into /eq/fake/variables from a proper frontend
or from a script. To hide the fake equipment from the status page, set /eq/fake/common/hidden to "true".
This will work for sure.
K.O. |
21 Jul 2018, Hiroaki Natori, Forum, Question about distributing event builder function on remote PC
|
Dear expert,
I'm going to develop a MIDAS DAQ for the COMET experiment.
I'm thinking of distributing the load of event building across different PCs.
I attach a schematic of one example of the design.
Please tell me how I can accomplish a kind of "sub-EventBuilder".
I'm reading the midas code to understand the scheme of MIDAS, but
there is a lot of it and I want to know which part to focus on.
Can I do it by writing a user function based on either "mfe.c" or "mevb.c"?
Is a frontend program with multithreaded equipment the way to do it?
Or should I modify the original midas files?
Best regards,
Hiroaki Natori |
23 Jul 2018, Konstantin Olchanski, Forum, Question about distributing event builder function on remote PC
|
> I'm going to develop a MIDAS DAQ for the COMET experiment.
> I'm thinking of distributing the load of event building across different PCs.
> I attach a schematic of one example of the design.
Your schematic is reminiscent of the T2K/ND280 structure where the MIDAS DAQ
was split into several separate MIDAS instances (separate "experiments": the FGD, the TPC,
the slow controls, etc).
They were joined together by the "cascade" equipment which provided a path
for the data events to flow from subsidiary midas instances to the main system (the one
with the final mlogger). It also provided a reverse path for run control, where starting
a run in the main experiment also started the run in all the subsidiary experiments.
This cascade frontend was never included in the midas distribution (an oversight),
but I still have the code for it somewhere.
How many "frontend PC" components do you envision? (10, 100, 1000?).
In T2K/ND280, each subsidiary experiment had its own ODB which made sense
because e.g. the FGD and the TPC were quite different and were managed by different
groups.
But for you it probably makes sense to have one common ODB. This means a MIDAS
structure where ODB is located on the main computer ("event builder PC"),
all others connect to it via the mserver and midas rpc.
But you will need the MIDAS shared event buffers on each "frontend PC" to be local,
which means the bm_xxx() functions have to run locally instead of through the mserver rpc.
This is not how midas works right now, but it could be modified to do this.
On the other hand, you do not have to use midas to write the "frontend pc" code. Today's
C++ provides enough features - threads, locks, mutexes, shared memories, event queues,
etc so you can write the whole sub-event builder as one monolithic c++ program
and use midas only to send the data to the main event builder. (plus midas rpc to handle
run control). In this scheme, technically, this "frontend pc" program would
be a multithreaded midas frontend.
K.O. |
28 Jul 2018, Hiroaki Natori, Forum, Question about distributing event builder function on remote PC
|
Dear Mr. Olchanski
Thank you for your comment.
We expect ~1000 readout channels, ~100 boards and fewer than 10 frontend PCs.
We expect a trigger rate of a few kHz.
Writing monolithic c++ code may need a complete understanding of midas,
and I will think more about whether to write from scratch or modify the midas code.
Best regards
Hiroaki Natori |
04 May 2018, Francesco Renga, Forum, ODB full
|
Dear expert,
I'm developing a frontend and I'm getting this kind of error at each event:
10:14:56.564 2018/05/04 [Sample Frontend,ERROR] [odb.c:5911:db_set_data1,ERROR]
online database full
If I run the mem command in odbedit I get the result at the end of this post.
Notice that I need to use an event size which is significantly larger than the
default one. I don't know if it is relevant for this error. I have in the ODB:
/Experiment/MAX_EVENT_SIZE = 900000000
and in the frontend code:
/* maximum event size produced by this frontend */
INT max_event_size = 300000000;
/* maximum event size for fragmented events (EQ_FRAGMENTED) */
INT max_event_size_frag = 5 * 1024 * 1024;
/* buffer size to hold events */
INT event_buffer_size = 600000000;
Events seem to be properly stored in the output files, but I'm afraid I could
get some other problem.
Thank you for your help,
Francesco
-------------------------------------------------------------------------
Database header size is 0x21040, all following values are offset by this!
Key area 0x00000000 - 0x0007FFFF, size 524288 bytes
Data area 0x00080000 - 0x00100000, size 524288 bytes
Keylist:
--------
Free block at 0x00000B58, size 0x00000008, next 0x000053E0
Free block at 0x000053E0, size 0x00000008, next 0x00006560
Free block at 0x00006560, size 0x00079AA0, next 0x00000000
Free Key area: 498352 bytes out of 524288 bytes
Data:
-----
Free block at 0x000847F0, size 0x0007B810, next 0x00000000
Free Data area: 505872 bytes out of 524288 bytes
Free: 498352 (95.1%) keylist, 505872 (96.5%) data |
04 May 2018, Stefan Ritt, Forum, ODB full
|
Two options:
1) Do NOT send your events into the ODB. This is controlled via the flag RO_ODB in your frontend setting. For simple experiments with small events, it might make sense to copy each
event into the ODB for debugging, but if you have large events, this does not make sense. Use the "mdump" utility to check your events instead.
2) Increase the size of the ODB. See the first FAQ here: https://midas.triumf.ca/MidasWiki/index.php/FAQ
Stefan |
20 Jul 2018, Konstantin Olchanski, Forum, ODB full
|
Concurrence.
Normally, MIDAS data events are saved to ODB (via RO_ODB into /eq/xxx/variables) to make them go into the midas history (/eq/xxx/common/history > 0).
If you do not want events to go into the history, but still want them saved to ODB, it should work (as long as ODB itself
is big enough), but you may run into other problems, specifically ODB free space fragmentation, where no matter how big ODB is, there is never
enough contiguous free space to save a large event. If this happens, you will also see random "odb full" errors.
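The difference between total free space and usable free space can be sketched as follows. This is a simplified free-list model with hypothetical names, not the ODB allocator itself. It only shows that an allocation needs a single free block that is large enough, so the sum of all free blocks can be misleading.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a simplified free list: where the block starts and how big it is. */
typedef struct {
    size_t offset;
    size_t size;
} free_block_t;

/* Sum of all free blocks, comparable to the "Free ... area" totals printed by "mem". */
static size_t total_free(const free_block_t *blocks, size_t count)
{
    size_t total = 0;

    assert(blocks != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += blocks[i].size;
    return total;
}

/* Size of the largest single free block. */
static size_t largest_free(const free_block_t *blocks, size_t count)
{
    size_t largest = 0;

    assert(blocks != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (blocks[i].size > largest)
            largest = blocks[i].size;
    return largest;
}

/* An allocation succeeds only if one contiguous free block can hold the request. */
static int can_allocate(const free_block_t *blocks, size_t count, size_t request)
{
    return largest_free(blocks, count) >= request;
}
```

For example, four separate free blocks of 1000 bytes each add up to 4000 free bytes, yet a 1500-byte request fails in this model because no single block is big enough.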
K.O. |
05 Jun 2018, Frederik Wauters, Forum, strings in sqlite
|
I am setting up a sqlite db to serve as a run database.
The easiest option is to use the history sqlite feature, and add run information
as virtual history events
however:
Invalid tag 0 'Comment' in event 21 'Run Parameters': cannot do history for
TID_STRING data, sorry!
I'd like to save e.g. the edit-on-start information, with shift crew checks.
Would it be easy to allow for text, or is this limitation inherent to the history system
handling binary data? |
20 Jul 2018, Konstantin Olchanski, Forum, strings in sqlite
|
> Invalid tag 0 'Comment' in event 21 'Run Parameters': cannot do history for
> TID_STRING data, sorry!
The original MIDAS history API does not have provisions for storing TID_STRING data.
It is a very unfortunate limitation that has been with us for a very long time.
If I ever get around to rewriting the MIDAS history API, I will definitely add support for TID_STRING data.
But not today.
K.O.
P.S. Support for arbitrary binary blobs is also possible, but this will make the midas history
a kind of "a daq inside the daq" thing, probably we do not want to go this direction.
K.O. |
20 Jul 2018, Konstantin Olchanski, Info, ROOT I/O workshop notable
|
The ROOT I/O workshop was held on June 20th at CERN. A few things of interest in MIDAS land:
- LZ4 is now used as default compression (replacing gzip-1)
- JSON class streamer is finally implemented (XML streamer updated/reworked)
- recursive read-write lock class implemented
- do not see any special mention of Javascript I/O or jsroot, but jsroot git repo seems to be quite active
Of these the recursive read-write lock is most interesting - using something similar would improve ODB performance
and presumably fix the existing lock fairness problems.
https://root.cern.ch/doc/master/TReentrantRWLock_8hxx_source.html
https://indico.cern.ch/event/715802/contributions/2942560/attachments/1670191/2680682/ROOT_IO_June_Workshop_v2.pdf
https://github.com/root-project/jsroot
K.O. |
03 Jul 2018, Frederik Wauters, Forum, mlogger? jamming
|
We run as follows:
* sis3316 digitizers in a vme crate
* 1-2 midas events /s
* data rate at 20 MB/s
At a rate of 30 MB/s the DAQ crashes, I think because the mlogger can't keep up:
* it runs at 100% cpu
* memory usage of mlogger process goes from 2% to 15%
* All other processes < 50 % cpu and < 20% RAM
Both the vme frontend and the mlogger crash about 2.5 minutes into a run. Both
the logger and vme fe spit out:
bm_validate_client_pointers: Assertion `pclient->read_pointer >= 0 &&
pclient->read_pointer <= pheader->size' failed.
Aborted
I first thought that writing to disk could be the bottleneck. But when I write to
an SSD, the same thing happens.
Is there another bottleneck which keeps the mlogger busy? |
22 Jun 2018, Frederik Wauters, Forum, custom script on custom page
|
I am implementing buttons to launch scripts from a custom page.
The simple way works, i.e.
<input type=submit name=customscript value="run_script">
But I want to stay on the page. Copying "Customscript button without a page
reload" from https://midas.triumf.ca/MidasWiki/index.php/Custom_Page_Features
yields the following error:
Uncaught ReferenceError: XMLHttpRequestGeneric is not defined
at cs_button (Trend:165)
at HTMLInputElement.onclick (Trend:90)
I included <script src="mhttpd.js"></script> and call mhttpd_init on page load.
So why can't it run this ajax request?
Or is there a better way to launch a script without messing up the page? |
22 Jun 2018, Stefan Ritt, Forum, custom script on custom page
|
> Uncaught ReferenceError: XMLHttpRequestGeneric is not defined
> at cs_button (Trend:165)
> at HTMLInputElement.onclick (Trend:90)
That code was not written by me, so I'm just guessing here.
Probably XMLHttpRequestGeneric() is some function hiding browser peculiarities when creating
AJAX requests. These days most browsers understand the standard request
XMLHttpRequest()
so why don't you try just removing the "Generic"?
Stefan |
25 Jun 2018, Frederik Wauters, Forum, custom script on custom page
|
> > Uncaught ReferenceError: XMLHttpRequestGeneric is not defined
> > at cs_button (Trend:165)
> > at HTMLInputElement.onclick (Trend:90)
>
> That code was not written by me, so I'm just guessing here.
>
> Probably XMLHttpRequestGeneric() is some function hiding browser peculiarities when creating
> AJAX requests. These days most browsers understand the standard request
>
> XMLHttpRequest()
>
> so why don't you try just removing the "Generic"?
>
> Stefan
That removes the error, but the script doesn't get called. Execution reaches the javascript function and
callback, but nothing happens.
When I change type=button to type=submit, the script gets called again, but with a page refresh. |
08 Jun 2018, Lee Pool, Info, MIDAS RTEMS PoRT
|
Hi,
So I finally got around to "publishing" work I did in 2009/2010 with RTEMS.
The work was mainly between myself, Till Straumann (SLAC), and Dr. Joel
Sherill, to get VME support for vme universe/vme tsi148 (basic support) into
the i386 bsp.
https://bitbucket.org/lcpool2/midas-k600/src/develop/ ( our rtems port ).
What this did was allow us to run our various VME single board controllers
with a single frontend application.
It is still classified as testing, but it's been very successful so
far, and I hope to use it in the next experiment, if possible.
The midas port contains a makefile and some changes to the
midas.c/system.c/mfe.c files. I've not tested the full functionality
as I'm super time limited.
Hope this is helpful to others... |
17 May 2018, Zaher Salman, Forum, embedding history in SVG
|
I am embedding histories into a custom page within an SVG,
<image x="21000" y="1000" width="6000" height="6000"
href="../HS/SampleCryo/SampleTemp.gif?width=230&scale=0.5h"/>
this works fine. However, I would like to update this regularly without
refreshing the full page via
<meta http-equiv="Refresh" content="60">
is there a good way to do that? By the way, the "Periodic update of parts of a
custom page" from the documentation does not seem to work here. |
|