> 1. Is it possible to get "Running time" using, for example, jsonrpc? (please see
> the attached file)
You have in the ODB "/Runinfo/Start time binary" which is measured in seconds since
1970. By subtracting this from the current time, you get the running time.
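A minimal sketch of that subtraction in Python, assuming the value of "/Runinfo/Start time binary" has already been fetched (for example via jsonrpc; the fetch itself is not shown). The function names are made up for illustration, and the result is only meaningful while the run is actually running.

```python
import time

def running_time(start_time_binary, now=None):
    """Return the elapsed run time in seconds.

    start_time_binary: value of "/Runinfo/Start time binary" (seconds since 1970, UTC).
    now: current time in seconds since 1970; defaults to the local clock.
    """
    if now is None:
        now = time.time()
    # Clamp at zero in case the two clocks disagree slightly.
    return max(0, int(now) - int(start_time_binary))

def format_hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

If the script runs on a different machine than the MIDAS server, the two clocks should be synchronized, otherwise the running time will be off by the clock difference.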
> 2. Is it possible to configure "Start time" and "Stop time" with time zone? For
> example when I start a new run, value of "Start time" key is automatically changed
> to "Fri Aug 21 12:38:36 2020" without time zone.
"Start time binary" and "Stop time binary" are in seconds since the 1970 in UTC, so no
time zone is involved there. The ASCII versions of the start/stop time are derived from
the binary time using the server's local time zone. If you want to display them in a
different time zone, you have to create a custom page and convert them to another time
zone yourself, for example in JavaScript (Date() takes milliseconds, hence the factor 1000):
var d = new Date(start_time_binary * 1000);
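For scripts outside the browser, the same conversion can be sketched in Python. This assumes a fixed UTC offset is good enough; the function name and the offset parameter are made up for illustration, and daylight saving time is not handled.

```python
from datetime import datetime, timedelta, timezone

def format_in_zone(time_binary, utc_offset_hours=0):
    """Render seconds since 1970 (UTC) as ISO 8601 text in a fixed-offset zone."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(time_binary, tz=zone).isoformat()
```

The output carries the offset explicitly (for example "+02:00"), so the reader does not have to guess which time zone was used.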
Thank you, Stefan
>>>var d = new Date(start_time_binary);
I need the time zone because new Date() gives the time in the time zone of my PC.
I have been looking at the 2019 workshop slides, I am interested in the C++ future of MIDAS.
I am quite interested in using the object oriented
ALPHA will start data taking in 2021
The most recent commit (b43aef648c2f8a7e710a327d0b322751ae44afea) throws this
compiler error:
src/tmfe_main.cxx:39:11: error: 'SIGPIPE' was not declared in this scope
signal(SIGPIPE, SIG_IGN);
It's fixed by adding #include <signal.h> to that file.
> The most recent commit (b43aef648c2f8a7e710a327d0b322751ae44afea) throws this
> compiler error:
> src/tmfe_main.cxx:39:11: error: 'SIGPIPE' was not declared in this scope
> signal(SIGPIPE, SIG_IGN);
> It's fixed by adding #include <signal.h> to that file.
"but it works just fine on my mac!"
Anyhow, thank you for reporting this problem, it is already fixed. The bitbucket
auto-build also caught it. I also boogered up "make remoteonly", also fixed now.
BTW, for production use I recommend midas from the "release" branches, unless one
needs a bug fix or new feature from the development branch.
> BTW, for production use I recommend midas from the "release" branches, unless one
> needs a bug fix or new feature from the development branch.
Fair point. I would suggest adding that recommendation to the wiki instructions. I
forget to add that step otherwise.
FYI - in conjunction with replacement of ladd00.triumf.ca, this MIDAS ELOG has been updated to the latest
version 2.7.3-2058. Please report any problems or anomalies. K.O.
The following changes to midas, produced by the TRIUMF T2K project, have been
committed to svn:
1) cm_shutdown() will now SIGKILL clients that cannot be stopped via normal
means. Previously cm_shutdown() would print a message to the effect "please kill
this client yourself manually". The user action in this case (assuming they did
not issue cm_shutdown() by mistake) has been to find out the client pid using
"ps", kill -KILL it, then "odbedit clean". cm_shutdown() now performs all this
2) rpc_send_event() did not correctly detect loss of connection to the remote
mserver (i.e. in case it was killed by cm_shutdown() above). Now, correct error
handling is in place and the remote frontend should gracefully shutdown if
mserver connection is lost. (However I observe that some of my remote frontends
fail to exit unless I do "exit(1);" from my frontend_exit() function.)
3) mhttpd bug fixed: when editing odb entries, the "cancel" button did not work
4) lazylogger "script" backup type is now fully tested and documented. Example
scripts for writing to dcache are available by request.
5) mlogger and mhttpd changes for writing history data to an sql database are
mostly completed and will be committed after some more debugging. (If you are
interested in details, please contact me directly).
6) (committed some time ago) Makefile changes for cross-compiling midas are now
in: "make linux32", "make linux64", "make crosscompile".
I updated the documentation for setting up a MYSQL (MariaDB) database for
recording MIDAS history: https://midas.triumf.ca/MidasWiki/index.php/History_System#Write_MYSQL-history_events
One thing to note: the "writer" user must have the "INDEX" permission, otherwise
many things will not work correctly.
Included are the instructions for importing existing *.hst history files into the
SQL database: mh2sql --mysql mysql_writer.txt *.hst
Let me know if there is interest in adding support for writing into a PostgreSQL
database. We used to support both MySQL and Postgres through the ODBC library,
but in the new code, each database has to be supported through its native API.
There is code for SQLITE, MYSQL, but no code for Postgres, although it is not too
hard to add.
a big update to the event buffer code was merged today.
two important bug fixes:
- a logic error in bm_receive_event() (actually bm_fill_read_cache_locked())
caused use of an uninitialized variable to increment the read pointer and a crash
with error "read pointer points to an invalid event"
- missing bm_unlock() in bm_flush_cache() caused double-locking of the event buffer,
which caused a hang and a subsequent crash via the watchdog timeout.
- bm_receive_event_vec(std::vector<char>) with automatic memory allocation, one
does not need to worry about providing a large event buffer to receive event
data. For local connections MAX_EVENT_SIZE is no longer used, for remote
connections, a buffer of MAX_EVENT_SIZE is allocated automatically, this is a
limitation of the MIDAS RPC layer (it does not know how to allocate memory to
receive arbitrary large data)
(MAX_EVENT_SIZE is now only used in bm_receive_event_rpc()).
- rpc_send_event_sg() - thread safe method to send events to the mserver. it
takes an array of scatter-gather buffers, so a midas event does not have to be
in one contiguous buffer.
- bm_send_event_sg() - same for local connections.
- on top of bm_send_event_sg() we now have bm_send_event_vec(std::vector<char>)
and bm_send_event_vec(std::vector<std::vector<char>>). Now we can move forward with
implementing a new "event object" (the TMEvent event object from midasio.h
already works with these new methods).
- remote connected bm_send_event() & co now always send events to the mserver
using the event socket. (before, bm_send_event() used RPC_BM_SEND_EVENT and
suffered from the RPC layer encoding/decoding overhead. mfe.c used
rpc_send_event() for remote connections)
- bm_send_event(), bm_receive_event() & co now take a timeout value (in
milliseconds) instead of an async_flag. The old async_flag values BM_WAIT and
BM_NO_WAIT continue working as expected (wait forever and do not wait at all,
respectively).
- following improvements are only for remote connections:
- in the case of event buffer congestion (event readers are slow, event buffers
are close to 100% full), the bm_flush_cache() RPC will no longer timeout due to
mserver being stuck waiting for free buffer space. (RPC is called with a 1000
msec timeout, infinite loop waiting for flush is done on the frontend side, the
RPC timeout will never fire)
- in the case of event buffer congestion, ODB RPC will no longer timeout.
(previously mserver was stuck waiting for free buffer space and did not process
- at the end of run, last few events could be stuck in the event socket. now,
frontends can flush it using bm_flush_cache(0,BM_WAIT) (use zero for the buffer
handle). correct run transition should stop the trigger, stop generating new
events, call bm_flush_cache(0,BM_WAIT), call bm_flush_cache("SYSTEM",BM_WAIT)
and return success. (TMFE frontend already does this). Note that
bm_flush_cache(BM_WAIT) can be stuck for very long time waiting for the event
buffers to empty-out, so run transition RPC timeout is still possible.
The mhttpd history "export" function has been converted to the new midas history
interface and should now work for SQL-based history systems. In the process,
improvements by Eoin Butler (CERN AD-5/ALPHA) were merged - adding a UNIX
timestamp and a better text timestamp. Also now "export" outputs the actual
values from the history file - the scaling values from the definition of the
history plot panel are no longer applied.
Here is an example of the new file format:
Time, Timestamp, Run, Run State, SLOW
2011.06.21 15:45:21, 1308696321, 13292, 3, -89.1007
svn rev 5104
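A minimal sketch of reading the exported format back in Python, assuming the file is plain comma-separated text with a header line exactly as in the example above. The function name is made up for illustration.

```python
import csv
import io

def parse_history_export(text):
    """Parse exported history text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, (f.strip() for f in fields)))
        row["Timestamp"] = int(row["Timestamp"])  # UNIX time, seconds since 1970
        rows.append(row)
    return rows

sample = """Time, Timestamp, Run, Run State, SLOW
2011.06.21 15:45:21, 1308696321, 13292, 3, -89.1007
"""
rows = parse_history_export(sample)
```

The UNIX timestamp column is the one to use for calculations; the text timestamp is in local time and carries no time zone.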
I committed the latest VMIC VME driver we use at TRIUMF. It has working support
for D32 and D64 DMA and can move data from the SIS3820 multiscaler through the
MIDAS frontend at > 30 Mbytes/sec on our VMICVME-7805 machines. The actual DMA
speed on the VME bus is around 50 Mbytes/sec, effective data rate is lower
because of a memcpy() from the kernel DMA buffer into user memory (required by
the MIDAS mvmestd.h interface, quite inefficient for DMA operations). K.O.
(update: resolve all FIXMEs, document the breakup of "structured banks")
This note documents the workings of the midas history.
There are two separate history sections: equipment history and links history.
* is equipment history enabled?
For each equipment, history is controlled by the value of /eq/xxx/common/period:
0 = history disabled
1 = history is enabled
>1 = history is enabled, throttled down
The throttling is implemented in log_history()/watch_history() by this algorithm:
the very first history event is recorded, then all changes to the data are ignored until
"period" seconds have elapsed. Then the next history event is recorded, and following
changes are ignored until "period" seconds elapse, and so forth. Period value "1" has
special meaning - there is no throttling, all history events are logged.
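The throttling rule can be sketched in Python as follows. This is an illustration of the algorithm as described in this note, not the mlogger code; the class name is made up, and whether the real comparison is ">=" or ">" is an assumption.

```python
class HistoryThrottle:
    """Decide whether a changed value should be written as a history event."""

    def __init__(self, period):
        self.period = period      # value of /eq/xxx/common/period
        self.last_write = None    # time of the last recorded event

    def should_record(self, now):
        if self.period == 0:
            return False          # history disabled
        if self.period == 1 or self.last_write is None:
            self.last_write = now # no throttling, or very first event
            return True
        if now - self.last_write >= self.period:
            self.last_write = now # "period" seconds have elapsed
            return True
        return False              # change ignored
```

With period 10 and changes arriving at t = 0, 3, 9, 10, 15, 20 seconds, only the changes at 0, 10 and 20 are recorded.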
If equipment history is enabled, history events are created by parsing the content of /eq/xxx/variables.
* what are history events?
A "history event" is the atomic unit of history data. Associated with each history event is a timestamp (unix time),
a name (limited to NAME_LENGTH in the old history) and a list of history tags that describe the individual data
values inside the history event.
When making history plots in mhttpd, for each curve on the plot, one selects a history event (from the list
of currently active events, recently active events or the list of all events that ever existed), then from the list of tags
inside the history event one selects the particular variable that will be plotted.
In the old MIDAS history, all history events are written into one history file (.hst file + optional .def and .idx event definition and time index files
which can be/are regenerated automatically from the .hst file). History events are identified by 16-bit history event IDs, the persistent mapping
from history event names and the 16-bit history event IDs is stored in ODB /History/Events. In addition the list of all known history event tags is
stored in ODB /History/Tags. For per-equipment history, the 16-bit history event ID is the value of ODB "/eq/xxx/common/event id".
In the SQL history (MySQL, SQLITE, etc), each history event is an SQL table. The history event tags are the SQL table columns.
In the new FILE history, each history event is written into a separate file, tag definitions are recorded in text format in the file header, history event
data is appended to the file in binary format (fixed record size). If the history event definition is changed, a new file will be started.
* how are history events constructed?
The mlogger creates history events in open_history() by parsing ODB /eq/xxx/variables. Each ODB entry under "variables" is referred to as a "variable".
Each variable can be a single ODB value, an array of ODB values, or a subdirectory (corresponding to TID_STRUCT structured data banks). As each variable
is processed, one or more tags are created to describe it. Single ODB values will generally produce a single tag, while arrays can produce
one single tag - describing the whole array - or multiple tags - one per array element - depending whether the array is "named" or not.
The code can generate two types of history:
- "per-equipment" history will have the tags for all variables concatenated together into one single history event
- "per-variable" history will have one history event defined for each variable. Inside could be one tag - for single odb values and unnamed arrays - or multiple tags - for named arrays and structured data
Per-equipment history is the original MIDAS history implementation.
Per-variable history was added to permit efficient data storage in SQL tables. Its initial implementation used 1 ODB hotlink for each variable and it was easy to exceed the maximum permitted number of
ODB hotlinks (db_open_record()).
To reduce consumption of hotlinks, db_watch() has been implemented and now per-variable history only uses 1 ODB hotlink per equipment.
With db_watch, per-equipment history is no longer available. per-variable history is the new default (and the only option).
* how are the history event tags constructed?
(quirk - single odb values are treated as arrays of length "1")
FIXME: single odb values should be treated as such, /eq/xxx/settings/names should not be applied
(quirk - "string" ODB entries are not permitted)
FIXME - single odb values of type TID_STRING should be possible with SQL, FILE and MIDAS history. Arrays of strings are impossible: "struct TAG" does not have a data field for string length - only n_data and
the item length implied by its TID.
History event tags are constructed in the mlogger add_equipment().
For variables of type TID_KEY (subdirectories, corresponding to TID_STRUCT structured banks), one tag is generated for each subdirectory entry. Tag names for /eq/xxx/var/aaa/bbb will be "aaa_bbb" (with an underscore).
FIXME: subdirectory entries of type TID_KEY and TID_LINK should be explicitly forbidden.
FIXME: TID_KEY could be supported by replacing db_get_data() with db_get_record() in watch_history().
FIXME: TID_LINK could be supported by adding db_watch() on the link target.
For named arrays, individual tags are generated for each array element. Tag names are taken from the names array. For empty tag names (empty names array), tags are "aaa_0", "aaa_1", etc (for
/eq/xxx/var/aaa). For "single names" arrays, tag names have the variable name appended (with a space), for /eq/xxx/var/aaa and an empty names array, tags will be "aaa_0 aaa", "aaa_1 aaa", etc. For
populated names array, the tags will be "name0 aaa", "name1 aaa", etc.
For unnamed arrays and single odb variables (in ODB, single odb variables are arrays of length 1), a single tag is generated.
For TID_LINK variables what happens? FIXME!
FIXME: support TID_LINK variables by correctly parsing the link target and setting a db_watch() on the link target.
Named arrays have a "Names" entry in /eq/xxx/settings. For example, to add names to /eq/xxx/var/aaa, create a string array "/eq/xxx/settings/names aaa". The names array should be at least as long as
the corresponding data array. Individual entries in the names array can be left blank (tag names will be "aaa_0", "aaa_1", etc). Duplicate tag names are not permitted.
A single "Names" entry can be created to name all arrays in variables with the same names ("single names"). Create "/eq/xxx/settings/names" and arrays /eq/xxx/var/aaa and /eq/xxx/var/bbb will have
history tags "name0 aaa", "name1 aaa", "name0 bbb", "name1 bbb", etc. If "names" are left blank, tag names will be "aaa_0 aaa", "aaa_1 aaa", "bbb_0 bbb", "bbb_1 bbb", etc.
In the mhttpd variables viewer, "single name" arrays are displayed in a 2D table.
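The naming rules for equipment history tags described above can be summarized in a short Python sketch. It only mirrors the rules as written in this note and is not the mlogger code; the function and parameter names are made up for illustration.

```python
def make_tags(var, n, names=None, single_names=False):
    """Return history tag names for /eq/xxx/var/<var> with n array elements.

    names=None        -> unnamed array or single value: one tag for the whole variable
    names=[...]       -> content of the names array: one tag per element
    single_names=True -> shared "names" array: variable name appended after a space
    """
    if names is None:
        return [var]
    tags = []
    for i in range(n):
        # Blank or missing entries fall back to "<var>_<index>".
        name = names[i] if i < len(names) and names[i] else f"{var}_{i}"
        tags.append(f"{name} {var}" if single_names else name)
    if len(set(tags)) != len(tags):
        raise ValueError("duplicate tag names are not permitted")
    return tags
```

For example, make_tags("aaa", 2, ["", ""]) gives "aaa_0", "aaa_1", while make_tags("bbb", 2, ["name0", "name1"], single_names=True) gives "name0 bbb", "name1 bbb".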
* /history/links history
History events are created for each entry under /history/links.
Two types of links are permitted:
/history/links/aaa is a link to a subdirectory: db_watch() is set up to watch this subdirectory, tags are created for each subdirectory entry (1 tag per entry). There is no possibility for naming array elements, so 1 tag per array, regardless of the number of elements.
/history/links/bbb is a subdirectory with links to odb values: db_watch() is set up to watch each link target, tags are created for each link (1 tag per link). The tag name is the link name (NOT the target name). There is no possibility for naming array elements.
FIXME: Mixing links and subdirectories is not permitted, but could be done - additional db_watch() will need to be done on any links.
The update period of history events created for /history/links is controlled by entries in "/history/links periods". Numeric values of periods are the same as for equipment histories. Numeric value 0 disables the history for a particular event.
Add switch "-C" to odbedit to allow it to connect to corrupted ODB. Then,
depending on corruption, the user can manually remove or correct the
corrupted entries. Also, some corruption is automatically fixed by "odbedit"
itself. I use this functionality to debug and fix broken ODBs.
For your enjoyment, here is the diff:
diff -r1.64 odbedit.c
> BOOL corrupted;
< debug = cmd_mode = FALSE;
> debug = corrupted = cmd_mode = FALSE;
> else if (argv[i][0] == '-' && argv[i][1] == 'C')
> corrupted = TRUE;
< printf(" [-c Command] [-c @CommandFile] [-s size]
> printf(" [-c Command] [-c @CommandFile] [-s size]\n");
> printf(" [-g (debug)] [-C (connect to corrupted
< if (status != CM_SUCCESS)
> else if ((status == DB_INVALID_HANDLE)&&corrupted)
> cm_get_error(status, str);
> printf("ODB is corrupted, connecting anyway...\n");
> else if (status != CM_SUCCESS)
We've had mhttpd aborting regularly since upgrading from midas-1.9.3. This
happens during elog queries, and is due to an elog file that was incorrectly
modified by hand. The modification to the file occurred 6 months ago.
el_retrieve(midas.c:15683) now has several assert statements, one of which
aborts the program on reading the bad entry.
Why is assert used, instead of an error return from the function (if
necessary), and maybe an error message in the log file? Assert statements are
often removed, using NDEBUG, for normal use.
The problem elog entry had one character removed, so end-of-file came before
the end of the message. This could probably occur without the file being
altered, if the disk containing the elog fills.
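To illustrate the alternative being asked for, here is a minimal Python sketch of a read that reports a truncated entry through an error return instead of aborting. It is not the el_retrieve() code and does not use the real elog file layout; the function name and the (data, error) return convention are made up for illustration.

```python
import io

def read_record(stream, size):
    """Read exactly `size` bytes; return (data, None) or (None, error text)."""
    data = stream.read(size)
    if len(data) < size:
        # End-of-file came before the end of the message: report it, do not abort.
        return None, f"premature end of file: wanted {size} bytes, got {len(data)}"
    return data, None
```

The caller can then log the error text and skip the bad entry.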
> We've had mhttpd aborting regularly since upgrading from midas-1.9.3. This
> happens during elog queries, and is due to an elog file that was incorrectly
> modified by hand.
(sorry for delayed reply, for reasons unknown, I did not get an email notice when this was posted)
Yes, I agree, error handling in midas elog code is insufficient (note missing error checks for
read() and lseek() system calls). Anything but "perfect" elog files would cause funny errors and
> The modification to the file occurred 6 months ago.
> el_retrieve(midas.c:15683) now has several assert statements, one of which
> aborts the program on reading the bad entry.
I added those to fix problems with "broken last NN days" and with infinite looping in the elog code
that we observed in TWIST.
You are welcome to replace the assert() statements with proper error handling. I used to have some code
that could report the filename of the bad elog file. Can we also report the exact file location for broken
Please send me the diff, I will commit it to midas cvs.
> Why is assert used, instead of an error return from the function (if
> necessary), and maybe an error message in the log file? Assert statements are
> often removed, using NDEBUG, for normal use.
I use assert() in several ways:
0) I want a core dump each time X happens. (This is the only reasonable action when facing memory/stack
corruption. The problems in the elog code were stack corruption).
1) "I am too lazy to write proper error handling code" so I just crash and burn. This includes the
case where "proper error handling" would be "too invasive".
2) the error is too bad (or too deep) and there is no reasonable way to recover. Print an error message
and dump core (for later analysis). I sometimes use "cm_msg(); abort()". (assert is "printf("error"); abort()")
Please refer to literature for philosophic discussions on uses of assert() (Argh! Stefan will have my
head again!), but I will mention that "abort() early, abort() often" I find very effective. BTW, this technique
is heavily used in the Linux kernel (oops(), bug(), panic()) with some good effect, too.
> The problem elog entry had one character removed, so end-of-file came before
> the end of the message. This could probably occur without the file being
> altered, if the disk containing the elog fills.
Yes, I think you are right. In TWIST, we have seen disk-full conditions break both elog and history.
A "nested" or "recursive" lock is a special type of lock that permits a lock holder to lock the same resources again and again, without deadlocking on itself. They are
very useful, but tricky to implement because most system lock primitives (SYSV semaphores, POSIX mutexes, etc) do not permit nested locks, so all the logic for
"yes, I am the holder of the lock, yes, I can go ahead without taking it again" (plus the reverse on unlocking) has to be done "by hand". As ever, if implemented
wrong or used wrong, Bad Things happen. Many people dislike nested locks because of the added complexity, but realistically, it is impossible to build a system
that does not require nested locking at least somewhere.
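The "by hand" logic can be sketched in Python on top of a plain, non-nesting lock. This is only an illustration of the technique, not the MIDAS implementation; the class name is made up. (Python's standard library already provides threading.RLock, which does the same thing.)

```python
import threading

class NestedLock:
    """Recursive lock built by hand on top of a non-recursive lock."""

    def __init__(self):
        self._lock = threading.Lock()   # plain lock, does not nest
        self._owner = None              # thread id of the current holder
        self._depth = 0                 # how many times the holder locked it

    def lock(self):
        me = threading.get_ident()
        if self._owner == me:           # yes, I am the holder: just count
            self._depth += 1
            return
        self._lock.acquire()            # blocks until the holder releases
        self._owner = me
        self._depth = 1

    def unlock(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("unlock() called by a thread that is not the holder")
        self._depth -= 1
        if self._depth == 0:            # outermost unlock: really release
            self._owner = None
            self._lock.release()
```

The underlying lock is released only when the number of unlock() calls matches the number of lock() calls; getting that count wrong is exactly where the Bad Things come from.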
MIDAS lock primitives - ss_semaphore_wait_for(), db_lock_database() and bm_lock_buffer() implement a type of nested locks.
ODB locks implemented in db_lock_database() fully support nested (recursive) locking and this feature is heavily used by the ODB library. Many ODB db_xxx()
functions take the ODB lock, do something, then call another ODB function that also takes the ODB lock recursively. This works well.
Unfortunately, the ODB nested lock implementation is NOT thread-safe. (Unless one is connected through the mserver, in which case, db_xxx() functions ARE
thread-safe because all ODB access is serialized by the mserver RPC mutex).
Event buffer locks implemented in bm_lock_buffer() rely on ss_semaphore_xxx() to provide nested locking.
ss_semaphore_wait_for() uses SYSV semaphores, which do not provide nested locking, except when called from cm_watchdog(). (keep reading).
Because bm_lock_buffer() does not implement nested locking, use of cm_msg() in buffer management code will lead to self-deadlock, as shown in the following
stack trace, where bm_cleanup() is working on the SYSMSG buffer, locked it, then called cm_msg() which is now waiting on the SYSMSG lock, which we are holding
#0 0x00007fff87274e9e in semop ()
#1 0x0000000100024075 in ss_semaphore_wait_for (semaphore_handle=1179654, timeout=300000) at src/system.c:2280
#2 0x0000000100015292 in bm_lock_buffer (buffer_handle=<value temporarily unavailable, due to optimizations>) at src/midas.c:5386
#3 0x000000010000df97 in bm_send_event (buffer_handle=1, source=0x7fff5fbfd430, buf_size=<value temporarily unavailable, due to optimizations>,
async_flag=0) at src/midas.c:6484
#4 0x000000010000e6f5 in cm_msg (message_type=2, filename=<value temporarily unavailable, due to optimizations>, line=4226, routine=0x10004559f
"bm_cleanup", format=0x100045550 "Client '%s' on buffer '%s' removed by %s because process pid %d does not exist") at src/midas.c:722
#5 0x000000010001553c in bm_cleanup_buffer_locked (i=<value temporarily unavailable, due to optimizations>, who=0x100045f42 "bm_open_buffer",
actual_time=869425784) at src/midas.c:4226
#6 0x00000001000167ee in bm_cleanup (who=0x100045f42 "bm_open_buffer", actual_time=869425784, wrong_interval=0) at src/midas.c:4286
#7 0x000000010001ae27 in bm_open_buffer (buffer_name=<value temporarily unavailable, due to optimizations>, buffer_size=100000,
buffer_handle=0x10006e9ac) at src/midas.c:4550
#8 0x000000010001ae90 in cm_msg_register (func=0x100000c60 <process_message>) at src/midas.c:895
#9 0x0000000100009a13 in main (argc=3, argv=0x7fff5fbff3d8) at src/odbedit.c:2790
This example deadlock is not a normal code path - I accidentally exposed this deadlock sequence by adding some extra locking.
But in normal use, cm_msg() is called quite often from cm_watchdog() and as protection against this type of deadlock, MIDAS
ss_semaphore_xxx() has a special case that permits one level of nesting for locks called by code executed from cm_watchdog(). This is a very
clever implementation of partial nested locking.
So again, we are running into problems with cm_msg() - logically it should be at the very bottom of the system hierarchy - everybody calls it from their most
delicate places, while holding various locks, etc - but instead, cm_msg() calls the whole MIDAS system all over again - it calls ODB functions, event buffer functions,
etc - mostly to open and to write into the SYSMSG buffer.
If you are reading this, I hope you are getting a better idea of the difference between textbook systems and systems that are used in the field to get some work
I'm using a python client to start and stop runs, and the following code *appears*
to set the MIDAS state to "Run"
However, it doesn't seem to do other things associated with a run, like start
Is there a different way I should start the run from the python client?
The ODB variable "/Runinfo/State" is a symptom of starting/stopping a run, rather than the cause.
In C++, one uses `cm_transition()` to start/stop runs.
In python code you can use the `start_run()` and `stop_run()` functions from `midas.client`: https://bitbucket.org/tmidas/midas/src/00ff089a836100186e9b26b9ca92623e672f0030/python/midas/client.py#lines-793:808
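A minimal sketch of how that could look. start_run() and stop_run() are the functions named above; the MidasClient constructor and disconnect() are used here as I understand the MIDAS python package, so check them against your version. The client name "run_starter", the helper run_for() and the 10 second run length are made up for illustration.

```python
def run_for(client, seconds, sleep):
    """Start a run, wait, then stop it, using the client's transition calls."""
    client.start_run()      # performs the real start transition
    sleep(seconds)
    client.stop_run()       # performs the real stop transition

def main():
    # Requires the MIDAS python package and a running experiment.
    import time
    import midas.client
    client = midas.client.MidasClient("run_starter")  # client name is arbitrary
    try:
        run_for(client, 10, time.sleep)
    finally:
        client.disconnect()
```

Calling main() on a machine with MIDAS set up should go through the normal transition sequence, unlike writing to "/Runinfo/State" directly.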