25 Jul 2017, Stefan Ritt, Info, Current git repository "develop" branch broken
|
Dear all,
we are currently making major modifications to the way mhttpd works. We are now at a state where
mhttpd on the "develop" branch is broken, and it will take a few weeks to get everything converted
to the new scheme we plan to use. Therefore I moved the git branch "master" to the last known
stable version of midas. So for all practical purposes, please do NOT update your "develop" branch
until further notice. To get the last stable version, you can do a
$ git checkout master
which moves you to the state just before we started the major modifications. Once we are finished,
we will announce it here in the forum.
Best regards,
Stefan |
04 Aug 2017, Konstantin Olchanski, Info, Notes on installing midas from scratch
|
Notes on installing midas from scratch. The instructions on the midaswiki will be synced with this later.
cd ~/packages
git clone ...
cd midas
make
cd ~
mkdir ~/online
cd ~/online
~/git/midas/darwin/bin/odbinit --env
source env.sh
~/git/midas/darwin/bin/odbinit --exptab
~/git/midas/darwin/bin/odbinit
ls -la
send:online olchansk$ ls -la
total 2376
drwxr-xr-x 15 olchansk staff 510 Aug 4 15:34 .
drwxr-xr-x+ 244 olchansk staff 8296 Aug 4 15:33 ..
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ALARM.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ELOG.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .HISTORY.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .MSG.SHM
-rw-r--r-- 1 olchansk staff 1183808 Aug 4 15:34 .ODB.SHM
-rw-r--r-- 1 olchansk staff 8 Aug 4 15:34 .ODB_SIZE.TXT
-rw-r--r-- 1 olchansk staff 15 Aug 4 15:34 .SHM_HOST.TXT
-rw-r--r-- 1 olchansk staff 12 Aug 4 15:34 .SHM_TYPE.TXT
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .SYSMSG.SHM
-rw-r--r-- 1 olchansk staff 341 Aug 4 15:33 env.csh
-rw-r--r-- 1 olchansk staff 322 Aug 4 15:33 env.sh
-rw-r--r-- 1 olchansk staff 40 Aug 4 15:34 exptab
-rw-r--r-- 1 olchansk staff 287 Aug 4 15:34 midas.log
send:online olchansk$
odbedit ### works
mhttpd ### bombs, requires SSL certificate https://bitbucket.org/tmidas/midas/issues/57/initial-mhttpd-should-bind-to-localhost
odbedit ### cd /experiment, set "http redirect to https" to no, set "midas https port" to 0 (a sample odbedit session is sketched after these notes)
mhttpd ### runs now
connect to http://localhost:8080 ### status page works
restart mhttpd as mhttpd -D
mlogger -D
fetest ### runs, prints time and data
start a run from web page ### works
### fetest generates crazy data rate https://bitbucket.org/tmidas/midas/issues/58/fetest-crazy-data-rate
### go to history, define plot for SLOW/SLOW, see sine wave ### works
### history is written to expt dir, no good, go to "history"
### data files written to expt dir, no good, go to "data"
### midas.log written to data dir, no good (want expt dir)
### elog written to expt dir, go to "elog"
### logger channel config is wrong - gzip compression and crc32c should be enabled by default
### history config is wrong - FILE per-variable history should be enabled by default
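For reference, the odbedit step above looks roughly like this (a sketch from memory; the key names should be checked against your ODB):
$ odbedit
cd /Experiment
set "http redirect to https" n
set "midas https port" 0
exit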
K.O.
|
07 Aug 2017, Stefan Ritt, Info, Notes on installing midas from scratch
|
Thanks for documenting this in detail. A few suggestions:
- is it really necessary to call odbinit three times? Maybe two or even all three functions can be merged: you call odbinit, it checks if the environment is
there, and creates it automatically if not. Same with the exptab (a sample exptab is sketched below, after this list).
- can we make "http redirect to https = n" and "midas https port = 0" the default? Of course this has to go with binding to localhost only.
- does it make sense to define default directories for history, data files and midas.log? Maybe we could come up with a "default scheme" which can then later
be adjusted if needed.
- will you take care of the wrong logger channel config and history config?
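For reference, the exptab that odbinit writes is just one line per experiment - experiment name, experiment directory, user name - so there is not much for it to get wrong there. Something like (illustrative values, not the actual file):
Default /Users/olchansk/online olchansk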
Best regards,
Stefan
> Notes on installing midas from scratch. The instruction on midaswiki will be synced with this later.
>
> cd ~/packages
> git clone ...
> cd midas
> make
> cd ~
> mkdir ~/online
> cd ~/online
> ~/git/midas/darwin/bin/odbinit --env
> source env.sh
> ~/git/midas/darwin/bin/odbinit --exptab
> ~/git/midas/darwin/bin/odbinit
> ls -la
> send:online olchansk$ ls -la
> total 2376
> drwxr-xr-x 15 olchansk staff 510 Aug 4 15:34 .
> drwxr-xr-x+ 244 olchansk staff 8296 Aug 4 15:33 ..
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ALARM.SHM
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ELOG.SHM
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .HISTORY.SHM
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .MSG.SHM
> -rw-r--r-- 1 olchansk staff 1183808 Aug 4 15:34 .ODB.SHM
> -rw-r--r-- 1 olchansk staff 8 Aug 4 15:34 .ODB_SIZE.TXT
> -rw-r--r-- 1 olchansk staff 15 Aug 4 15:34 .SHM_HOST.TXT
> -rw-r--r-- 1 olchansk staff 12 Aug 4 15:34 .SHM_TYPE.TXT
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .SYSMSG.SHM
> -rw-r--r-- 1 olchansk staff 341 Aug 4 15:33 env.csh
> -rw-r--r-- 1 olchansk staff 322 Aug 4 15:33 env.sh
> -rw-r--r-- 1 olchansk staff 40 Aug 4 15:34 exptab
> -rw-r--r-- 1 olchansk staff 287 Aug 4 15:34 midas.log
> send:online olchansk$
>
> odbedit ### works
> mhttpd ### bombs, requires SSL certificate https://bitbucket.org/tmidas/midas/issues/57/initial-mhttpd-should-bind-to-localhost
> odbedit ### cd /experiment, set "http redirect to https" to no, set "midas https port" to 0
> mhttpd ### runs now
> connect to http://localhost:8080 ### status page works
> restart mhttpd as mhttpd -D
> mlogger -D
> fetest ### runs, prints time and data
> start a run from web page ### works
> ### fetest generates crazy data rate https://bitbucket.org/tmidas/midas/issues/58/fetest-crazy-data-rate
> ### go to history, define plot for SLOW/SLOW, see sine wave ### works
> ### history is written to expt dir, no good, go to "history"
> ### data files written to expt dir, no good, go to "data"
> ### midas.log written to data dir, no good (want expt dir)
> ### elog written to expt dir, go to "elog"
> ### logger channel config is wrong - gzip compression and crc32c should be enabled by default
> ### history config is wrong - FILE per-variable history should be enabled by default
>
> K.O.
> |
11 Oct 2017, Konstantin Olchanski, Info, added support for ucLinux
|
Support for building for ucLinux was added to MIDAS. I use the emcraft toolchain and userland on
some kind of embedded ARM CPU that does not have an MMU. See the Makefile for details. The
main difference of ucLinux is the lack of fork(), which cannot be implemented without an MMU. Not everything
works, but at least I can run a frontend and connect to an experiment on a remote host
computer (mserver connection). K.O. |
13 Oct 2017, Konstantin Olchanski, Info, odb multithread support repaired
|
Multithreaded access to ODB was implemented back in 2013-2014, but recently a bug surfaced:
there was a race condition in the ODB locking code against cm_watchdog(). Somehow this only
affected the mserver for the DRAGON experiment at TRIUMF. It is now fixed on the branch
feature/midas-2017-10 (this branch collects all the code that needs additional testing before
merging into develop and becoming the next release of midas).
K.O. |
21 Nov 2017, Konstantin Olchanski, Info, MIDAS support on el5?
|
It has been reported that the current midas release candidate does not build on el5 linux (SL/RHEL/CentOS-5).
According to Red Hat, el5 is end-of-life; the last SL5 release (SL 5.11) was done in 2014, so this linux is very old. Also, as it happens, I do not have access to any
el5 machines to check whether midas builds or runs (but this can be fixed).
https://www.scientificlinux.org/downloads/sl-versions/sl5/
https://access.redhat.com/support/policy/updates/errata
On the midas web page (https://midas.triumf.ca) we do not explicitly state which versions of which linux we definitely support. Most other open-
source projects only support current major linux distributions; hardly anybody supports end-of-life linuxes such as el5. Some projects do not even
support recent linuxes still widely in use (ROOT6 does not build on stock el6 and there is no KDE5 for el7).
So back to midas. Support for different operating systems comes down to:
1) C/C++ language support. We still use el6 (GCC 4.4.7), so use of C++11 language features should be avoided
2) operating system feature support:
a) sysv semaphores (sysv shared memory no longer used, cannot be used on macos)
aa) (macos also is missing parts of the sysv semaphore api, such as "wait for lock, with timeout", we are using an ugly work-around)
b) posix shared memory with mprotect() & co
c) posix mutexes, including recursive-type mutexes (this seems to be the problem on el5; a minimal check is sketched right after this list)
d) bsd networking (need to migrate from select() to poll() and from gethostbyname() to getaddrinfo() & co (for IPv6 support))
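For item (c), the feature in question can be checked with a few lines of ordinary pthread code. A minimal self-contained sketch (this is not the midas code, which lives in system.c):

/* Minimal check for recursive mutex support (not the actual midas code,
   which lives in system.c).  Compile with: cc check_recursive.c -lpthread */
#include <pthread.h>
#include <stdio.h>

int main(void)
{
   pthread_mutexattr_t attr;
   pthread_mutex_t m;

   pthread_mutexattr_init(&attr);
   /* PTHREAD_MUTEX_RECURSIVE is the optional feature in question */
   if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0) {
      printf("recursive mutexes are not available\n");
      return 1;
   }
   pthread_mutex_init(&m, &attr);

   /* locking twice from the same thread must succeed with a recursive mutex */
   if (pthread_mutex_lock(&m) != 0 || pthread_mutex_lock(&m) != 0) {
      printf("recursive locking failed\n");
      return 1;
   }
   pthread_mutex_unlock(&m);
   pthread_mutex_unlock(&m);
   printf("recursive mutexes work\n");
   return 0;
}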
Not all of these operating system functions are required for all of midas. Running mhttpd and mlogger requires
pretty much everything. Running just a frontend connected to midas through the mserver requires the fewest features;
just the networking is enough, I think.
Obviously we cannot support midas in perpetuity on all versions of all operating systems; once I no longer have
access to a machine, I cannot even check that midas builds and runs the basic functions.
Instead, we could provide a "feature reduced" build of midas (makefile target) that includes "just enough" of midas
to (say) run a frontend, maybe even odbedit. We already have some provisions for this, but no obvious, documented
way of actually doing it.
So back to el5.
How important is it to support very old operating systems?
How many people still use el5?
How about old versions of Ubuntu? Macos?
If you use anything older than el6, please speak up
(and if possible say why you cannot migrate to an up-to-date linux).
K.O. |
08 Jun 2018, Lee Pool, Info, MIDAS RTEMS PoRT
|
Hi,
So I finally got around to "publishing" the work I did in 2009/2010 with RTEMS.
The work was mainly between myself, Till Straumann (SLAC) and Dr. Joel
Sherill, to get VME support for vme universe/vme tsi148 (basic support) into
the i386 bsp.
https://bitbucket.org/lcpool2/midas-k600/src/develop/ ( our rtems port ).
What this did was allow us to run our various VME single-board controllers
with a single frontend application.
It is still classified as testing, but it has been very successful so
far, and I hope to use it in the next experiment, if possible.
The midas port contains a makefile and some changes to the
midas.c/system.c/mfe.c files. I've not tested the full functionality
as I'm super time-limited.
Hope this is helpful to others... |
20 Jul 2018, Konstantin Olchanski, Info, ROOT I/O workshop notable
|
The ROOT I/O workshop was held on June 20th at CERN. A few things of interest in MIDAS land:
- LZ4 is now used as default compression (replacing gzip-1)
- JSON class streamer is finally implemented (XML streamer updated/reworked)
- recursive read-write lock class implemented
- I do not see any special mention of JavaScript I/O or jsroot, but the jsroot git repo seems to be quite active
Of these, the recursive read-write lock is the most interesting - using something similar would improve ODB performance
and presumably fix the existing lock fairness problems.
https://root.cern.ch/doc/master/TReentrantRWLock_8hxx_source.html
https://indico.cern.ch/event/715802/contributions/2942560/attachments/1670191/2680682/ROOT_IO_June_Workshop_v2.pdf
https://github.com/root-project/jsroot
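On the read-write lock idea: a plain (non-recursive) read-write lock already exists in POSIX, and ROOT's TReentrantRWLock adds recursion on top of that idea. A minimal sketch of "many concurrent readers, one exclusive writer" (generic pthread code, not the ODB code):

/* Sketch of a read-write lock protecting a shared value: many concurrent
   readers, one exclusive writer (plain POSIX, non-recursive; compile with
   -lpthread).  Not the ODB code, just the underlying idea. */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_value = 0;

static void *reader(void *arg)
{
   (void)arg;
   pthread_rwlock_rdlock(&lock);   /* many readers may hold this at the same time */
   printf("read %d\n", shared_value);
   pthread_rwlock_unlock(&lock);
   return NULL;
}

static void *writer(void *arg)
{
   (void)arg;
   pthread_rwlock_wrlock(&lock);   /* the writer waits for all readers, then is exclusive */
   shared_value++;
   pthread_rwlock_unlock(&lock);
   return NULL;
}

int main(void)
{
   pthread_t r1, r2, w;
   pthread_create(&r1, NULL, reader, NULL);
   pthread_create(&r2, NULL, reader, NULL);
   pthread_create(&w, NULL, writer, NULL);
   pthread_join(r1, NULL);
   pthread_join(r2, NULL);
   pthread_join(w, NULL);
   return 0;
}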
K.O. |
24 Oct 2018, Ryu Sawada, Info, bm_receive_event timeout in ROME
|
Hi all
There is a bug report in the ROME repository which says bm_receive_event times out.
https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after
Does anybody have any idea what could be causing the problem?
Ryu |
22 Nov 2018, Konstantin Olchanski, Info, status of self-signed https certificates
|
I just happened to check the current situation with self-signed https certificates as implemented in mhttpd.
(As a reminder, the powers-that-be are pushing for universal use of https for all web access. The https
implementation in mhttpd at the moment can only generate self-signed certificates, so...)
plain unencrypted http:
- both google chrome and firefox say "connection not secure", but connect without any fuss.
- apple safari does not say anything
https with self-signed certificate:
- google chrome goes through an "are you sure?" page, "red not secure" status in toolbar
- firefox does the same thing, requires adding a security exception, but still shows "not secure" status in toolbar
- apple safari goes through a sequence of "are you sure?" pages, asks for the user password to add the self-signed certificate to
the macos key store, then marks the connection as "secure" (good)
So clearly the powers-that-be do not want us to use self-signed certificates for https (and frown on the use of unencrypted
http even for localhost connections). Properly signed certificates can be obtained from letsencrypt almost
automatically, but of course mhttpd needs to know how to use them and how to handle their automatic renewals.
I plan to update the mongoose web server library inside mhttpd, and with luck I will straighten out some of this certificate business at
the same time.
In the meantime, we continue to recommend that mhttpd be used behind a password-protected https proxy (e.g. apache
httpd).
K.O. |
30 Nov 2018, Stefan Ritt, Info, status of self-signed https certificates
|
> In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache
> httpd, etc).
I guess this is what most people do anyhow these days. Do I understand correctly that this then rules out the usage of letsencrypt certificates, since the
host needs to be accessed from outside, which is not possible if running behind a password-protected firewall?
Stefan |
03 Dec 2018, Konstantin Olchanski, Info, status of self-signed https certificates
|
> > In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache
> > httpd, etc).
>
> I guess this is what most people do anyhow these days. Do I understand correctly that this then rules out the usage of letsencrypt certificates, since the
> host needs to be accessed from outside, which is not possible if running behind a password-protected firewall?
>
> Stefan
Careful, firewall != proxy, very different things.
A firewall prevents network communications, period. (Like fences and locked doors, there are good reasons to have them).
An https proxy is a way to have encrypted (protected) web communications with a machine behind a firewall.
Basically, we have 4 main cases, all with trouble.
1) mhttpd running on localhost, "just for testing", is in trouble. There is no simple way to get a "blessed" certificate, and self-signed certificates are now "almost forbidden". http is "okay
for now", but the writing is on the wall. There is no special exception for "local-only" connections.
2a) mhttpd running on an internet-connected machine, with apache httpd, our best case. To get this working one has to configure both apache httpd and the "blessed certificate"
certbot tool. Ideally, both tools would work smoothly on current OSes (they do NOT).
2b) same, but without apache httpd. One still has to run certbot, and the "glue" between mhttpd and certbot is currently missing: need a way to point mhttpd to the certbot certificate
files and a way to reload mhttpd when the certificate is auto-renewed.
3) mhttpd running on a machine behind a corporate firewall, the worst case. If the firewall gods make an opening for ports 80 and 443, it becomes case (2a/b); otherwise, one must use some
kind of https proxy. (Plus there is no trivial way to set up an encrypted, secure communication channel between mhttpd and this proxy, a double bad.)
K.O.
P.S. I guess one can use nginx as the https proxy instead of apache httpd. I have not tried it yet. My impression is that everybody uses nginx, except for people who started with apache httpd
and are too lazy to try nginx.
K.O. |
05 Dec 2018, Konstantin Olchanski, Info, Partial refactoring of ODB code
|
The current ODB code has several structural problems, and I think I have now figured out how to straighten them out.
Here are the problems:
a) nested (recursive) odb locks
b) no clear separation between read-only access and read-write access
c) no clear separation between odb validation and repair functions
d) cm_msg() is called while holding a database lock
Discussion:
a) odb locks are nested because most functions lock the database, then call other functions that lock the database again. Most locking primitives - SystemV
semaphores, POSIX semaphores and mutexes - usually do not permit nested (recursive) locking.
For locking the odb shared memory we use a SystemV semaphore with recursion implemented "by hand" in ss_semaphore_wait_for(). This works ok.
For making odb thread-safe, we use POSIX mutexes, and we rely on an optional feature (PTHREAD_MUTEX_RECURSIVE) which seems to work on most OSes, but
is not required to exist and work by any standard. For example, recursive mutexes do not work in uclinux (linux for machines without an MMU).
I looked at implementing recursive mutexes "by hand", same as we have for the recursive semaphores, and realized that it is quite complicated and computationally
expensive (read: inefficient). (Also I think nested and recursive locks are "not mainstream" and should rather be avoided.) As an example of the full
complexity of a nested lock, see the recent implementation in ROOT (good luck finding it).
A solution for this problem is well known. All functions are separated into "unlocked" user-callable functions and "locked" internal functions. Nested locking is
naturally eliminated.
Call sequences:
db_get_key() -> db_find_key() // odb is locked twice
become
db_get_key() -> db_get_key_locked() -> db_find_key_locked() // odb is locked once
Actual implementation of this scheme turns out to be a very clean and mechanical refactoring (moving the code without changing what it does).
As a trial, I refactored db_find_key() and db_get_key() and I like the result. Locking is now obvious, and obscure error paths with hidden "unlock before return" are all
gone. Extra conversions between hDB and pheader are gone.
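A self-contained toy illustration of this pattern (generic names, not the actual midas functions; the real ones operate on the ODB shared memory and pass the pheader pointer discussed in point (b) below):

/* Toy illustration of the locked/unlocked split (not the actual midas code). */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t db_mutex = PTHREAD_MUTEX_INITIALIZER;
static int db_storage[100];   /* stand-in for the ODB shared memory */

/* "_locked" functions assume the caller already holds the lock and never
   lock again; the const pointer marks them as read-only */
static int find_key_locked(const int *pheader, int index)
{
   return pheader[index];
}

static int get_key_locked(const int *pheader, int index)
{
   /* calling another _locked function takes no second lock */
   return find_key_locked(pheader, index);
}

/* the user-callable wrapper locks exactly once */
int get_key(int index)
{
   pthread_mutex_lock(&db_mutex);
   int value = get_key_locked(db_storage, index);
   pthread_mutex_unlock(&db_mutex);
   return value;
}

int main(void)
{
   printf("key 0 = %d\n", get_key(0));
   return 0;
}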
b) in this refactoring, functions that do not (should not) modify odb become easy to identify - the pheader argument is tagged "const".
This simplifies the implementation of "write-protected" odb - instead of ad-hoc db_allow_write_locked() sprinkled everywhere, one can have obvious calls to
"db_lock_read_only()" and "db_lock_read_write()".
Separation of locks into "read" and "write" locks, in turn, improves locking behaviour and helps against problems like lock starvation (which we did see with MIDAS),
as "read" locks are much more efficient: all readers can read the data at the same time, and exclusive locking is only done when somebody needs to "write".
c) some db_validate() functions also try to do repair. This cannot work if validation is called from "read-only" functions like db_find_key(). I now think the "repair"
functions should be separate from the "validate" functions: validate functions should detect problems, repair functions would repair them. The question remains
when is a good time to run a full repair (probably at the time when we connect to the database - this way, simply starting "odbedit" will force a database check and
repair).
d) calls to cm_msg() while odb is locked have been a problem for a long time. Because cm_msg() itself calls odb and because it also calls the event buffer code
(SYSMSG buffer), which in turn calls odb functions, there was trouble with deadlocks between the ODB and event buffer semaphores, trouble with recursive use of
ODB, etc.
Right now we have all this partially papered over by having cm_msg() put messages into a memory buffer that we periodically flush, but I was never super happy
with that solution. For example, if we crash before the message buffer is flushed, all error messages are lost, they do not go into midas.log, they are not printed on
the screen, they are not accessible in the core dump.
To resolve this problem, I have all "locked" functions call db_msg() instead of cm_msg(). db_msg() saves the messages in a linked list which is flushed into
cm_msg() immediately after we unlock odb.
If we crash after generating an error message but before it is flushed to cm_msg(), we can still access it through the linked list inside the core dump. This is an
improvement over what we have now. Ideally, all messages should be printed to the terminal and saved to midas.log and pushed into SYSMSG, but most of this is
impractical at a moment when odb is locked - as we already know it leads to deadlocks and other trouble...
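A minimal sketch of the db_msg() idea (generic code, not the actual implementation): messages generated while the lock is held are queued on a list and only handed to the real cm_msg() after the unlock.

/* Sketch of deferring messages while the database is locked
   (generic code, not the actual db_msg() implementation). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct deferred_msg {
   struct deferred_msg *next;
   char text[256];
};

static struct deferred_msg *msg_head = NULL, *msg_tail = NULL;

/* called instead of cm_msg() while odb is locked */
static void db_msg(const char *text)
{
   struct deferred_msg *m = calloc(1, sizeof(*m));
   strncpy(m->text, text, sizeof(m->text) - 1);
   if (msg_tail)
      msg_tail->next = m;       /* append, preserving message order */
   else
      msg_head = m;
   msg_tail = m;
   /* if we crash here, the list is still visible in the core dump */
}

/* called immediately after odb is unlocked */
static void db_flush_msg(void)
{
   while (msg_head) {
      struct deferred_msg *m = msg_head;
      msg_head = m->next;
      printf("cm_msg: %s\n", m->text);   /* stand-in for the real cm_msg() call */
      free(m);
   }
   msg_tail = NULL;
}

int main(void)
{
   /* ... odb is locked here ... */
   db_msg("example error detected while odb is locked");
   /* ... odb is unlocked here ... */
   db_flush_msg();
   return 0;
}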
Bottom line, I now have a path to improve the odb code and to resolve some of the long standing structural problems.
K.O. |
11 Dec 2018, Stefan Ritt, Info, Partial refactoring of ODB code
|
All makes sense to me. I agree to proceed with the refactoring.
One additional comment: In the 90's when I developed this code, locking was expensive. On a decent computer you could do a couple of thousand lock operations per second before you hit the 100%
CPU limit. Therefore I tried to reduce the number of lock operations as much as possible. For example, db_find_key() locks the ODB once and then goes through all keys before it unlocks again. If I had locked for
every key in an ODB with tens of thousands of keys, that would have taken very long in the old days.
Now the world has changed, we can do almost a million locks a second. So a db_get_record() does not have to obtain a whole directory in one go, but can get each value separately, and if necessary lock
the ODB on each key access. This would be slower, but only by a negligible amount these days. So in the spirit of making midas more robust, we can even go a step beyond simple refactoring and change the
locking scheme if it becomes more transparent and stable.
Best,
Stefan |
18 Dec 2018, Konstantin Olchanski, Info, mxml update
|
The mxml library was updated to make it thread-safe.
https://bitbucket.org/tmidas/mxml/src/master/
I also take this opportunity to remind everyone to update your copy to the latest version,
as I just stumbled on an old bug that I fixed a year ago (crash of mlogger)
but forgot to propagate to each and every one of my copies of mxml.
I also looked at the xml encoder and I see that it has several places where it may
truncate the data, but none of these places can cause truncation of ODB data
because the fixed-size internal buffers are big enough to hold the longest
values sent by the odb xml encoder.
K.O. |
26 Dec 2018, Konstantin Olchanski, Info, Partial refactoring of ODB code
|
> One additional comment: In the 90's when I developed this code, locking was expensive.
> Now the world has changed, we can do almost a million locks a second.
I am not sure this is quite true. The CPU can execute 3000 million operations per second (3 GHz CPU, assuming 1 op/Hz),
so at 1 million locks per second, 1 lock operation costs about 3000 normal operations. Of course cache misses and branch mispredictions mess up
this simple arithmetic...
But I think the cost of a mutex lock/unlock can be easily measured (hmm... now I am curious).
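For example, a rough uncontended measurement (a sketch only; single thread, plain pthread mutex):

/* Rough benchmark of an uncontended pthread mutex lock/unlock pair
   (single thread; compile with -lpthread, and with -lrt on older glibc). */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
   pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
   const long n = 10 * 1000 * 1000;
   struct timespec t0, t1;

   clock_gettime(CLOCK_MONOTONIC, &t0);
   for (long i = 0; i < n; i++) {
      pthread_mutex_lock(&m);
      pthread_mutex_unlock(&m);
   }
   clock_gettime(CLOCK_MONOTONIC, &t1);

   double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
   printf("%.0f lock/unlock pairs per second\n", n / sec);
   return 0;
}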
The bigger question is architectural: nested/recursive locks are definitely a bad thing to do (not just my opinion).
But closer to home, as I implemented the "write protected" ODB, lock/unlock suddenly has to do MMU operations
(map/unmap memory) and this is *very* expensive.
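For reference, this is the kind of MMU operation involved; a sketch using mprotect() (not the midas code, shown only to illustrate why a write lock becomes expensive when it has to change page protections):

/* Sketch of write-protecting a shared memory mapping with mprotect()
   (not the midas code; error checking omitted). */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
   size_t size = 4096;
   /* stand-in for the ODB shared memory mapping */
   char *odb = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

   strcpy(odb, "hello");
   mprotect(odb, size, PROT_READ);               /* "unlock": any stray write now segfaults */

   mprotect(odb, size, PROT_READ | PROT_WRITE);  /* "write lock": writing is allowed again */
   strcat(odb, ", odb");
   mprotect(odb, size, PROT_READ);               /* back to read-only on unlock */

   printf("%s\n", odb);
   munmap(odb, size);
   return 0;
}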
Also as we start doing more multithreading, lock contention is becoming a problem, and the standard solution
is to implement read-locks and write-locks. (everybody holding a read-lock can read ODB at the same time
without waiting).
So separate read and write locks and a write-protected (and/or read-protected) ODB shared memory all point towards
reworking the ODB locks to remove the need for nested/recursive locks.
I think Stefan and I are in agreement here.
K.O. |
26 Dec 2018, Konstantin Olchanski, Info, bm_receive_event timeout in ROME
|
> There is a bug report in the ROME repository which says bm_receive_event timeouts.
> https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after
> Does anybody have any ideas what could causing the problem ?
There could be a problem when bm_receive_event() is made to wait for an event for longer than
the RPC timeout. This rings a very small bell for me, but I do not remember the details.
As I now go through the midas event buffer code, I will check that bm_receive_event() connected
through the mserver has correctly working timeouts.
Thank you for reminding me about this difficulty.
K.O. |
27 Dec 2018, Stefan Ritt, Info, Partial refactoring of ODB code
|
> I am not sure this is quite true. The CPU can execute 3000 million operations per second (3GHz CPU, assuming 1 op/Hz),
> so 1 lock operation is worth 3000 normal operations. Of course cache misses and branch mispredictions mess up
> this simple arithmetic...
You can try that with "t1" in odbedit. This times the number of db_get_data() calls midas can do per second. On my MacBook Pro I get 470'000
accesses per second. |
28 Dec 2018, Konstantin Olchanski, Info, note on the midas event buffer code, part 1
|
In this technical note, I write down the workings of the midas event buffer code, the path
that events travel from the frontend to the SYSTEM buffer to mlogger (and to disk).
The event buffer code has worked well in the past, but more recently we see a few
problems. There is the event buffer shared memory corruption problem in the alpha-g
detector daq. There are difficulties with GET_RECENT. There are timeouts in the bm_receive_event
RPC path in ROME. There is the 2 Gbyte limit on the event buffer size (limiting the
maximum event size to about 1 Gbyte), due to the 32-bit-ness of the event buffer size code;
in the days of 10gige networking (1 Gbyte/sec) and >1 Gbyte/sec storage arrays, a 2 Gbyte
buffer size is just about sufficient. There is a lack of multithread safety in the event buffer
code. And there is no variant of bm_receive_event() where I do not have to guess the maximum
event size (which would make event truncation impossible).
I have been looking at the event buffer code for many years. It is extremely well written,
but it is also probably the oldest code inside midas and its age shows. Good code from
1998 is just very hard to read and follow 20 years later in 2018. We no longer do "goto", we
are not afraid of using malloc(), we declare variables as we are about to use them (instead
of at the beginning of a function). The list goes on and on.
So after looking at this code for many years, I finally decided to bite the bullet and
rewrite/modernize it. To my surprise the code took very well to this. I only had to rewrite
the parts that use difficult-to-follow "goto" logic; the rest of the code almost refactored
itself. I think this is a very good thing, as people already familiar with the old code
(Stefan, myself, etc.) will find that while things moved around, the basic logic remained the same.
But first, we need to understand and write down how the event buffer code works.
to be continued,
K.O. |
28 Dec 2018, Konstantin Olchanski, Info, note on the midas event buffer code, part 2, bm_send_event()
|
> In this technical note, I write down the workings of the midas event buffer code
> we need to understand and write down how the event buffer code works.
The data ingress part of the event buffer code is very simple: events are sent to the event buffer via a single
function, bm_send_event(). There is no other way to inject data into an event buffer. (A simplified sketch of the flow described below follows the three lists.)
bm_send_event() does this:
= if the write cache is active:
- the new event is written into the local write cache
- if the write cache is full, it is flushed into the event buffer via bm_flush_cache()
- if the new event does not fit into the write cache, the write cache is flushed and the event is written into the
event buffer (preserving the event ordering).
= if the write cache is inactive, new events are written directly into the event buffer.
= to write an event into the event buffer:
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy data into buffer shared memory
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for this event
bm_flush_cache() does this:
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy events from write cache to buffer shared memory
- at the same time keep track of readers that wait for these events
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for previous cached events
bm_wait_for_free_space() does this:
= if buffer is full and sync_flag==BM_NO_WAIT
- return BM_ASYNC_RETURN immediately
- causing bm_send_event() to immediately return BM_ASYNC_RETURN without writing anything to the event
buffer
= if buffer is full,
- sleep using ss_suspend(1000, MSG_BM), then check again
- there is no timeout, bm_wait_for_free_space() will wait forever
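A self-contained sketch of this flow (generic code, not the actual midas functions; the buffer-full path handled by bm_wait_for_free_space() and the reader notifications are only indicated by comments):

/* Sketch of the write-cache flow described above (generic code, not the
   actual midas functions; compile with -lpthread).  The buffer-full case,
   which the real code handles via bm_wait_for_free_space(), is omitted. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 1024   /* stand-in for the event buffer shared memory */
#define CACHE_SIZE   256   /* stand-in for the write cache */

static pthread_mutex_t buffer_mutex = PTHREAD_MUTEX_INITIALIZER;
static char buffer[BUFFER_SIZE];
static size_t write_pointer = 0;

static char cache[CACHE_SIZE];
static size_t cache_used = 0;

/* copy all cached events into the buffer under a single lock */
static void flush_cache(void)
{
   if (cache_used == 0)
      return;
   pthread_mutex_lock(&buffer_mutex);            /* one lock for many small events */
   memcpy(buffer + write_pointer, cache, cache_used);
   write_pointer += cache_used;                  /* update the write pointer */
   pthread_mutex_unlock(&buffer_mutex);
   /* here the real code notifies all readers waiting for the cached events */
   cache_used = 0;
}

static void send_event(const void *event, size_t size)
{
   if (size <= CACHE_SIZE) {                     /* event fits into the write cache */
      if (cache_used + size > CACHE_SIZE)
         flush_cache();                          /* cache full: flush it first */
      memcpy(cache + cache_used, event, size);   /* batch the small event */
      cache_used += size;
   } else {                                      /* large event: flush the cache, then */
      flush_cache();                             /* write directly, preserving ordering */
      pthread_mutex_lock(&buffer_mutex);
      memcpy(buffer + write_pointer, event, size);
      write_pointer += size;
      pthread_mutex_unlock(&buffer_mutex);
      /* here the real code notifies all readers waiting for this event */
   }
}

int main(void)
{
   send_event("small event", 12);
   send_event("another small event", 20);
   flush_cache();                                /* mfe.c does this about once per second */
   printf("%zu bytes in the buffer\n", write_pointer);
   return 0;
}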
The most expensive operation when writing to the event buffer is the locking:
we must wait until all readers finish their reading and all other writers finish their writing
and unlock the buffer for us (read up on "lock fairness and starvation"). In general, the less
locking we do, the better.
To reduce lock contention the event buffer code has a write cache. In theory, when we write
a large number of small events, it is more efficient to "batch" them together, significantly
reducing the number of locking operations "per event". Even if the events are large, if there
is significant lock contention from multiple writers and multiple readers, batching the writes
is still a good idea. The downside of this is the cost of an extra memcpy(): instead of one memcpy() from
the user buffer to the shared memory, we do two memcpy() operations - from the user buffer to the write cache, then from
the write cache to the shared memory. Today's typical PC-type machines have very fast RAM,
so memcpy() is inexpensive. However, embedded and low-power machines (ARM SoCs, FPGA-based SoCs,
etc.) tend to have pretty slow memory, so the extra memcpy() can be expensive.
The bottom line is that the write cache size should be tuned to the actual use case,
but in general it is less useful at low data rates (hardly any contention for the event buffer locks),
and more useful at high data rates especially with very small event sizes (high overhead from
locking on each event).
Right now the write cache is always enabled for mfe-based frontends:
bm_set_cache_size(..., 0, SERVER_CACHE_SIZE); // 100000 bytes
(perhaps the cache size should be made configurable via ODB /Eq/xxx/Common).
To ensure that events do not sit in the write cache for too long,
mfe-based frontends call bm_flush_cache() about once per second.
(see mfe.c, good luck!)
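For a client that manages its own buffer connection (not going through mfe.c), the call sequence is roughly as follows; this is a sketch only, so check midas.h for the current prototypes:

/* sketch: open an event buffer and enable a write cache
   (error checking omitted; check midas.h for the current prototypes) */
#include "midas.h"

void open_buffer_with_write_cache(void)
{
   INT hbuf;
   bm_open_buffer("SYSTEM", 0x100000, &hbuf);   /* requested buffer size in bytes */
   bm_set_cache_size(hbuf, 0, 100000);          /* no read cache, 100 kbyte write cache */
}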
to be continued,
K.O. |