Back Midas Rome Roody Rootana
  Midas DAQ System, Page 72 of 142  Not logged in ELOG logo
ID Datedown Author Topic Subject
  1427   28 Dec 2018 Konstantin OlchanskiInfonote on the midas event buffer code, part 2, bm_send_event()
> In this technical note, I write down the workings of the midas event buffer code
> we need to understand and write down how the event buffer code works.

The data ingress part of the event buffer code is very simple, events are sent to the event buffer via the one 
function bm_send_event(). There is no other way to inject data into an event buffer.

bm_send_event() does this:
= if the write cache is active:
- the new event is written into the local write cache
- if the write cache is full, it is flushed into the event buffer via bm_flush_cache()
- if the new event does not fit into the write cache, the write cache is flushed and the event is written into the 
event buffer (preserving the event ordering).
= if the write cache is inactive, new events are written directly into the event buffer.
= to write an event into the event buffer:
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy data into buffer shared memory
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for this event

bm_flush_cache() does this:
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy events from write cache to buffer shared memory
- at the same time keep track of readers that wait for these events
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for previous cached events

bm_wait_for_free_space() does this:
= if buffer is full and sync_flag==BM_NO_WAIT
- return BM_ASYNC_RETURN immediately
- causing bm_send_event() to immediately return BM_ASYNC_RETURN without writing anything to the event 
buffer
= if buffer is full,
- sleep using ss_suspend(1000, MSG_BM), then check again
- there is no timeout, bm_wait_for_free_space() will wait forever

The most expensive operation when writing to the event buffer is the locking,
we must wait until all readers finish their reading and all other writers finish their writing
and unlock the buffer for us. (read on "lock fairness and starvation"). In general, the less
locking we do, the better.

To reduce lock contention the event buffer code has a write cache. In theory, when we write
a large number of small events, it is more efficient to "batch" them together, significantly
reducing the number of locking operations "per event". Even if the events are large, if there
is significant lock contention from multiple writers and multiple readers, batching the writes
is still a good idea. The downside of this is the cost of an extra memcpy(). instead of one memcpy() from
user buffer to the shared memory, we do 2 memcpy() - from user buffer to write cache, then from
write cache to the shared memory. Today's typical PC-type machines have very fast RAM,
so memcpy() is inexpensive. However embedded and low power machines (ARM SoCs, FPGA based SoCs,
etc) tend to have pretty slow memory, so extra memcpy() can be expensive.

The bottom line is that the write cache size should be tuned to the actual use case,
but in general it is less useful at low data rates (hardly any contention for the event buffer locks),
and more useful at high data rates especially with very small event sizes (high overhead from
locking on each event).

Right now the write cache is always enabled for mfe-based frontends:
bm_set_cache_size(..., 0, SERVER_CACHE_SIZE); // 100000 bytes
(perhaps the cache size should be made configurable via ODB /Eq/xxx/Common).

To ensure that events do not sit in the write cache for too long,
mfe-based frontends call bm_flush_cache() about once per second.
(see mfe.c, good luck!)

to be continued,
K.O.
  1426   28 Dec 2018 Konstantin OlchanskiInfonote on the midas event buffer code, part 1
In this technical note, I write down the workings of the midas event buffer code, the path 
that events travel from the frontend to the SYSTEM buffer to mlogger (and to disk).

The event buffer code has worked well in the past, but more recently we see a few 
problems. There is the event buffer shared memory corruption problem in the alpha-g 
detector daq. There is difficulties with GET_RECENT. There is timeouts in bm_receive_event 
RPC path in ROME. There is the 2Gbyte size limit on the event buffer size (limiting 
maximum event size to about 1Gbyte), due to the 32-bit-ness of the event buffer size code. 
In the day of 10gige networkwing (1Gbyte/sec) and >1Gbyte/sec storage arrays, 2Gbyte 
buffer size is just about sufficient. There is lack of multithread safety in the event buffer 
code. There is the lack of bm_receive_event() where I do not have to guess the maximum 
event size (making event truncation impossible).

I have been looking at the event buffer code for many years. It is extremely very well 
written,
but it is also probably the oldest code inside midas and it's age shows. Good code from 
1998 is just very hard to read and follow 20 years later in 2018. We no longer do "goto", we 
are not afraid of using malloc(), we declare variables as we are about to use them (instead 
of at the beginning of a function). The list goes on and on.

So after looking at this code for many years, I finally decided to bite the bullet and 
rewrite/modernize it. To my surprise the code took very well to this. I only had to rewrite 
the parts that use difficult to follow "goto" logic, the rest of the code almost refactored 
itself. I think this is a very good thing to happen as the basic logic remains the same and 
people already familiar with old code (Stefan, myself, etc) will find that while things moved 
around, the basic logic remained the same.

But first, we need to understand and write down how the event buffer code works.

to be continued,
K.O.
  1425   27 Dec 2018 Konstantin OlchanskiBug Reportmhttpd - custom page - RHEL/Fedora
> I still strongly believe that mhttpd should not serve arbitrary files (only serve files explicitly listed in ODB) or as next best option,
> only serve files from subdirectories explicitly listed in ODB.
> 
> If people have access to the ODB, the can put the directory /etc/ into the ODB and again read that way /etc/passwd.
>

I suggest a more practical approach.

The default configuration should be secure (not serve /etc/passwd and .ssh/id_rsa.pub right out of the box). If users change things,
it is their business, we have to trust them to know what they are doing.

Still we should protect them from trivial security mistakes. Here is an example. Right now we set ODB /Custom/Path to $MIDASSYS,
which is often "$HOME/packages/midas" or "$HOME/git/midas". In this case, the following command will steal the ssh
private key:  "wget http://localhost:8080/%2e%2e/%2e%2e/.ssh/authorized_keys". (this will not work in the google chrome url bar,
as it replaces "%2e%2e" with ".." then normalizes "/.." to "/"). BTW, I do not know all and every way to obfuscate ".." in order
to escape from a file path jail. Maybe I should see what apache httpd people do against escapes from a file path jail.

Most important is to clearly explain which files we serve from which URLs. If we are upfront that we serve all and any files
with file names in the form ("/Custom/Path" + URL), they make have a clue to not set "/Custom/Path" to blank or "/". On our side,
obviously /Custom/Path set to "" should not mean that we serve any and all files with filenames that can be encoded into a URL.

K.O.

P.S. All this only reinforces my opinion that mhttpd should not be exposed directly to the internet (or even worse,
to a university campus network). Safest is to place it behind a password-protected https proxy and hope the password
is not leaked (hello, browser "save password/show password" button!) and is strong enough against
guessing or brute force attack. (hello, password midas/midas!).

K.O.
  1424   27 Dec 2018 Stefan RittBug Reportmhttpd - custom page - RHEL/Fedora
> BTW, "the fix" in mhttpd unconditionally creates /Custom/Path and sets it to the value of $MIDASSYS. This path 
> seems to be prepended to all file paths, so this fix also breaks the normal use of /Custom/xxx that contain the full 
> path name of the file to serve...

I just set the /Custom/Path to $MIDASSYS to have something non-zero there. This is only a default which should be changed to the directory 
containing the actual custom pages. If it breaks existing code, just set it manually to an empty string, nothing prevents you from doing that.

> Looks like file serving in mhttpd got messed up and needs to be reviewed. I still strongly believe that mhttpd should 
> be serve arbitrary files (only serve files explicitly listed in ODB) or as next best option, only serve files from 
> subdirectories explicitly listed in ODB.

I'm thinking along the same lines, but figured out that this cannot be done easily. If people have access to the ODB, the can put the directory 
/etc/ into the ODB and again read that way /etc/passwd. We would have to explicitly hard-code some directories to exclude like /etc/ /var/ etc. 
but on macOS that might be different. We could put the list of directories into a physical file, which cannot be edited via the web interface. 

Stefan
  1423   27 Dec 2018 Stefan RittInfoPartial refactoring of ODB code
> I am not sure this is quite true. The CPU can execute 3000 million operations per second (3GHz CPU, assuming 1 op/Hz),
> so 1 lock operation is worth 3000 normal operations. Of course cache misses and branch mispredictions mess up
> this simple arithmetic...

You can try that with "t1" in odbedit. This times the number of db_get_data() calls midas can do per second. On my MacBook Pro I get 470'000 
accesses per second.
  1422   26 Dec 2018 Konstantin OlchanskiSuggestionSelf-resetting alarm class
> > If you run an external script anyway, you can also call "odbedit -c alarm" to
> > reset all alarms. Or you could try to set the "Triggered" entry of that certain
> > alarm to 0 (again, with odbedit), that could also work.
> 
> That would not really help, because you cannot trigger a script AFTER an alarm occurred. Having 
> "self-resetting" alarms is actually not a bad idea. I could add a flag "Auto reset" which is false by 
> default and can be set to true for this functionality. Will keep that in mind for the next 
> development cycle.
> 

I second, this is a good idea. Sometimes I want "sticky" alarms that stay on to indicate that a bad thing happened in the 
past, sometimes I want self-resetting alarms that go away when a bad thing turns back into a good thing.

When I do this in a frontend, I manually trigger the alarm and manually clear the alarm, i.e. you can see this
done in ~addaq/online/src/fectrl.cxx

Use al_trigger_alarm() and al_reset_alarm().

This can also be done through the json-rpc interface - both calls are available as rpc commands - and so easy to use 
from javascript. (but there is no simple unix command line tool to issue json-rpc requests. ouch. must write one now.)

K.O.
  1421   26 Dec 2018 Konstantin OlchanskiForumImplementing MIDAS on a Satellite
> 
> Thank you for your comment Stefan. We do have some hardware resources on the board such as RAM, ROM and
> Flash storage so we wouldn't necessarily have to virtualize everything. Ideally we would like a
> completed and compressed file to be produced on board and regularly sent back to ground without
> requiring remote access. MIDAS is appealing to us because its easily automated but we wouldn't
> necessarily need functions like a GUI or web interface. Part of the discussion now is whether or not a
> microblaze processor would be sufficient or if we need a dedicted ARM processor.
> 

Hi, just recently I got a midas frontend to build and run on uclinux on a microblaze arm CPU (GRIFFIN CDM VME board).

It worked, but uncovered many problems inside midas - uclinux has no mmu, no multithreading, no recursive mutexes, no 
some of the other stuff assumed always available.

The worst problem I ran into was with uclinux giving us a very small stack so code like "int main() { char buf[10*1024]; }
crashes right away and there is a lot of code like this in midas.

My feeling about the xilinx soft-core CPU, if you can run uclinux, you can also run a midas frontend. We do not require 
memory beyond that needed to store one or two of your data events.

By design, the midas library can be built in a "minimal" configuration that only supports a frontend connected
to the mserver (no local ODB, no local event buffers, no local mhttpd/mlogger, etc).

As you have seen in the Makefile, there are provisions for cross-compilation and I cross-compile midas things quite often.

On the other side, if you have xilinx FPGA with build-in PowerPC CPU, most definitely you can run full linux
and you can run full midas on it, we have done this for the T2K/ND280 experiment in Japan.

K.O.
  1420   26 Dec 2018 Konstantin OlchanskiInfobm_receive_event timeout in ROME
> There is a bug report in the ROME repository which says bm_receive_event timeouts.
> https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after
> Does anybody have any ideas what could causing the problem ?

There could be a problem with causing bm_receive_event() to wait for an event for a time longer than 
the rpc timeout. This rings a very small bell for me. But I do not remember the details.

As I now go through the midas event buffer code, I will check that bm_receive_event() connected 
through the mserver has correctly working timeouts.

Thank you for reminding me about this difficulty.

K.O.
  1419   26 Dec 2018 Konstantin OlchanskiInfoPartial refactoring of ODB code
> One additional comment: In the 90's when I developed this code, locking was expensive.
> Now the world has changed, we can do almost a million locks a second.

I am not sure this is quite true. The CPU can execute 3000 million operations per second (3GHz CPU, assuming 1 op/Hz),
so 1 lock operation is worth 3000 normal operations. Of course cache misses and branch mispredictions mess up
this simple arithmetic...

But I think cost of mutex lock/unlock can be easily measured. (hmm... now I am curious).

Bigger question is architectural, nested/recursive locks is definitely a bad thing to do (not just my opinion).

But closer to home, as I implemented "write protected" ODB, lock/unlock suddenly has to do MMU operations
(map unmap memory) and this is *very* expensive.

Also as we start doing more multithreading, lock contention is becoming a problem, and the standard solution
is to implement read-locks and write-locks. (everybody holding a read-lock can read ODB at the same time
without waiting).

So, moving in the direction of separate read and write locks and write-protected (and/or read-protected) ODB shared memory,
all points in the direction of reworking of ODB locks in the direction of removing the need for nested/recursive locks.

I think me and Stefan are in agreement here.

K.O.
  1418   26 Dec 2018 Konstantin OlchanskiBug Reportmhttpd - custom page - RHEL/Fedora
> I implemented that fix. Thank you to Andreas. Creating "Custom" directory from the web now does 
> not have that problem any more.

This fix also stops mhttpd from serving the /etc/passwd file.

BTW, "the fix" in mhttpd unconditionally creates /Custom/Path and sets it to the value of $MIDASSYS. This path 
seems to be prepended to all file paths, so this fix also breaks the normal use of /Custom/xxx that contain the full 
path name of the file to serve...

Looks like file serving in mhttpd got messed up and needs to be reviewed. I still strongly believe that mhttpd should 
be serve arbitrary files (only serve files explicitly listed in ODB) or as next best option, only serve files from 
subdirectories explicitly listed in ODB.

K.O.
  1417   26 Dec 2018 Konstantin OlchanskiBug Reportmhttpd - custom page - RHEL/Fedora
> > [mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of 
> > 4096 returned -1, errno 21 (Is a directory)
> 
> On some linux systems, "/root" exists, it is a directory used as the home directory 
> of user "root" (~root is /root; traditional UNIX has ~root as /).
> 

I just got burned by the same problem on MacOS. mhttpd odb editor cannot open ODB "/System" 
because on MacOS there is a subdirectory called "/System".

So the question is why did mhttpd suddenly started serving files from the main URL?

K.O.
  1416   21 Dec 2018 Stefan RittBug Reportmhttpd - custom page - RHEL/Fedora
I implemented that fix. Thank you to Andreas. Creating "Custom" directory from the web now does 
not have that problem any more.

Stefan
  1415   18 Dec 2018 Konstantin OlchanskiInfomxml update
the mxml library was updated to make it thread-safe.
https://bitbucket.org/tmidas/mxml/src/master/

I also take an opportunity to remind all to update your copy to the latest version
as I just stumbled on old bug that I fixed 1 year ago (crash of mlogger)
but forgot to update all and every of my copies of mxml.

I also looked at the xml encoder and I see that it has several places where it may
truncate the data, but none of these places can cause truncation of ODB data
because the fixed-size internal buffers are big enough to hold the longest
values sent by the odb xml encoder.

K.O.
  1414   11 Dec 2018 Stefan RittInfoPartial refactoring of ODB code
All makes sense to me. I agree to proceed with the refactoring.

One additional comment: In the 90's when I developed this code, locking was expensive. On a decent computer you could do a couple of thousand lock operations per second before you hit the 100% 
CPU limit. Therefore I tried to reduce the number of lock operation as much as possible. Like a db_find_key locks the ODB once and then goes through all keys before it unlocks again. If I would lock for 
every key and have an ODB with ten thousands of keys, that would have taken very long in the old days. 

Now the world has changed, we can do almost a million locks a second. So a db_get_record() does not have to obtain a whole directory in one go, but can get each value separately, and if necessary lock 
the ODB on each key access. This would be slower, but only a negligible amount these days. So in the spirit of making midas more robust, we can even go a step beyond simple refactoring and change the 
locking scheme if it becomes more transparent and stable.

Best,
Stefan
  1413   05 Dec 2018 Konstantin OlchanskiInfoPartial refactoring of ODB code
The current ODB code has several structural problems and I think I now figured out how to straighten them out.

Here is the problems:

a) nested (recursive) odb locks
b) no clear separation between read-only access and read-write access
c) no clear separation between odb validation and repair functions
d) cm_msg() is called while holding a database lock

Discussion:

a) odb locks are nested because most functions lock the database, then call other functions that lock the database again. Most locking primitives - SystemV 
semaphores, POSIX semaphores and mutexes - usually do not permit nested (recursive) locking.

For locking the odb shared memory we use a SystemV semaphore with recursion implemented "by hand" in ss_semaphore_wait_for(). This works ok.

For making odb thread-safe, we use POSIX mutexes, and we rely on an optional feature (PTHREAD_MUTEX_RECURSIVE) which seems to work on most OSes, but 
is not required to exist and work by any standard. For example, recursive mutexes do not work in uclinux (linux for machines without an MMU).

I looked at implementing recursive mutexes "by hand", same as we have the recursive semaphores, and realized that it is quite complicated and computationally 
expensive (read: inefficient). (Also I think nested and recursive locks is "not mainstream" and should rather be avoided). As an example you can see full 
complexity of a nested lock as recent implementation in ROOT. (good luck finding it).

A solution for this problem is well known. All functions are separated into "unlocked" user-callable functions and "locked" internal functions. Nested locking is 
naturally eliminated.

Call sequences:
db_get_key() -> db_find_key() // odb is locked twice
become
db_get_key() -> db_get_key_locked() -> db_find_key_locked() // odb is locked once

Actual implementation of this scheme turns out to be a very clean and mechanical refactoring (moving the code without changing what it does).

As a try, I refactored db_find_key() and db_get_key() and I like the result. Locking is now obvious - obscure error paths with hidden "unlock before return"  - are all 
gone. Extra conversions between hDB and pheader are gone.

b) in this refactoring, functions that do not (should not) modify odb become easy to identify - the pheader argument is tagged "const".

This simplifies the implementation of "write-protected" odb - instead of ad-hoc db_allow_write_locked() sprinkled everywhere, one can have obvious calls to 
"db_lock_read_only()" and "db_lock_read_write()".

Separation of locks into "read" and "write" locks, in turn, improves locking behaviour - helps against problems like lock starvation - which we did see with MIDAS - 
as "read" locks are much more efficient - all readers can read the data at the same time, locking is only done when somebody need to "write".

c) some db_validate() functions also try to do repair. this cannot work if validation is called from "read-only" functions like db_find_key(). I now think the "repair" 
functions should be separate from "validate" functions. validate functions should detect problems, repair functions would repair them. The question remains - 
when is good time to run a full repair. (probably at the time when we connect to the database - this way, simply starting "odbedit" will force a database check and 
repair).

d) calls to cm_msg() when odb is locked has been a problem for a long time. because cm_msg() itself calls odb and because it also calls event buffer code 
(SYSMSG buffer) which in turn call odb functions, there was trouble with deadlocks between ODB and event buffer semaphores, trouble with recursive use of 
ODB, etc.

Right now we have all this partially papered over by having cm_msg() put messages into a memory buffer that we periodically flush, but I was never super happy 
with that solution. For example, if we crash before the message buffer is flushed, all error messages are lost, they do not go into midas.log, they are not printed on 
the screen, they are not accessible in the core dump.

To resolve this problem, I have all "locked" functions call db_msg() instead of cm_msg(). db_msg() saves the messages in a linked list which is flushed into 
cm_msg() immediately after we unlock odb.

If we crash after generating an error message but before it is flushed to cm_msg(), we can still access it through the linked list inside the core dump. This is an 
improvement over what we have now. Ideally, all messages should be printed to the terminal and saved to midas.log and pushed into SYSMSG, but most of this is 
impractical at a moment when odb is locked - as we already know it leads to deadlocks and other trouble...

Bottom line, I now have a path to improve the odb code and to resolve some of the long standing structural problems.

K.O.
  1412   03 Dec 2018 Konstantin OlchanskiInfostatus of self-signed https certificates
> > In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache 
> > httpd, etc).
> 
> I guess this is what moste people do anyhow these days. Do I understand correctly that this then rules out the usage of letsencrype certificates, since the 
> host needs to be accessed from outside, which is not possible if running behind a password protected firewall.
> 
> Stefan

Careful, firewall != proxy, very different things.

A firewall prevents network communications, period. (Like fences and locked doors, there are good reasons to have them).

An https proxy is a way to have encrypted (protected) web communications with a machine behind a firewall.

Basically, we have 4 main cases, all with trouble.

1) mhttpd running on localhost, "just for testing", is in trouble. there is no simple way to get a "blessed" certificate, and self-signed certificates are now "almost forbidden". http is "okey 
for now", but the writing is on the wall. There is no special exception for "local-only" connections.

2a) mhttpd running on an internet-connected machine, with apache httpd, our best case. To get this working one has to configure both apache httpd and the "blessed certificate" 
certbot tool. With luck, both tools work smoothly on current OSes (they do NOT).

2b) same, but without apache httpd. One still has to run certbot, and the "glue" between mhttpd and certbot is currently missing: need a way to point mhttpd to the certbot certificate 
files and a way to reload mhttpd when the certificate is auto-renewed.

3) mhttpd running on a machine behind a corporate firewall. worst case. if firewall Gods make an opening for ports 80 and 443, it becomes case (2a/b), otherwise, one must use some 
kind of https proxy. (Plus there is no trivial way to setup an encrypted secure communication channel between mhttpd and this proxy, a double bad).

K.O.

P.S. I guess one can use nginx as the https proxy instead of apache httpd. I did not try yet. My impression is that everybody uses nginx, except for people who started with apache httpd 
and are too lazy to try nginx.

K.O.
  1411   30 Nov 2018 Stefan RittInfostatus of self-signed https certificates
> In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache 
> httpd, etc).

I guess this is what moste people do anyhow these days. Do I understand correctly that this then rules out the usage of letsencrype certificates, since the 
host needs to be accessed from outside, which is not possible if running behind a password protected firewall.

Stefan
  1410   22 Nov 2018 Konstantin OlchanskiInfostatus of self-signed https certificates
I just happened to check the current situation with self-signed https certificates as implemented in mhttpd.

(To remember, the powers-that-be are pushing for universal use of https for all web access. The https
implementation in mhttpd at the moment can only generate self-signed certificates, so...)

plain unencrypted http:
- both google chrome and firefox say "connection not secure", but connect without any fuss.
- apple safari does not say anything

https with self-signed certificate:
- google chrome goes through an "are you sure?" page, "red not secure" status in toolbar
- firefox does the same thing, requires adding a security exception, but still shows "not secure" status in toolbar
- apple safari goes through a sequence of "are you sure?" pages, asks for the user password to add the self-signed certificate to 
the macos key store, then marks the connection as "secure" (good)

So clearly powers-that-be do not want us to use self-signed certificates for https. (And frown on use of unencrypted
http even for localhost connections). Properly signed certificates can be obtained from letsencrypt almost
automatically, but of course mhttpd needs to know how to use them and how to do handle their automatic renewals.

I plan to update the mongoose web server library inside mhttpd and with luck I will straighten some of this certificate business at 
the same time.

In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache 
httpd, etc).

K.O.
  1409   02 Nov 2018 Stefan RittBug ReportSide panel auto-expands when history page updates
> Joseph's original message says that the problem is with the standard MIDAS history page, which currently use a complete reload 
> when refreshing.  Of course we are planning to update this history pages to only grab what it needs (as well as changing the 
> plotting to use newer HTML plotting). But until that upgrade happens your fix is helpful for the history page.

Ok, now I understand, and of course I agree with you.

Stefan
  1408   02 Nov 2018 Thomas LindnerBug ReportSide panel auto-expands when history page updates
> > I apologise for miss using the word refresh. The re-appearing sidebar was also seen with the automatic
> > reload, I have implemented your fix here and it now works great!
> 
> Still did not get your point. Why is there "automatic reload"? The status page should not "completely reload" any more. 
> Instead, all data is fetched in the background using AJAX calls, and only the data on the page is updated once per second. 
If 
> there is a "complete reload", something is wrong.

Joseph's original message says that the problem is with the standard MIDAS history page, which currently use a complete reload 
when refreshing.  Of course we are planning to update this history pages to only grab what it needs (as well as changing the 
plotting to use newer HTML plotting). But until that upgrade happens your fix is helpful for the history page.
ELOG V3.1.4-2e1708b5