ELOG Midas

Midas DAQ System, Page 34 of 138

2755 Entries

ID	Date	Author	Topic	Subject
1282	26 Apr 2017	Stefan Ritt	Info	added db_get_value_string()
Just some thoughts for discussion: Rather than "spicing up" the MIDAS library here and there with C++ objects such as std::string, wouldn't it make more sense to "cleanly" wrap an ODB value in a C++ class? We could then use both APIs in parallel, and encourage the C++ API for new developments. We could then write things like:

   ODBKEY<std::string> name("/Experiment/Name"); // constructor automatically calls db_get_value()
   name = "New Name"; // overloading the "=" operator, will call db_set_value()

or even

   ODBKEY<std::vector, std::string> nameArray("...");
   for (auto &s : nameArray)
      std::cout << s << std::endl; // print all elements of string array

so we treat ODB arrays as vectors, which fixes array boundary violations nicely.

If the key does not exist, we could properly throw exceptions and forget about tons of nested return parameters for error conditions.

Many nice things could be done, common errors could be prevented, and we can do a "smooth" migration: we don't have to change the whole library completely, just where we feel it's currently needed. So over time the code would be "objectified". It would be nice if we could rely on C++11 (like the "auto" feature above). Not sure about VxWorks, but every other OS should be fine.

Stefan

> Since we have been regularly running into problems with db_get_xxx(TID_STRING) and string buffers of mismatched size,
> I now implemented db_get_value_string(hdb, hkey, key_name, index, &string, create).
>
> It works the same as db_get_value(TID_STRING), except that the string value is returned into an std::string object,
> memory allocation is handled by std::string and there is no string length limit (other than std::string limits).
>
> Accessing string arrays is done explicitly via an "index" parameter. If index is past the end of the odb array, DB_OUT_OF_RANGE is returned
> without logging an error message (by contrast, db_get_data_index() will log an error). This makes it safe to iterate over array entries with a simple
> loop of index from 0 upward until db_get_value_string() returns an error.
>
> As before, if the odb entry does not exist, it will be created (if create==true) and initialized with the value of the string parameter (zero-terminated in odb).
>
> There are also newly added db_set_value_string() and cm_get_path_string(). If you want more of these, please ask, or send patches.
>
> K.O.
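A minimal sketch of the wrapper idea, assuming a plain std::map as a stand-in for the ODB so that it compiles without MIDAS. OdbString, MockOdb and read_array() are hypothetical names (the proposed ODBKEY template does not exist), and the comments only mark where real db_get_value()/db_set_value() calls would go.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the ODB: full path -> array of string values.
// Not the MIDAS API; it only models the idea discussed in this post.
using MockOdb = std::map<std::string, std::vector<std::string>>;

// Hypothetical wrapper in the spirit of the proposed ODBKEY<std::string>:
// reads on construction, writes back on assignment, throws if the key is missing.
class OdbString {
 public:
  OdbString(MockOdb& odb, std::string path) : odb_(odb), path_(std::move(path)) {
    auto it = odb_.find(path_);
    if (it == odb_.end() || it->second.empty())
      throw std::out_of_range("ODB key not found: " + path_);
    value_ = it->second[0];  // a real wrapper would call db_get_value() here
  }

  OdbString& operator=(const std::string& v) {
    odb_[path_] = {v};  // a real wrapper would call db_set_value() here
    value_ = v;
    return *this;
  }

  const std::string& value() const { return value_; }

 private:
  MockOdb& odb_;
  std::string path_;
  std::string value_;
};

// Treats an ODB string array as a std::vector, so a range-for loop
// cannot run past the end of the array.
std::vector<std::string> read_array(const MockOdb& odb, const std::string& path) {
  auto it = odb.find(path);
  if (it == odb.end()) throw std::out_of_range("ODB key not found: " + path);
  return it->second;  // a copy: later ODB changes do not affect the caller's loop
}
```

Returning a copy keeps iteration safe at the cost of one copy per read; a production wrapper might prefer to cache or lazily fetch elements instead.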
1284	02 May 2017	Konstantin Olchanski	Info	added db_resize_string()
> Since we have been regularly running into problems with db_get_xxx(TID_STRING) and string buffers of mismatched size,
> I now implemented db_get_value_string(hdb, hkey, key_name, index, &string, create).

I ran into problems with string arrays: non-array strings have unlimited length, but string arrays have a fixed string length, usually set at creation time. This causes a problem when growing arrays using db_get_value_string(): when a non-array variable is converted to an array, the wrong string length gets used, and one gets an array with a useless string length. There is no way to specify the correct array string length without adding more parameters to db_get_value_string(), confusing and complicating it for the typical case where it is used with simple (non-array) odb entries.

To clarify the situation, db_get_value_string() was changed to reject attempts to resize an array: calls of db_get_value_string() with index>0 and create==TRUE now return an error.

To create and resize string arrays, I added a new function, db_resize_string(hdb, hkey, key_name, num_values, max_string_size). Here:
- num_values is the new array size, making it possible to grow or shrink an array;
- max_string_size is the new string size, making it possible to change the array string length after the array was created (there was no midas function to do this before now).

I added a json-rpc call for db_resize_string(), but it still needs to be added to odbedit and mhttpd.

K.O.
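The rules above can be modelled in a few lines. This is a self-contained sketch, not MIDAS code: StringArray and the Status values are hypothetical, std::string stands in for the fixed-size char buffers in shared memory, and resize() only imitates what db_resize_string() is described to do (new array size plus new per-element string length).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status codes for this sketch only (not MIDAS values).
enum class Status { Ok, OutOfRange, Refused };

// Hypothetical model of an ODB string array: every element has the same fixed
// capacity, max_string_size bytes including the terminating NUL.
class StringArray {
 public:
  explicit StringArray(std::size_t max_string_size) : max_(max_string_size) {
    assert(max_ > 0);  // room for at least the NUL
  }

  // Follows the rule described above: create==true with index>0 is refused
  // instead of growing the array with a guessed string length.
  Status get(std::size_t index, std::string* value, bool create) {
    if (create && index > 0) return Status::Refused;
    if (index >= values_.size()) {
      if (!create) return Status::OutOfRange;  // quiet end-of-array signal
      values_.push_back(fit(*value));          // index == 0: create the entry
    }
    *value = values_[index];
    return Status::Ok;
  }

  // In the spirit of db_resize_string(num_values, max_string_size): grow or
  // shrink the array and change the per-element length after creation.
  void resize(std::size_t num_values, std::size_t max_string_size) {
    assert(max_string_size > 0);
    max_ = max_string_size;
    values_.resize(num_values);          // new elements are empty strings
    for (auto& s : values_) s = fit(s);  // truncate to the new length
  }

  std::size_t size() const { return values_.size(); }

 private:
  // Truncates so that the string plus its NUL fits into max_ bytes.
  std::string fit(const std::string& s) const {
    return s.size() < max_ ? s : s.substr(0, max_ - 1);
  }
  std::size_t max_;
  std::vector<std::string> values_;
};
```

Because get() reports OutOfRange quietly, a caller can still walk the array with a loop of index from 0 upward until it fails, as in the db_get_value_string() announcement quoted above.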
1286	02 May 2017	Konstantin Olchanski	Info	mhttpd inline-editor change
I changed the mhttpd odb inline editor to use the json-rpc interface. Good things:
- the browser no longer complains about obsolete synchronous ajax calls
- strings of arbitrary length can be edited (previously limited by the maximum URL length)
- funny characters " (quote), > and < (angle brackets) are correctly escaped
- after editing, the actual value from odb is loaded and displayed (confirming that the edit "took")

K.O.
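As an illustration of the escaping of quotes and angle brackets mentioned above, here is a hedged C++ sketch of HTML escaping. mhttpd's inline editor runs in the browser, so this is not its code, and html_escape() is a hypothetical helper. It also escapes '&', which any such scheme has to handle so that user text containing entity-like sequences such as "&lt;" survives the round trip.

```cpp
#include <cassert>
#include <string>

// Replaces the characters that are special in HTML with their entities.
std::string html_escape(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      default: out += c;
    }
  }
  return out;
}
```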
1289	02 May 2017	Konstantin Olchanski	Info	added db_get_value_string()
> Just some thoughts for discussion:

Even more thoughts:

- c++ interface for odb. Been there, done that: see VirtualODB in rootana. It can access the live ODB, the XML odb dump from a midas file, even the ODB through http/mhttpd (needs to be converted to the json rpc api).

- c++11. The ROOT team made the decision for us, for all practical purposes. RH/SL/CentOS <= 6 are left for dead (but we still have machines as old as SL4).

- odb interface via severe operator overloading. Writing "let x=42;" to simulate the universe from the big bang to thermal death is elegant (overload operator= of class "let"), but there is a surprise for the naive programmer (long run time, large memory consumption).

- c++ exceptions. Defective by design, as they do not carry enough debug information (java exceptions, by contrast, carry the full stack trace). In the typical case, it is impossible to tell who is throwing exceptions and why. Error handling is reduced to "main() { try { real_main } catch exception { printf("sorry!"); }}". See http://stackoverflow.com/questions/1736146/why-is-exception-handling-bad

- converting midas to a new simplified odb api. Typical use via db_get_value() is already one (or two) lines of code that cannot be reduced (one has to specify the odb path, tid, etc), so little is gained from using a different api. Getting rid of db_find_key()/db_get_key() would be helpful, but with db_get_value(), they are hardly ever used in new code.

There are weaknesses in the current api that would be nice to fix some day, and a c++ api seems like the right way to go:

- fix the race condition between db_enum_key() and db_delete_key(). (It is the same as between "ls" and "rm": with nfs, try to "rm" on one client while running "ls" on another, fun!)
- fix the race condition between odb handles (pointers into shared memory) and db_delete_key() (and whatever else moves the keys around). This means using full odb paths for all odb api functions.
- make it all work nicely multithreaded: the above race conditions would only become worse if we encourage heavy use of threads in midas.

And I do need a "no-odb" odb api for my "no-midas" midas frontend framework (where I can build and run the frontend without linking and connecting with a real midas). In practice it means all api "get" calls have to take a "default" value that is returned right back to me when I am not connected (or linked) with a real odb.

Good fodder for this summer's discussions.

K.O.

> Rather than "spicing up" the MIDAS library here and there with C++ objects such as std::string, wouldn't it make more sense to "cleanly" wrap an ODB value in a C++ class? We could then
> use both APIs in parallel, and encourage the C++ API for new developments. We could then write things like:
>
>    ODBKEY<std::string> name("/Experiment/Name"); // constructor automatically calls db_get_value()
>    name = "New Name"; // overloading the "=" operator, will call db_set_value()
>
> or even
>
>    ODBKEY<std::vector, std::string> nameArray("...");
>    for (auto &s : nameArray)
>       std::cout << s << std::endl; // print all elements of string array
>
> so we treat ODB arrays as vectors, which fixes array boundary violations nicely.
>
> If the key does not exist, we could properly throw exceptions and forget about tons of nested return parameters for error conditions.
>
> Many nice things could be done, common errors could be prevented, and we can do a "smooth" migration: we don't have to change the whole library completely, just where we feel it's currently
> needed. So over time the code would be "objectified". It would be nice if we could rely on C++11 (like the "auto" feature above). Not sure about VxWorks, but every other OS should be fine.
>
> Stefan
>
> > Since we have been regularly running into problems with db_get_xxx(TID_STRING) and string buffers of mismatched size,
> > I now implemented db_get_value_string(hdb, hkey, key_name, index, &string, create).
> >
> > It works the same as db_get_value(TID_STRING), except that the string value is returned into an std::string object,
> > memory allocation is handled by std::string and there is no string length limit (other than std::string limits).
> >
> > Accessing string arrays is done explicitly via an "index" parameter. If index is past the end of the odb array, DB_OUT_OF_RANGE is returned
> > without logging an error message (by contrast, db_get_data_index() will log an error). This makes it safe to iterate over array entries with a simple
> > loop of index from 0 upward until db_get_value_string() returns an error.
> >
> > As before, if the odb entry does not exist, it will be created (if create==true) and initialized with the value of the string parameter (zero-terminated in odb).
> >
> > There are also newly added db_set_value_string() and cm_get_path_string(). If you want more of these, please ask, or send patches.
> >
> > K.O.
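The "no-odb" idea, where every "get" takes a default value, could look roughly like the following sketch. The class names, GetInt() and read_period_ms() are hypothetical and not part of MIDAS or rootana, and the ODB path is only an example; MapOdb uses a std::map in place of a real ODB and, in the spirit of the create==true behaviour described for db_get_value_string(), stores the default when a key is missing.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface: every "get" carries a default value.
class OdbLike {
 public:
  virtual ~OdbLike() = default;
  virtual int GetInt(const std::string& path, int default_value) = 0;
};

// Used when running without MIDAS: always hands the default straight back.
class NullOdb : public OdbLike {
 public:
  int GetInt(const std::string&, int default_value) override { return default_value; }
};

// Stand-in for a connected ODB (a std::map): a missing key is created and
// initialized with the default value.
class MapOdb : public OdbLike {
 public:
  int GetInt(const std::string& path, int default_value) override {
    auto it = values.find(path);
    if (it == values.end()) it = values.emplace(path, default_value).first;
    return it->second;
  }
  std::map<std::string, int> values;
};

// Frontend code is written once against OdbLike and works with either backend.
int read_period_ms(OdbLike& odb) {
  return odb.GetInt("/Equipment/Test/Settings/Period", 100);  // example path
}
```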
1294	31 May 2017	Konstantin Olchanski	Info	modified db_watch() arguments
For reasons unknown, db_watch() did not have an "info" parameter passed through to the callback handler function, as is done with db_open_record(). This omission makes it difficult to write db_watch handler functions that must watch multiple odb trees: db_watch only delivers the hkey of the modified item inside the tree, leaving us with no simple way to tell which tree it came from. An example of this is mfe.c watching the Common structure for multiple equipments. There are other uses for the "info" parameter; for example, it is needed to implement c++ wrapper classes.

This omission is now corrected at the cost of changing the definition of db_watch(). All uses of db_watch() in the midas tree have been corrected, but out-of-tree programs that call db_watch() will not compile until they are updated. For quick conversion, add a NULL parameter to db_watch() calls and add a "void* info" parameter to your watch handler function.

Sorry about this disturbance,
K.O.
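A self-contained sketch of why the extra parameter helps: one handler watches the Common tree of two equipments and uses info to tell them apart. mock_watch(), mock_notify(), Watch and Equipment are hypothetical stand-ins, int replaces the MIDAS handle types, and the handler is only assumed to have an (hDB, hKey, index, info) shape; check midas.h for the exact prototype.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Handler shape with the extra "void* info" passed through unchanged.
// int stands in for the MIDAS handle/INT types in this sketch.
using WatchHandler = void (*)(int hDB, int hKey, int index, void* info);

// Hypothetical registry standing in for db_watch(); not MIDAS code.
struct Watch {
  int hTree;
  WatchHandler handler;
  void* info;
};
std::vector<Watch> g_watches;

void mock_watch(int hTree, WatchHandler handler, void* info) {
  g_watches.push_back({hTree, handler, info});
}

// Simulates a change of key hKey somewhere inside the tree hTree.
void mock_notify(int hDB, int hTree, int hKey, int index) {
  for (const auto& w : g_watches)
    if (w.hTree == hTree) w.handler(hDB, hKey, index, w.info);
}

struct Equipment {
  std::string name;
  int changes;
};

// One handler for the Common trees of all equipments: hKey alone does not
// say which tree changed, but info points at the owning equipment.
void common_changed(int /*hDB*/, int /*hKey*/, int /*index*/, void* info) {
  static_cast<Equipment*>(info)->changes++;
}
```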
1305	13 Jul 2017	Konstantin Olchanski	Info	implemented: json-rpc batch requests
The mhttpd json-rpc interface now implements batch requests per http://www.jsonrpc.org/specification#batch

In a nutshell, instead of a single request, one can send a json array of requests and receive a json array of replies.

As a variance from the spec, the midas implementation executes the requests strictly in order and the array of replies corresponds exactly to the array of requests (the spec requires the user to use the "id" field to match replies to requests; in midas json-rpc, the 1st reply is always to the 1st request, the 2nd reply to the 2nd request and so forth).

To see this in action, look at resources/example.html and resources/transition.html.

K.O.
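The in-order rule can be sketched without any JSON library: the batch is a vector of requests, and the replies are produced in the same order. Request, Reply, Method and run_batch() are hypothetical; a real server would parse and serialize JSON and would also handle notifications (requests without an id), which this sketch ignores. "Method not found" is the message the JSON-RPC 2.0 specification assigns to error code -32601.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Simplified records standing in for JSON request/reply objects.
struct Request { std::string method; std::string params; int id; };
struct Reply { int id; std::string result; std::string error; };

using Method = std::function<std::string(const std::string&)>;

// Executes the batch strictly in order, so replies[i] always answers
// requests[i]; the id is still copied so clients may match by id as well.
std::vector<Reply> run_batch(const std::map<std::string, Method>& methods,
                             const std::vector<Request>& requests) {
  std::vector<Reply> replies;
  replies.reserve(requests.size());
  for (const auto& req : requests) {
    auto it = methods.find(req.method);
    if (it == methods.end())
      replies.push_back({req.id, "", "Method not found"});
    else
      replies.push_back({req.id, it->second(req.params), ""});
  }
  return replies;
}
```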
1307	25 Jul 2017	Stefan Ritt	Info	Current git repository "develop" branch broken
Dear all,

we are currently making major modifications to the way mhttpd works. I realized that we are now at a state where mhttpd is broken, and it will take a few weeks to get everything converted to the new scheme we plan to use. Therefore I moved the git branch "master" to the last known stable version of midas. So for all practical purposes, please do NOT update your "develop" branch until further notice. To get the last stable version, you can do a

$ git checkout master

which moves you to right before we started to make major modifications. Once we are finished, we will announce this here in the forum.

Best regards,
Stefan
1310	04 Aug 2017	Konstantin Olchanski	Info	Notes on installing midas from scratch
Notes on installing midas from scratch. The instructions on midaswiki will be synced with this later.

cd ~/packages
git clone ...
cd midas
make
cd ~
mkdir ~/online
cd ~/online
~/git/midas/darwin/bin/odbinit --env
source env.sh
~/git/midas/darwin/bin/odbinit --exptab
~/git/midas/darwin/bin/odbinit
ls -la

send:online olchansk$ ls -la
total 2376
drwxr-xr-x 15 olchansk staff 510 Aug 4 15:34 .
drwxr-xr-x+ 244 olchansk staff 8296 Aug 4 15:33 ..
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ALARM.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ELOG.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .HISTORY.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .MSG.SHM
-rw-r--r-- 1 olchansk staff 1183808 Aug 4 15:34 .ODB.SHM
-rw-r--r-- 1 olchansk staff 8 Aug 4 15:34 .ODB_SIZE.TXT
-rw-r--r-- 1 olchansk staff 15 Aug 4 15:34 .SHM_HOST.TXT
-rw-r--r-- 1 olchansk staff 12 Aug 4 15:34 .SHM_TYPE.TXT
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .SYSMSG.SHM
-rw-r--r-- 1 olchansk staff 341 Aug 4 15:33 env.csh
-rw-r--r-- 1 olchansk staff 322 Aug 4 15:33 env.sh
-rw-r--r-- 1 olchansk staff 40 Aug 4 15:34 exptab
-rw-r--r-- 1 olchansk staff 287 Aug 4 15:34 midas.log
send:online olchansk$

odbedit ### works
mhttpd ### bombs, requires SSL certificate https://bitbucket.org/tmidas/midas/issues/57/initial-mhttpd-should-bind-to-localhost
odbedit ### cd /experiment, set "http redirect to https" to no, set "midas https port" to 0
mhttpd ### runs now
connect to http://localhost:8080 ### status page works
restart mhttpd as mhttpd -D
mlogger -D
fetest ### runs, prints time and data
start a run from web page ### works
### fetest generates crazy data rate https://bitbucket.org/tmidas/midas/issues/58/fetest-crazy-data-rate
### go to history, define plot for SLOW/SLOW, see sine wave ### works
### history is written to expt dir, no good, go to "history"
### data files written to expt dir, no good, go to "data"
### midas.log written to data dir, no good (want expt dir)
### elog written to expt dir, go to "elog"
### logger channel config is wrong - gzip compression and crc32c should be enabled by default
### history config is wrong - FILE per-variable history should be enabled by default

K.O.
1311	07 Aug 2017	Stefan Ritt	Info	Notes on installing midas from scratch
Thanks for documenting this in detail. A few suggestions:

- is it really necessary to call odbinit three times? Maybe two or even all three steps can be merged: you call odbinit, it checks whether the environment is there, and creates it automatically if not. Same with the exptab.
- can we make "http redirect to https = n" and "midas https port = 0" the default? Of course this has to go with binding to localhost only.
- does it make sense to define default directories for history, data files and midas.log? Maybe we could come up with a "default scheme" which can later be adjusted if needed.
- will you take care of the wrong logger channel config and history config?

Best regards,
Stefan

> Notes on installing midas from scratch. The instructions on midaswiki will be synced with this later.
>
> cd ~/packages
> git clone ...
> cd midas
> make
> cd ~
> mkdir ~/online
> cd ~/online
> ~/git/midas/darwin/bin/odbinit --env
> source env.sh
> ~/git/midas/darwin/bin/odbinit --exptab
> ~/git/midas/darwin/bin/odbinit
> ls -la
> send:online olchansk$ ls -la
> total 2376
> drwxr-xr-x 15 olchansk staff 510 Aug 4 15:34 .
> drwxr-xr-x+ 244 olchansk staff 8296 Aug 4 15:33 ..
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ALARM.SHM
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ELOG.SHM
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .HISTORY.SHM
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .MSG.SHM
> -rw-r--r-- 1 olchansk staff 1183808 Aug 4 15:34 .ODB.SHM
> -rw-r--r-- 1 olchansk staff 8 Aug 4 15:34 .ODB_SIZE.TXT
> -rw-r--r-- 1 olchansk staff 15 Aug 4 15:34 .SHM_HOST.TXT
> -rw-r--r-- 1 olchansk staff 12 Aug 4 15:34 .SHM_TYPE.TXT
> -rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .SYSMSG.SHM
> -rw-r--r-- 1 olchansk staff 341 Aug 4 15:33 env.csh
> -rw-r--r-- 1 olchansk staff 322 Aug 4 15:33 env.sh
> -rw-r--r-- 1 olchansk staff 40 Aug 4 15:34 exptab
> -rw-r--r-- 1 olchansk staff 287 Aug 4 15:34 midas.log
> send:online olchansk$
>
> odbedit ### works
> mhttpd ### bombs, requires SSL certificate https://bitbucket.org/tmidas/midas/issues/57/initial-mhttpd-should-bind-to-localhost
> odbedit ### cd /experiment, set "http redirect to https" to no, set "midas https port" to 0
> mhttpd ### runs now
> connect to http://localhost:8080 ### status page works
> restart mhttpd as mhttpd -D
> mlogger -D
> fetest ### runs, prints time and data
> start a run from web page ### works
> ### fetest generates crazy data rate https://bitbucket.org/tmidas/midas/issues/58/fetest-crazy-data-rate
> ### go to history, define plot for SLOW/SLOW, see sine wave ### works
> ### history is written to expt dir, no good, go to "history"
> ### data files written to expt dir, no good, go to "data"
> ### midas.log written to data dir, no good (want expt dir)
> ### elog written to expt dir, go to "elog"
> ### logger channel config is wrong - gzip compression and crc32c should be enabled by default
> ### history config is wrong - FILE per-variable history should be enabled by default
>
> K.O.
1317	11 Oct 2017	Konstantin Olchanski	Info	added support for ucLinux
Support for building for ucLinux was added to MIDAS. I use the emcraft toolchain and userland on some kind of embedded ARM CPU that does not have an MMU. See the Makefile for details. The main difference in ucLinux is the lack of fork(), which cannot be implemented without an MMU. Not everything works, but at least I can run a frontend and connect to an experiment on a remote host computer (mserver connection).

K.O.
1318	13 Oct 2017	Konstantin Olchanski	Info	odb multithread support repaired
Multithreaded access to odb was implemented back in 2013-2014, but recently a bug surfaced: there was a race condition in the odb locking code against cm_watchdog(). Somehow this only affected the mserver for the DRAGON experiment at TRIUMF.

This is now fixed on the branch feature/midas-2017-10 (this branch collects all the code that needs additional testing before merging into develop and becoming the next release of midas).

K.O.
1328	21 Nov 2017	Konstantin Olchanski	Info	MIDAS support on el5?
It has been reported that the current midas release candidate does not build on el5 linux (SL/RHEL/CentOS-5).

According to Red Hat, el5 is end-of-life; the last SL 5 release (SL5.11) was done in 2014, so this linux is very old. Also, as it happens, I do not have access to any el5 machines to check if midas builds or runs (but this can be fixed).

https://www.scientificlinux.org/downloads/sl-versions/sl5/
https://access.redhat.com/support/policy/updates/errata

On the midas web page (https://midas.triumf.ca) we do not explicitly state which versions of which linux we definitely support. Most other open-source projects only support current major linux distributions; hardly anybody supports end-of-life linuxes such as el5. Some projects do not even support recent linuxes still in wide use (ROOT6 does not build on stock el6 and there is no KDE5 for el7).

So back to midas. Support for different operating systems comes down to:

1) C/C++ language support. We still use el6 (GCC 4.4.7), so use of c++-11 language features should be avoided.
2) operating system features support:
a) sysv semaphores (sysv shared memory is no longer used; it cannot be used on macos)
aa) (macos is also missing parts of the sysv semaphore api, such as "wait for lock, with timeout"; we are using an ugly work-around)
b) posix shared memory with mprotect() & co
c) posix mutexes, including recursive-type mutexes (this seems to be the problem on el5)
d) bsd networking (need to migrate from select() to poll() and from gethostbyname() to getaddrinfo() & co, for IPv6 support)

Not all of these operating system functions are required for all of midas. Running mhttpd and mlogger requires pretty much everything. Running just a frontend connected to midas through the mserver requires the fewest features; just the networking is enough, I think.

Obviously we cannot support midas in perpetuity on all versions of all operating systems: once I do not have access to a machine, I cannot even check that midas builds and that it runs the basic functions. Instead, we could provide a "feature-reduced" build of midas (makefile target) that includes "just enough" of midas to (say) run a frontend, maybe even odbedit. We already have some provisions for this, but no obvious documented way of actually doing it.

So back to el5. How important is it to support very old operating systems? How many people still use el5? How about old versions of Ubuntu? Macos? If you use anything older than el6, can you speak up (and if possible say why you cannot migrate to an up-to-date linux)?

K.O.
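Item c) can be checked directly on a target machine. The sketch below asks for a PTHREAD_MUTEX_RECURSIVE mutex through the standard POSIX attribute calls and returns the error code if the platform refuses; make_recursive_mutex() is a hypothetical helper name, and the code uses no c++-11 features, in line with item 1).

```cpp
#include <cassert>
#include <pthread.h>

// Tries to create a mutex of type PTHREAD_MUTEX_RECURSIVE.
// Returns 0 on success or the pthread error code on failure.
int make_recursive_mutex(pthread_mutex_t* m) {
  pthread_mutexattr_t attr;
  int status = pthread_mutexattr_init(&attr);
  if (status != 0) return status;
  status = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
  if (status == 0) status = pthread_mutex_init(m, &attr);
  // Destroying the attribute object does not affect an already initialized mutex.
  pthread_mutexattr_destroy(&attr);
  return status;
}
```

With a working recursive mutex, the same thread can lock it twice and must unlock it twice; a quick check of exactly that on el5 (or uclinux) would show whether item c) is really the culprit.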
1371	08 Jun 2018	Lee Pool	Info	MIDAS RTEMS PoRT
Hi,

So I finally got around to "publishing" work I did in 2009/2010 with RTEMS. The work was mainly between myself, Till Straumann (SLAC) and Dr. Joel Sherill, to get VME support for the vme universe/vme tsi148 (basic support) into the i386 bsp.

https://bitbucket.org/lcpool2/midas-k600/src/develop/ (our rtems port)

What this did was to allow us to run our various VME single board controllers with a single frontend application. It is still classified as testing, but it has been very successful so far, and I hope to use it in the next experiment, if possible. The midas port contains a makefile and some changes to the midas.c/system.c/mfe.c files. I've not tested the full functionality as I'm super time-limited. Hope this is helpful to others...
1376	20 Jul 2018	Konstantin Olchanski	Info	ROOT I/O workshop notable
The ROOT I/O workshop was held on June 20th at CERN. A few things of interest in MIDAS land:
- LZ4 is now used as the default compression (replacing gzip-1)
- JSON class streamer is finally implemented (XML streamer updated/reworked)
- recursive read-write lock class implemented
- no special mention of Javascript I/O or jsroot, but the jsroot git repo seems to be quite active

Of these, the recursive read-write lock is most interesting: using something similar would improve ODB performance and presumably fix the existing lock fairness problems.

https://root.cern.ch/doc/master/TReentrantRWLock_8hxx_source.html
https://indico.cern.ch/event/715802/contributions/2942560/attachments/1670191/2680682/ROOT_IO_June_Workshop_v2.pdf
https://github.com/root-project/jsroot

K.O.
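For comparison, C++17 ships a plain (non-recursive) reader-writer lock, std::shared_mutex. The sketch below guards a toy key-value store with it: readers take a shared lock and may run concurrently, a writer takes an exclusive lock. RwStore is a hypothetical class, not ROOT's TReentrantRWLock, and reader/writer fairness of std::shared_mutex is left to the implementation, so it is not by itself a fix for the fairness problems mentioned above.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Toy key-value store guarded by a reader-writer lock (requires C++17).
// Unlike TReentrantRWLock, std::shared_mutex is NOT recursive: locking it
// again from the same thread is undefined behaviour.
class RwStore {
 public:
  bool get(const std::string& key, int* value) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);  // shared "read" lock
    auto it = data_.find(key);
    if (it == data_.end()) return false;
    *value = it->second;
    return true;
  }

  void set(const std::string& key, int value) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive "write" lock
    data_[key] = value;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::string, int> data_;
};
```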
1403	24 Oct 2018	Ryu Sawada	Info	bm_receive_event timeout in ROME
Hi all,

There is a bug report in the ROME repository which says that bm_receive_event times out.

https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after

Does anybody have any idea what could be causing the problem?

Ryu
1410	22 Nov 2018	Konstantin Olchanski	Info	status of self-signed https certificates
I just happened to check the current situation with self-signed https certificates as implemented in mhttpd. (As a reminder, the powers-that-be are pushing for universal use of https for all web access. The https implementation in mhttpd can at the moment only generate self-signed certificates, so...)

plain unencrypted http:
- both google chrome and firefox say "connection not secure", but connect without any fuss
- apple safari does not say anything

https with self-signed certificate:
- google chrome goes through an "are you sure?" page, "red not secure" status in toolbar
- firefox does the same thing, requires adding a security exception, but still shows "not secure" status in toolbar
- apple safari goes through a sequence of "are you sure?" pages, asks for the user password to add the self-signed certificate to the macos key store, then marks the connection as "secure" (good)

So clearly the powers-that-be do not want us to use self-signed certificates for https (and frown on the use of unencrypted http even for localhost connections).

Properly signed certificates can be obtained from letsencrypt almost automatically, but of course mhttpd needs to know how to use them and how to handle their automatic renewals. I plan to update the mongoose web server library inside mhttpd, and with luck I will straighten out some of this certificate business at the same time.

In the meantime, we continue to recommend that mhttpd should be used behind a password-protected https proxy (e.g. apache httpd).

K.O.
1411	30 Nov 2018	Stefan Ritt	Info	status of self-signed https certificates
> In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache
> httpd, etc).

I guess this is what most people do anyhow these days. Do I understand correctly that this then rules out the usage of letsencrypt certificates, since the host needs to be accessed from outside, which is not possible if running behind a password-protected firewall?

Stefan
1412	03 Dec 2018	Konstantin Olchanski	Info	status of self-signed https certificates
> > In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache
> > httpd, etc).
>
> I guess this is what most people do anyhow these days. Do I understand correctly that this then rules out the usage of letsencrypt certificates, since the
> host needs to be accessed from outside, which is not possible if running behind a password protected firewall.
>
> Stefan

Careful, firewall != proxy; they are very different things. A firewall prevents network communications, period. (Like fences and locked doors, there are good reasons to have them.) An https proxy is a way to have encrypted (protected) web communications with a machine behind a firewall.

Basically, we have 4 main cases, all with trouble:

1) mhttpd running on localhost, "just for testing", is in trouble. There is no simple way to get a "blessed" certificate, and self-signed certificates are now "almost forbidden". http is "okay for now", but the writing is on the wall. There is no special exception for "local-only" connections.

2a) mhttpd running on an internet-connected machine, with apache httpd: our best case. To get this working one has to configure both apache httpd and the "blessed certificate" certbot tool. With luck, both tools work smoothly on current OSes (they do NOT).

2b) same, but without apache httpd. One still has to run certbot, and the "glue" between mhttpd and certbot is currently missing: we need a way to point mhttpd to the certbot certificate files and a way to reload mhttpd when the certificate is auto-renewed.

3) mhttpd running on a machine behind a corporate firewall: the worst case. If the firewall Gods make an opening for ports 80 and 443, it becomes case (2a/b); otherwise, one must use some kind of https proxy. (Plus there is no trivial way to set up an encrypted secure communication channel between mhttpd and this proxy, a double bad.)

K.O.

P.S. I guess one can use nginx as the https proxy instead of apache httpd. I did not try it yet. My impression is that everybody uses nginx, except for people who started with apache httpd and are too lazy to try nginx.

K.O.
1413	05 Dec 2018	Konstantin Olchanski	Info	Partial refactoring of ODB code
The current ODB code has several structural problems and I think I have now figured out how to straighten them out. Here are the problems:
a) nested (recursive) odb locks
b) no clear separation between read-only access and read-write access
c) no clear separation between odb validation and repair functions
d) cm_msg() is called while holding a database lock

Discussion:

a) odb locks are nested because most functions lock the database, then call other functions that lock the database again. Most locking primitives (SystemV semaphores, POSIX semaphores and mutexes) usually do not permit nested (recursive) locking. For locking the odb shared memory we use a SystemV semaphore with recursion implemented "by hand" in ss_semaphore_wait_for(). This works ok. For making odb thread-safe, we use POSIX mutexes, and we rely on an optional feature (PTHREAD_MUTEX_RECURSIVE) which seems to work on most OSes, but is not required to exist and work by any standard. For example, recursive mutexes do not work in uclinux (linux for machines without an MMU). I looked at implementing recursive mutexes "by hand", the same way we have the recursive semaphores, and realized that it is quite complicated and computationally expensive (read: inefficient). (Also I think nested and recursive locks are "not mainstream" and should rather be avoided.) For an example of the full complexity of a nested lock, see the recent implementation in ROOT (good luck finding it).

A solution for this problem is well known. All functions are separated into "unlocked" user-callable functions and "locked" internal functions. Nested locking is naturally eliminated. Call sequences:

db_get_key() -> db_find_key() // odb is locked twice

become

db_get_key() -> db_get_key_locked() -> db_find_key_locked() // odb is locked once

Actual implementation of this scheme turns out to be a very clean and mechanical refactoring (moving the code without changing what it does).
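The call-sequence rule can be shown with a toy database guarded by a plain, non-recursive std::mutex. MiniDb and its methods are hypothetical names that only mirror the db_get_key()/db_get_key_locked()/db_find_key_locked() pattern; the const on the locked helpers echoes the idea of tagging read-only functions, and std::lock_guard releases the lock on every return path.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Public functions lock exactly once, then call only "_locked" helpers,
// which assume the caller already holds the lock. Not MIDAS code.
class MiniDb {
 public:
  bool get_key(const std::string& path, int* value) {
    std::lock_guard<std::mutex> guard(mutex_);  // the only lock on this path
    return get_key_locked(path, value);
  }

  void set_key(const std::string& path, int value) {
    std::lock_guard<std::mutex> guard(mutex_);
    keys_[path] = value;
  }

 private:
  // Caller must hold mutex_. Calls find_key_locked(), never get_key(),
  // so the non-recursive mutex is never locked twice.
  bool get_key_locked(const std::string& path, int* value) const {
    const int* p = find_key_locked(path);
    if (p == nullptr) return false;
    *value = *p;
    return true;
  }

  // Caller must hold mutex_. const: a read-only function cannot modify keys_.
  const int* find_key_locked(const std::string& path) const {
    auto it = keys_.find(path);
    return it == keys_.end() ? nullptr : &it->second;
  }

  std::mutex mutex_;  // plain, non-recursive mutex
  std::map<std::string, int> keys_;
};
```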
As a try, I refactored db_find_key() and db_get_key() and I like the result. Locking is now obvious: obscure error paths with a hidden "unlock before return" are all gone. Extra conversions between hDB and pheader are gone.

b) in this refactoring, functions that do not (should not) modify odb become easy to identify: the pheader argument is tagged "const". This simplifies the implementation of "write-protected" odb: instead of ad-hoc db_allow_write_locked() calls sprinkled everywhere, one can have obvious calls to "db_lock_read_only()" and "db_lock_read_write()". Separation of locks into "read" and "write" locks, in turn, improves locking behaviour and helps against problems like lock starvation (which we did see with MIDAS), as "read" locks are much more efficient: all readers can read the data at the same time, and exclusive locking is only needed when somebody needs to "write".

c) some db_validate() functions also try to do repairs. This cannot work if validation is called from "read-only" functions like db_find_key(). I now think the "repair" functions should be separate from the "validate" functions: validate functions should detect problems, repair functions would repair them. The question remains: when is a good time to run a full repair? (Probably at the time when we connect to the database; this way, simply starting "odbedit" will force a database check and repair.)

d) calls to cm_msg() while odb is locked have been a problem for a long time. Because cm_msg() itself calls odb, and because it also calls event buffer code (SYSMSG buffer) which in turn calls odb functions, there was trouble with deadlocks between ODB and event buffer semaphores, trouble with recursive use of ODB, etc. Right now we have all this partially papered over by having cm_msg() put messages into a memory buffer that we periodically flush, but I was never super happy with that solution.
For example, if we crash before the message buffer is flushed, all error messages are lost: they do not go into midas.log, they are not printed on the screen, and they are not accessible in the core dump. To resolve this problem, I have all "locked" functions call db_msg() instead of cm_msg(). db_msg() saves the messages in a linked list which is flushed into cm_msg() immediately after we unlock odb. If we crash after generating an error message but before it is flushed to cm_msg(), we can still access it through the linked list inside the core dump. This is an improvement over what we have now. Ideally, all messages should be printed to the terminal, saved to midas.log and pushed into SYSMSG, but most of this is impractical while odb is locked; as we already know, it leads to deadlocks and other trouble...

Bottom line, I now have a path to improve the odb code and to resolve some of the long-standing structural problems.

K.O.
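A minimal sketch of the deferred-message idea, assuming a single thread: DeferredLog, msg_locked(), flush() and update() are hypothetical names, and a vector of strings stands in for cm_msg(). With several threads, the pending list would need its own protection (or one list per thread), which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <vector>

// While the database lock is held, messages are only appended to a linked list.
class DeferredLog {
 public:
  void msg_locked(const std::string& text) { pending_.push_back(text); }

  // Call only after the database lock has been released.
  void flush(std::vector<std::string>* sink) {
    assert(sink != nullptr);
    for (const auto& m : pending_) sink->push_back(m);  // stand-in for cm_msg()
    pending_.clear();
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  std::list<std::string> pending_;  // after a crash, still reachable from a core dump
};

// Example caller: record the message under the lock, report it after unlocking.
void update(std::mutex& db_lock, DeferredLog& dlog, std::vector<std::string>* sink) {
  {
    std::lock_guard<std::mutex> guard(db_lock);
    dlog.msg_locked("example: problem found while odb is locked");
  }  // lock released here
  dlog.flush(sink);
}
```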
1414	11 Dec 2018	Stefan Ritt	Info	Partial refactoring of ODB code
All makes sense to me. I agree to proceed with the refactoring.

One additional comment: In the 90's when I developed this code, locking was expensive. On a decent computer you could do a couple of thousand lock operations per second before you hit the 100% CPU limit. Therefore I tried to reduce the number of lock operations as much as possible. For example, db_find_key() locks the ODB once and then goes through all keys before it unlocks again. If I had locked for every key in an ODB with tens of thousands of keys, that would have taken very long in the old days. Now the world has changed: we can do almost a million locks a second. So a db_get_record() does not have to obtain a whole directory in one go, but can get each value separately, and if necessary lock the ODB on each key access. This would be slower, but only by a negligible amount these days. So in the spirit of making midas more robust, we can even go a step beyond simple refactoring and change the locking scheme if it becomes more transparent and stable.

Best,
Stefan
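Using the numbers above as a rough worked example: reading a directory of 10,000 keys with one lock per key costs 10,000 lock operations, about 10,000 / 1,000,000 = 10 ms at a million locks per second, versus 10,000 / 2,000 = 5 s when "a couple of thousand" per second is taken as 2,000. The sketch below contrasts the two strategies on a toy map (Db and both functions are hypothetical, not MIDAS code). Note that per-key locking lets writers get in between reads, so the returned values are no longer one consistent snapshot of the directory.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

struct Db {
  std::mutex mutex;
  std::map<std::string, int> keys;
  long lock_count = 0;  // how many times the lock was taken
};

// Old style: one lock held while every key of the directory is read.
std::vector<int> read_dir_one_lock(Db& db, const std::vector<std::string>& names) {
  std::lock_guard<std::mutex> guard(db.mutex);
  ++db.lock_count;
  std::vector<int> out;
  for (const auto& n : names) out.push_back(db.keys.at(n));
  return out;
}

// Per-key style: the lock is taken and released for each key, so writers
// can get in between reads; costs one lock operation per key.
std::vector<int> read_dir_per_key(Db& db, const std::vector<std::string>& names) {
  std::vector<int> out;
  for (const auto& n : names) {
    std::lock_guard<std::mutex> guard(db.mutex);
    ++db.lock_count;
    out.push_back(db.keys.at(n));
  }
  return out;
}
```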

ELOG V3.1.4-2e1708b5