Midas DAQ System ELOG
ID     Date          Author                 Topic        Subject
2526   09 Jun 2023   Konstantin Olchanski   Info   added IPv6 support for mserver and MIDAS RPC
as of commit 71fb5e82f3e9a1501b4dc2430f9193ee5d176c82, MIDAS RPC and the mserver 
listen for connections both on IPv4 and IPv6. mserver clients and MIDAS RPC 
clients can connect to MIDAS using both IPv4 and IPv6. In the default 
configuration ("/Expt/Security/Enable non-localhost RPC" set to "n"), IPv4 
localhost is used, as before. Support for IPv6 is a by-product of switching
from the obsolete non-thread-safe gethostbyname() and gethostbyaddr() to the modern
getaddrinfo() and getnameinfo(). This fixes bug 357, an observed crash of mhttpd
inside gethostbyname().
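
For reference, a minimal sketch of the thread-safe getaddrinfo() idiom (the general pattern only, not the actual MIDAS code):

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* resolve a host name, thread-safe, both IPv4 and IPv6 */
int resolve(const char *host, const char *port)
{
   struct addrinfo hints;
   memset(&hints, 0, sizeof(hints));
   hints.ai_family   = AF_UNSPEC;     /* return both AF_INET and AF_INET6 */
   hints.ai_socktype = SOCK_STREAM;

   struct addrinfo *res = NULL;
   int rc = getaddrinfo(host, port, &hints, &res);
   if (rc != 0) {
      fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
      return -1;
   }
   for (struct addrinfo *a = res; a != NULL; a = a->ai_next) {
      /* try socket()/connect() on each returned address in turn */
   }
   freeaddrinfo(res);
   return 0;
}

K.O.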
2528   12 Jun 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> > correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().
> 
> The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
> saved into .ODB.SHM only when the *last* client exits. This ensures the ODB stays persistent when the
> shared memory gets deleted. I vaguely remember I put something in like:
> 
> db_close_database()
> ...
>   destroy_flag = (pheader->num_clients == 0);
> 
>   if (destroy_flag)
>      ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);

I remember the same, but I tracked it down in git to the very first commit, and there is no if() there:
ODB is saved to .ODB.SHM on every client shutdown, not just the last one. I guess we both misremembered.

What's more, ss_shm_flush() is done while holding the ODB semaphore, so all other midas programs that try to access
ODB at the same time (including the mserver) will stall until write() and close() return. At least we do not fsync(),
and there is no waiting until the data is committed to physical media.

$ git annotate 3bb04af4d^ src/odb.c
...
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	875)  destroy_flag = (pheader->num_clients == 0);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	876)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	877)  /* flush shared memory to disk */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	878)  ss_flush_shm(pheader->name, pheader, sizeof(DATABASE_HEADER)+2*pheader->data_size);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	879)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	880)  /* unmap shared memory, delete it if we are the last */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	881)  ss_close_shm(pheader->name, pheader,
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	882)               _database[hDB-1].shm_handle, destroy_flag);
...

K.O.
2535   13 Jun 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> > I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
> > odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremebered.

small problem: build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as a bonus, crash
the computer and observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at
the begin and end of every data file, you can recover some of the lost data.

for better effect, ODB should be dumped to disk at periodic intervals. but the current implementation writes ODB to disk while holding the ODB
semaphore, which means all ODB access stops for the duration; specifically, there will be gaps in the history because mlogger cannot read history
data from ODB.

a better implementation could take the ODB lock, make a copy of the ODB shared memory, release the ODB lock, then complete the write to disk without holding the
lock. protection is needed against 100 midas programs trying to do this all at the same time. computers with 0.5 GB RAM (many ARM FPGA SoCs) will be
limited to a ~100 Mbyte ODB, and one has to deal with memory allocation failures when taking a copy of a 2 GB ODB.
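
something like this sketch (illustrative only; odb_lock(), odb_unlock() and write_odb_file() stand in for the real midas calls):

   /* snapshot ODB under the lock, do the slow disk write without holding it */
   char *copy = (char *) malloc(shm_size);
   if (copy != NULL) {
      odb_lock();                      /* hold the semaphore only for the memcpy */
      memcpy(copy, shm_adr, shm_size);
      odb_unlock();
      write_odb_file(".ODB.SHM", copy, shm_size);   /* lock not held here */
      free(copy);
   } else {
      /* malloc failed (e.g. 2 GB ODB on a 0.5 GB machine):
         fall back to writing directly while holding the lock, as now */
   }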

in theory, the mmap() shared memory (already implemented in midas) does this automatically, but we lose control
over disk writes: we see some OSes write ODB to disk "too often" and at the wrong times, i.e. while we are in the middle
of creating or deleting something. the current sequence of open(), atomic write() and close() ensures .ODB.SHM always
contains a valid ODB (minus loss of OS and disk caches to a crash or power loss).

K.O.
2536   13 Jun 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> > BTW, how do I resize the ODB.

ODB cannot be resized "online". Everything has to stop: save the content to odb.json, get rid of the old .ODB.SHM, ensure the ODB shared memory is destroyed (SysV or POSIX shared memory),
create a new ODB with the new size, load odb.json. Feel free to punch this into chatgpt > odbresize.cxx, commit, test, push.
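
roughly this sequence (a sketch; exact paths and the shared memory cleanup depend on the experiment setup):

   odbedit -c "save odb.json"    # dump the current ODB contents
   # stop all midas clients, then, in the experiment directory:
   rm .ODB.SHM
   # remove the ODB shared memory: ipcrm for SysV, rm /dev/shm/*ODB* for POSIX
   odbinit -s 1024MB             # create a new ODB with the new size
   odbedit -c "load odb.json"    # reload the saved contents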

> I remember we discussed this some time ago, and concluded that odbedit needs a resize flag.

ODB cannot be resized online. The ODB API has clients holding ODB handles, which are pointers (offsets) into the ODB shared memory.

> Has this even been done?
> I guess this is still not done and the issue is still open: https://bitbucket.org/tmidas/midas/issues/329/need-odbresize
> I guess if we touch this maybe the problem with the wrong size should be also fixed: https://bitbucket.org/tmidas/midas/issues/328/odbinit-s-1024mb-creates-odb-with-wrong

please contribute 14 distraction-free days to my patreon. thanks in advance!

K.O.
2538   13 Jun 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> 
> > small problem: build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as a bonus, crash
> > the computer and observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at
> > the begin and end of every data file, you can recover some of the lost
> 
> The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost. 
> So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
> the flush only in the logger after an end of run.

are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
(I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?

sanity check. dragon experiment, very active, .ODB.SHM timestamp is 1 second old. not-very-active agmini, today is June 13th, timestamp of .ODB.SHM is June 
2nd. inactive TACTIC, timestamp of .ODB.SHM is May 16th.

so yes, not great, but in the new scheme, ODB.SHM timestamps would probably be from 2021 or 2020.

my vote is to undo this change, it is dangerous because it can result in ODB never being saved to .ODB.SHM.

K.O.
2541   15 Jun 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> 
> > are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
> > (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
> 
> Indeed this is almost never the case, maybe once per month. On the other hand, we have a complete crash of the OS maybe once a year. Most of the time the programs
> run continuously (we do not need odbedit), so our timestamp is typically one or two days old, so not good either.
> 
> > my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
> 
> My vote is to flush the odb either periodically or after each run.
> 

So we are in agreement.

RFE filed:
https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically

Dangerous change reverted:
60e4c44ad66346b89ba057391acf7a02890049be

K.O.

bash-3.2$ git diff
diff --git a/src/odb.cxx b/src/odb.cxx
index 0d3b88c2..d104ff28 100644
--- a/src/odb.cxx
+++ b/src/odb.cxx
@@ -2199,7 +2199,14 @@ INT db_close_database(HNDLE hDB)
       destroy_flag = (pheader->num_clients == 0);
 
       /* flush shared memory to disk */
-      if (destroy_flag)
+
+      /* if we save ODB to disk only after last client finishes, we will never save ODB to disk
+         in most experiments - none of them ever completely stop MIDAS in normal operation.
+         as result, all changes to ODB contents will be lost on system crash, power loss
+         or normal reboot. see https://daq00.triumf.ca/elog-midas/Midas/2539
+         K.O. June 2023. */
+
+      if (1 || destroy_flag)
          ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
 
       strlcpy(xname, pheader->name, sizeof(xname));

K.O.
2556   18 Jul 2023   Konstantin Olchanski   Forum   pull request for PostgreSQL support
> is there any news regarding this pull request ? 
> (https://bitbucket.org/tmidas/midas/pull-requests/30)

apologies for taking a very long time to review the proposed changes.

the main problem with this pull request remains: it tangles together too many changes to the code and I cannot simply
say "this is okay", merge and commit it.

an example of an unrelated change is the diff of mlogger.cxx, a change of function in: "db_get_value(hDB, 0, "/Logger/Multithread
transitions" ... )". there are also unrelated whitespace changes sprinkled around.

can you review your diffs again and try to remove as much unrelated and unnecessary changes as you can?

I could do this for you, and merge my version, but next time you merge base midas, you will have a collision.

an unrelated change of function is the introduction of something called "downsampling". what is the purpose of this? How is it
different from requesting binned data? Is it just a kludge to reduce the data size? Before we merge it, can you post a
description/discussion to this forum here? (as a separate topic, separate from the discussion of the PostgreSQL merge).

the changes to add PostgreSQL so far look reasonable:
- CMakeLists: always painful, but if you do the same as MySQL it should be okay; we always end up rejigging this several
times before it works everywhere.
- history.h: ok, minus the changes for the "downsample" feature
- mlogger.cxx: changes are too tangled with the "downsample" feature, cannot review
- the SetDownsample() API is defective, it should have separate Get() and Set() functions (see the sketch after this list)
- history_common.cxx: please do not add downsampling code to history providers that do not/will not support it.
- history_odbc.cxx: please do not change it. it does not support downsampling and never will.
- history.cxx: ditto
- mjsonrpc.cxx: the history API is changed, so we must know: is new JS compatible with old mhttpd? is old JS compatible with
new mhttpd? (mixed versions are very common in practice). if there is an incompatibility, can you recode it to be
compatible?
- history_schema.cxx: the bitbucket diff is a dog's breakfast, cannot review. I will have to checkout your branch and diff
by hand.
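
for the downsample accessors, something along these lines (a sketch; assuming the usual midas history interface class, exact signatures illustrative):

   // separate setter and getter instead of one dual-purpose call
   class MidasHistoryInterface {
   public:
      virtual void SetDownsample(int factor) = 0;  // set only
      virtual int  GetDownsample() const = 0;      // query only
   };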

changes to mhistory.js appear to be extensive and some explanation is needed for what is changed, what bugs/problems 
are fixed, what new features are added.

to move forward, can you generate a pull request that only adds pgsql to history_schema.cxx, history_common.cxx and
mlogger.cxx and does not add any other functions or features and does not change any whitespace?

K.O.


> 
> If you agree to merge I can resolve conflicts that now 
> (after two months) are listed...
> 
> Regards,
> Gennaro
> 
> > 
> > Hi,
> > I have updated the PR with a new one that includes TimescaleDB support and some
> > changes to mhistory.js to support downsampling queries...
> > 
> > Cheers,
> > Gennaro
> > 
> > > > some minutes ago I published a PR for PostgreSQL support I developed
> > > > at INFN-Napoli for Darkside experiment...
> > > > 
> > > > I don't know if you receive a notification about this PR and in doubt
> > > > I wrote this message...
> > > 
> > > Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> > > of your pull request and everything looked quite good. But that pull request was 
> > > for an older version of midas and it would not have applied cleanly to the current 
> > > version. I will take a look at your updated pull request. In theory it should only 
> > > add the Postgres class and modify a few other places in history_schema.cxx and have 
> > > no changes to anything else. (if you need those changes, it should be a separate 
> > > pull request).
> > > 
> > > Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> > > observed for storing and using midas history data.
> > > 
> > > K.O.
2557   18 Jul 2023   Konstantin Olchanski   Bug Report   access to filesystem through mhttpd
> (e.g. http://midas.host:8080/etc/passwd)

not again! I complained about this before, and I added a fix, but it must be broken again.

getting a copy of /etc/passwd is reasonably benign, but getting a copy of 
/home/$USER/.ssh/id_rsa, id_rsa.pub, knownhosts and authorized_keys is a disaster.

(running mhttpd behind a web proxy does not solve the problem, the number of attackers is merely
reduced to the people who know the proxy password, plus local users).

K.O.
2559   21 Jul 2023   Konstantin Olchanski   Forum   pull request for PostgreSQL support
> > is there any news regarding this pull request ? 
> > (https://bitbucket.org/tmidas/midas/pull-requests/30)
> 
> apologies for taking a very long time to review the proposed changes.
> 

I merged the PgSql bits by hand - the automatic tools make a dog's breakfast from the history_schema.cxx diffs. Ouch.

history_schema.cxx merged pretty much cleanly, but I have one question about CreateSqlColumn() with sql_strict set to "true". Can you say
more about why this is needed? Should this also be made the default for MySQL? As best I can tell, the default values are only needed if we write
to SQL but forget to provide values for columns that should not be NULL - but our code never does this? Or is this for reading from SQL, where NULL values
are replaced with the default values? I do not have time to look into this right now, I hope you can clarify it for me.

Also note that fDownsample is set to zero and cannot be changed. I recommend we set it through the MakeMidasHistoryPgsql() factory method.

Please pull, merge, retest, update the pull request, check that there are no unrelated changes (a change in mlogger.cxx is a direct red flag!)
and we should be able to merge the rest of your stuff pronto.

K.O.

commit e85bb6d37c85f02fc4895cae340ba71ab36de906 (HEAD -> develop, origin/develop, origin/HEAD)
Author: Konstantin Olchanski <olchansk@triumf.ca>
Date:   Fri Jul 21 09:45:08 2023 -0700

    merge PQSQL history in history_schema.cxx

commit f254ebd60a23c6ee2d4870f3b6b5e8e95a8f1f09
Author: Konstantin Olchanski <olchansk@triumf.ca>
Date:   Fri Jul 21 09:19:07 2023 -0700

    add PGSQL Makefile bits

commit aa5a35ba221c6f87ae7a811236881499e3d8dcf7
Author: Konstantin Olchanski <olchansk@triumf.ca>
Date:   Fri Jul 21 08:51:23 2023 -0700

    merge PGSQL support from https://bitbucket.org/gtortone/midas/branch/feature/timescaledb_support except for history_schema.cxx
2567   02 Aug 2023   Konstantin Olchanski   Forum   Issues with Universe II Driver
I maintain the tsi148 and the universe-II drivers. I confirm -KO6 is my latest 
version, last updated for 32-bit Debian-11, and we still use it at TRIUMF.

It is good news that the vme_universe kernel module built, loaded and reported 
correct stuff to dmesg.

It is not clear why mvme_read_value() crashed. We need to know the value of 
vme_addr and addr, can you add printf()s for them using format "%08x" and try 
again?
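
i.e. something like this just before the faulting dereference in mvme_read_value() in vmicvme.c (a sketch; variable names as in the gdb output below):

   printf("vme_addr: %08x\n", (unsigned int) vme_addr);
   printf("addr: %08x\n", (unsigned int) addr);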

K.O.

Caleb Marshall wrote:
> Hello,
>
> At our lab we are currently in the process of migrating more of our systems over to Midas. However, all of our working
> systems are dependent on SBCs with the Tsi-148 chips of which we only have a handful. In order to have some backups and
> spares for testing, we have been attempting to get Midas working with some borrowed SBCs (Concurrent Technologies
> VX 40x/04x) with Universe-II chips. The SBC is running CentOS 7. I have tried to follow the instructions posted at
> https://daq00.triumf.ca/DaqWiki/index.php/VME-CPU#V7648_and_V7750_BIOS_Settings. The universe-II kernel module
> appears to load correctly, dmesg gives:
>
> [   32.384826] VME: Board is system controller
> [   32.384875] VME: Driver compiled for SMP system
> [   32.384877] VME: Installed VME Universe module version: 3.6.KO6
>
> However, running vmescan.exe fails with a segfault. Running with gdb shows:
>
> vmic_mmap: Mapped VME AM 0x0d addr 0x00000000 size 0x00ffffff at address 0x80a01000
> mvme_open:
> Bus handle               = 0x7
> DMA handle               = 0x6045d0
> DMA area size            = 1048576 bytes
> DMA    physical address  = 0x7ffff7eea000
> vmic_mmap: Mapped VME AM 0x2d addr 0x00000000 size 0x0000ffff at address 0x86ff0000
>
> Program received signal SIGSEGV, Segmentation fault.
> mvme_read_value (mvme=0x604010, vme_addr=<optimized out>)
>     at /home/jam/midas/packages/midas/drivers/vme/vmic/vmicvme.c:352
> 352          dst  = *((WORD *)addr);
>
> With the pointer addr originating from a call to vmic_mapcheck within the mvme_read_value function in the
> vmicvme.c file. Help with where to go from here would be appreciated.
>
> -Caleb
2568   02 Aug 2023   Konstantin Olchanski   Bug Report   excessive logging of http requests
> > Our default configuration of apache httpd logs every request. MIDAS custom web pages can easily make a huge number of RPC calls creating a 
> > huge log file and filling system disk to 100% capacity
> perhaps use existing logrotate, add limit on file size (size) and limit of 2 old log files (rotate).

logrotate was ineffective.

following apache httpd config seems to disable logging of mjsonrpc requests. note that we cannot filter on the "mjsonrpc" string because 
Request_URI excludes the query string (ouch!).

#SetEnvIf Request_URI "^POST /?mjsonrpc.*" nolog 
SetEnvIf Request_Method "POST" envpost 
SetEnvIf Request_URI "^\/$" envuri 
SetEnvIfExpr "-T reqenv('envpost') && -T reqenv('envuri')" envnolog 
 
CustomLog logs/ssl_request_log "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b" env=!envnolog 

K.O.
2571   03 Aug 2023   Konstantin Olchanski   Forum   Issues with Universe II Driver
> Here is the output:
> 
> vmic_mmap: Mapped VME AM 0x0d addr 0x00000000 size 0x00ffffff at address 0x80a01000
> mvme_open:
> Bus handle              = 0x3
> DMA handle              = 0x158f5d0
> DMA area size           = 1048576 bytes
> DMA    physical address = 0x7f91db553000
> vmic_mmap: Mapped VME AM 0x2d addr 0x00000000 size 0x0000ffff at address 0x86ff0000
> vme addr: 00000000 
> addr: db543000 

I see the problem. A24 is mapped at 0x80xxxxxx, A16 is mapped at 0x86ffxxxx, but
mvme_read computed address 0xdb543000, which is out of range of either mapped VME address. ouch.

One more thing to check: AFAIK, these universe-II codes were never used on a 64-bit CPU
before, we only have 32-bit Pentium-3 and Pentium-4 machines with these chips. The
tsi148 codes used to work both 32-bit and 64-bit, back when we had both flavours of
CPUs, but now we only have 64-bit.

What is your output for "uname -a"? does it report 32-bit or 64-bit kernel?

If you feel adventurous, you can build 32-bit midas (cd .../midas; make linux32),
compile vmescan.o with "-m32", link vmescan.exe against .../midas/linux-m32/lib, and
see if that works. Meanwhile, I can check if vmicvme.c is 64-bit clean. Checking if the
kernel module is 64-bit clean would be more difficult...

K.O.
2573   03 Aug 2023   Konstantin Olchanski   Bug Report   excessive logging of http requests
> > > Our default configuration of apache httpd logs every request. MIDAS custom web pages can easily make a huge number of RPC calls creating a 
> > > huge log file and filling system disk to 100% capacity
> > perhaps use existing logrotate, add limit on file size (size) and limit of 2 old log files (rotate).
>  
> CustomLog logs/ssl_request_log "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b" env=!envnolog 
> 

TransferLog is not conditional and has to be commented out to stop logging every jsonrpc request.
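
i.e. in the httpd SSL config (a sketch; the default log file name varies by distribution):

   #TransferLog logs/ssl_access_log     <- comment out, TransferLog ignores env= conditions
   CustomLog logs/ssl_request_log "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b" env=!envnolog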

K.O.
2575   04 Aug 2023   Konstantin Olchanski   Forum   Issues with Universe II Driver
> I can compile 32-bit midas. Unless I am misinterpreting the linking error, I don't
> think I can use the driver as built.

I think you are right, the Makefile from the Universe package does not build a -m32 version
of libvme.so. I think I can fix that...

K.O.
2576   09 Aug 2023   Konstantin Olchanski   Forum   pull request for PostgreSQL support
> The compilation of midas was broken by the last modification. The reason is that 
>    Pgsql *fPgsql = NULL;
> was not protected by #ifdef HAVE_PGSQL

confirmed, my mistake, I forgot to test with "make cmake NO_PGSQL". your fix is correct, thanks.
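
i.e. the member declaration needs the same guard as the rest of the PGSQL code:

   #ifdef HAVE_PGSQL
      Pgsql *fPgsql = NULL;
   #endif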

K.O.
2577   09 Aug 2023   Konstantin Olchanski   Bug Report   Error accessing history files
I confirm I see the same on the agmini system. Two problems: (a) the error message is wrong, it's a
short read, not a read error (clue: the read() syscall does not return "no such file"). (b)
mlogger is supposed to write history in record-size blocks and read it in the same record-size
blocks. UNIX file semantics require that both reader and writer see read() and write() as
atomic, even on NFS, so mhttpd should never see partially written history records. I can
debug this on the agmini system. Probably should.
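
the correct reporting pattern distinguishes the two cases (a sketch):

   /* distinguish a real read() error from a short read */
   ssize_t rd = read(fd, buf, record_size);
   if (rd < 0) {
      /* real error: errno is valid, report strerror(errno) */
   } else if (rd < (ssize_t) record_size) {
      /* short read: errno is stale, report the byte counts instead */
   }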

Problem (a) is fixed in commit bb423c8680cc67220312534403840442868f2b3b. If you update, you
should see error messages about a "short read", and the read sizes they report are very
interesting - please put them in the elog here.

K.O.


> We sporadically (like once per few hours) have an error message when we access the 
> history plots through mhttpd:
> 
> 07:21:35.109 2023/08/03 [mhttpd,ERROR] 
> [history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file 
> or directory)
> 
> When I log in to the machine, I properly see the file and also can access it
> 
> [meg@megon02 history]$ ls -l mhf_1690890685_20230801_dc_hv.dat
> -rw-rw-r--. 1 meg meg 34176312 Aug  3 07:23 mhf_1690890685_20230801_dc_hv.dat
> 
> and I also can dump that file. 
> 
> When I try again with mhttpd, I properly see that file. 
> 
> Now in principle this is not a problem, but the error message is annoying, since this 
> is the only error we get in 24 hours. I attached a 24h log to see what I mean. If this 
> is an OS issue, I wonder if we should add code to retry the file access in case we get 
> that error.
> 
> Anybody seen a similar thing?
> 
> Best,
> Stefan
2578   09 Aug 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> > RFE filed:
> > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> 
> Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> 
> Stefan

Stefan's comments from the closed bug report:

Ok I implemented some periodic flushing. Here is what I did:

Created

/System/Flush/Flush period : TID_UINT32
/System/Flush/Last flush : TID_UINT32

which control the flushing to disk. The default value for “Flush period” is 60 seconds or one minute.

- All clients call db_flush_database() through their cm_yield() function.
- db_flush_database() checks "Last flush" and only flushes the ODB when the period has expired. This test is
done inside the ODB semaphore so that we don't get a race condition.
- If the period has expired, db_flush_database() calls ss_shm_flush().
- ss_shm_flush() tries to allocate a buffer the size of the shared memory. If the allocation is not successful (out of
memory), ss_shm_flush() writes directly to the binary file as before.
- If the allocation is successful, ss_shm_flush() copies the shared memory to the buffer and passes this buffer to a
dedicated thread which writes the buffer to the binary file. This causes ss_shm_flush() to return immediately and
not block the calling program during the disk write operation.
- Added back the "if (destroy_flag) ss_shm_flush()" so that the ODB is flushed for sure before the shared memory
gets deleted.
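
In pseudo-C, the flush test looks roughly like this (a sketch of the description above; ss_shm_flush_async() is an illustrative name, not the midas API):

   void db_flush_database_sketch()
   {
      odb_lock();                               /* the period test is inside the ODB semaphore */
      uint32_t now = (uint32_t) time(NULL);
      if (now - last_flush >= flush_period) {   /* "/System/Flush/Flush period" */
         last_flush = now;                      /* update "/System/Flush/Last flush" */
         ss_shm_flush_async();                  /* copy shm to a buffer, write it from a
                                                   dedicated thread, return immediately */
      }
      odb_unlock();
   }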
This means that under normal circumstances, exiting programs like odbedit do NOT flush the ODB. This allows
calling many "odbedit -c" commands in a row without the flush penalty. Nevertheless, the ODB then gets flushed by other
clients at the latest 60 seconds (or whatever the flush period is) after odbedit exits.

Please note that ODB flushing has two purposes:

1. When all programs exit, we need persistent storage for the ODB. In most experiments this happens only very
seldom, maybe at the end of a beam time period.
2. If the computer crashes, a recent version of the ODB is kept on disk to simplify recovery after the crash.

Since crashes are not so frequent (during production periods we have maybe one hardware failure every few years),
flushing the ODB too often does not make sense and just consumes resources. Flushing also does not help with
corrupted ODBs, since the binary image will get corrupted as well. So the only reason for periodic flushes is to ease
recovery after a total crash. I put the default to 60 seconds, but if people are really paranoid they can decrease
it to 10 seconds or so. Or increase it to 600 seconds if their system does not crash every week and disks are
slow.

I made a dedicated branch feature/periodic_odb_flush so people can test the new functionality. If there are no 
complaints within the next few days, I will merge that into develop.

Stefan
2580   09 Aug 2023   Konstantin Olchanski   Bug Fix   Stefan's improved ODB flush to disk
This is an important improvement, it should have a post of its own. K.O.

> > > RFE filed:
> > > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> > 
> > Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> > 
> > Stefan
> 
> Stefan's comments from the closed bug report:
> 
> Ok I implemented some periodic flushing. Here is what I did:
> 
> Created
> 
> /System/Flush/Flush period : TID_UINT32
> /System/Flush/Last flush : TID_UINT32
> 
> which control the flushing to disk. The default value for "Flush period" is 60 seconds or one minute.
> 
> - All clients call db_flush_database() through their cm_yield() function.
> - db_flush_database() checks "Last flush" and only flushes the ODB when the period has expired. This test is
> done inside the ODB semaphore so that we don't get a race condition.
> - If the period has expired, db_flush_database() calls ss_shm_flush().
> - ss_shm_flush() tries to allocate a buffer the size of the shared memory. If the allocation is not successful
> (out of memory), ss_shm_flush() writes directly to the binary file as before.
> - If the allocation is successful, ss_shm_flush() copies the shared memory to the buffer and passes this buffer
> to a dedicated thread which writes the buffer to the binary file. This causes ss_shm_flush() to return
> immediately and not block the calling program during the disk write operation.
> - Added back the "if (destroy_flag) ss_shm_flush()" so that the ODB is flushed for sure before the shared memory
> gets deleted.
> 
> This means that under normal circumstances, exiting programs like odbedit do NOT flush the ODB. This allows
> calling many "odbedit -c" commands in a row without the flush penalty. Nevertheless, the ODB then gets flushed
> by other clients at the latest 60 seconds (or whatever the flush period is) after odbedit exits.
> 
> Please note that ODB flushing has two purposes:
> 
> 1. When all programs exit, we need persistent storage for the ODB. In most experiments this happens only very
> seldom, maybe at the end of a beam time period.
> 2. If the computer crashes, a recent version of the ODB is kept on disk to simplify recovery after the crash.
> 
> Since crashes are not so frequent (during production periods we have maybe one hardware failure every few
> years), flushing the ODB too often does not make sense and just consumes resources. Flushing also does not help
> with corrupted ODBs, since the binary image will get corrupted as well. So the only reason for periodic flushes
> is to ease recovery after a total crash. I put the default to 60 seconds, but if people are really paranoid they
> can decrease it to 10 seconds or so. Or increase it to 600 seconds if their system does not crash every week and
> disks are slow.
> 
> I made a dedicated branch feature/periodic_odb_flush so people can test the new functionality. If there are no
> complaints within the next few days, I will merge that into develop.
> 
> Stefan
2581   14 Aug 2023   Konstantin Olchanski   Bug Report   excessive logging of http requests
> Our default configuration of apache httpd logs every request.
> MIDAS custom web pages can easily make a huge number of RPC calls creating a 
> huge log file and filling system disk to 100% capacity.

close but no cigar. mhttpd is not running and /var/log got filled to 100% capacity by http error messages. I do not see any apache facility to filter 
error messages, hmm...

-rw-r--r-- 1 root root 1864421376 Aug 14 12:53 ssl_error_log

[Sun Aug 13 23:53:12.416247 2023] [proxy:error] [pid 18608] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.416538 2023] [proxy:error] [pid 19686] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.416603 2023] [proxy:error] [pid 19681] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.416775 2023] [proxy:error] [pid 19588] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.417022 2023] [proxy:error] [pid 19311] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.421864 2023] [proxy:error] [pid 18620] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422051 2023] [proxy:error] [pid 19693] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422199 2023] [proxy:error] [pid 19673] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422222 2023] [proxy:error] [pid 18608] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422230 2023] [proxy:error] [pid 19657] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422259 2023] [proxy:error] [pid 18633] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427513 2023] [proxy:error] [pid 19686] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427549 2023] [proxy:error] [pid 19681] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427645 2023] [proxy:error] [pid 19588] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427774 2023] [proxy:error] [pid 19693] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427800 2023] [proxy:error] [pid 18620] AH00940: HTTP: disabled connection for (localhost)

K.O.
2582   15 Aug 2023   Konstantin Olchanski   Info   mlogger update
A bit of update to the mlogger. In preparation for more cleanup when Stefan is 
here at TRIUMF.

1) fix overwrite of existing files if run number is reset (check for existing 
files was missing in the LZ4, BZ2 & co data path)
2) made output files read-only (midas, json and checksum files)
3) commented out the old code paths

Currently active per-channel ODB settings:

Active - enable or disable mlogger channel
Type - NOT USED
Filename - output file name template; "%d" fields are replaced by the run number and subrun
number; for PIPE output this is the pipe command
Format - NOT USED
Compression - NOT USED
ODB dump - enable/disable writing ODB dump to data file
ODB dump format - "json" is recommended for new experiments
Log messages - write log messages to output file, 0=off, -1=write all messages
Buffer - "SYSTEM" read events from this event buffer
EventID - "-1" for all events
Trigger Mask - "-1" for all events
Event Limit - stop run after so many events
Byte Limit - stop run after so many bytes
Subrun Byte limit - switch to the next subrun file after writing so many bytes.
the actual file size will be larger than the subrun byte limit because of ODB dumps.
Tape Capacity - NOT USED
Subdir Format - if not empty, output file name is DIR/SUBDIR/FILENAME, "%" 
format things are expanded by strftime().
Current Filename - updated by mlogger, contains the currently written file name
Data checksum - checksum before compression, use CRC32C for maximum speed, 
SHA512 for maximum security.
File checksum - checksum after compression, CRC32C is good against accidental 
file corruption, SHA512 is cryptographically strong, good against purposeful 
tampering.
Compress - use "lz4" for maximum speed, bzip2 or pbzip2 for maximum compression. 
no compression and gzip are not recommended. (ZFS may apply lz4 compression to 
uncompressed data).
Output - "NULL" do not write anything, "FILE" write to disk, "FTP" write to FTP 
server, "ROOT" write via the mlogger ROOT writer (docs?), "PIPE" pipe data 
through an external command (i.e. for bzip2 compression).
Gzip compression - gzip compression flags (see gzip docs, 1=max speed, 9=max 
compression)
Bzip2 compression - if non-zero, bzip2 compression level (see "bzip2 -h", 1=max 
speed, 9=max compression)
Pbzip2 num cpu - number of CPUs used by parallel bzip2 compression (pbzip2 -p
flag)
Pbzip2 compression - if non-zero, pbzip2 compression level (see "pbzip2 -h",
default is 9=max compression)
Pbzip2 options - any additional pbzip2 options, i.e. -l, -m, -p, etc.
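
A typical channel configuration using these settings might look like this (values are examples only; the usual ODB path for channel 0 is assumed):

   /Logger/Channels/0/Settings/Active = y
   /Logger/Channels/0/Settings/Filename = run%05d_%03d.mid.lz4
   /Logger/Channels/0/Settings/Compress = lz4
   /Logger/Channels/0/Settings/Data checksum = CRC32C
   /Logger/Channels/0/Settings/File checksum = CRC32C
   /Logger/Channels/0/Settings/Output = FILE
   /Logger/Channels/0/Settings/Subrun Byte limit = 1000000000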

Currently active /Logger options:

Data Dir - where to write all output files, if empty, cm_get_path() is used.
Message file date format - not used in mlogger
Message dir - not used in mlogger
Write data - if set to "no", the midas file, runlog, etc. will not be written.
ODB Dump - at run stop, save ODB to disk
ODB Dump File - file name for the "ODB Dump" save file. "%d" is replaced by the run
number. "json" format is recommended for new experiments.
ODB Last Dump File - at run start, save ODB to this file. "json" format is
recommended for new experiments.
Auto restart - a run stopped by the time limit or event limit is automatically
restarted
Auto restart delay - wait so many seconds before restarting the run
Tape message - NOT USED
Run duration - stop the run after so many seconds
Next subrun - change from "no" to "yes" to force mlogger to open a new subrun 
file (should this be per-channel?)
Subrun duration - open new subrun file after so many seconds (should this be 
per-channel?)
History dir - not used in mlogger
Detached transition - "no": use the normal multithreaded transitions
(recommended); "yes": use the mtransition helper to stop and restart runs. this sometimes
fails because mtransition is not in the user $PATH or the wrong version of
mtransition is in the user $PATH.

K.O.