ELOG Midas

Midas DAQ System, Page 112 of 138

2748 Entries

Goto page Previous 1, 2, 3 ... 111, 112, 113 ... 136, 137, 138 Next

ID	Date	Author	Topic	Subject
2578	09 Aug 2023	Konstantin Olchanski	Suggestion	Maximum ODB size
> > RFE filed:
> > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
>
> Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
>
> Stefan

Stefan's comments from the closed bug report:

Ok I implemented some periodic flushing. Here is what I did:

Created

/System/Flush/Flush period : TID_UINT32
/System/Flush/Last flush : TID_UINT32

which control the flushing to disk. The default value for “Flush period” is 60 seconds or one minute.

All clients call db_flush_database() through their cm_yield() function.
db_flush_database() checks the “Last flush” and only flushes the ODB when the period has expired. This test is done inside the ODB semaphore so that we don’t get a race condition.
If the period has expired, db_flush_database() calls ss_shm_flush().
ss_shm_flush() tries to allocate a buffer of the size of the shared memory. If the allocation is not successful (out of memory), ss_shm_flush() writes directly to the binary file as before.
If the allocation is successful, ss_shm_flush() copies the shared memory to a buffer and passes this buffer to a dedicated thread which writes the buffer to the binary file. This causes ss_shm_flush() to return immediately and not block the calling program during the disk write operation.
Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed for sure before the shared memory gets deleted.
This means now that under normal circumstances, exiting programs like odbedit do NOT flush the ODB. This allows calling many “odbedit -c” in a row without the flush penalty. Nevertheless, the ODB then gets flushed by other clients at the latest 60 seconds (or whatever the flush period is) after odbedit exits.

Please note that ODB flushing has two purposes:

1) When all programs exit, we need a persistent storage for the ODB. In most experiments this only happens very seldom. Maybe at the end of a beam time period.
2) If the computer crashes, a recent version of the ODB is kept on disk to simplify recovery after the crash.

Since crashes are not so frequent (during production periods we have maybe one hardware failure every few years), flushing the ODB too often does not make sense and just consumes resources. Flushing also does not help against corrupted ODBs, since the binary image will also get corrupted. So the only reason for periodic flushes is to ease recovery after a total crash. I put the default to 60 seconds, but if people are really paranoid they can decrease it to 10 seconds or so. Or increase it to 600 seconds if their system does not crash every week and disks are slow.

I made a dedicated branch feature/periodic_odb_flush so people can test the new functionality. If there are no complaints within the next few days, I will merge that into develop.

Stefan
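The "check the period inside the lock" step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MIDAS implementation: `FlushState` and `flush_due()` are hypothetical names, a `std::mutex` stands in for the ODB semaphore, and the two struct members stand in for the /System/Flush ODB entries.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the /System/Flush entries in the ODB.
struct FlushState {
   std::mutex lock;             // stands in for the ODB semaphore
   uint32_t flush_period = 60;  // "Flush period", in seconds
   uint32_t last_flush = 0;     // "Last flush", in seconds
};

// Returns true if the flush period has expired, and records the new flush time.
// The test and the update happen under the same lock, so when several clients
// call this at the same moment, only one of them is told to flush.
bool flush_due(FlushState& s, uint32_t now)
{
   std::lock_guard<std::mutex> guard(s.lock);
   if (now - s.last_flush < s.flush_period)
      return false;             // period not yet expired, nothing to do
   s.last_flush = now;
   return true;                 // caller should now write the ODB to disk
}
```

A client would call `flush_due()` from its periodic yield loop and write the ODB image only when it returns true. The background writer thread mentioned in the post is left out of this sketch.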
2580	09 Aug 2023	Konstantin Olchanski	Bug Fix	Stefan's improved ODB flush to disk
This is an important improvement, should have a post of its own.

K.O.

> > > RFE filed:
> > > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> >
> > Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> >
> > Stefan
>
> Stefan's comments from the closed bug report:
>
> Ok I implemented some periodic flushing. Here is what I did:
>
> Created
>
> /System/Flush/Flush period : TID_UINT32
> /System/Flush/Last flush : TID_UINT32
>
> which control the flushing to disk. The default value for “Flush period” is 60 seconds or one minute.
>
> All clients call db_flush_database() through their cm_yield() function.
> db_flush_database() checks the “Last flush” and only flushes the ODB when the period has expired. This test is done inside the ODB semaphore so that we don’t get a race condition.
> If the period has expired, db_flush_database() calls ss_shm_flush().
> ss_shm_flush() tries to allocate a buffer of the size of the shared memory. If the allocation is not successful (out of memory), ss_shm_flush() writes directly to the binary file as before.
> If the allocation is successful, ss_shm_flush() copies the shared memory to a buffer and passes this buffer to a dedicated thread which writes the buffer to the binary file. This causes ss_shm_flush() to return immediately and not block the calling program during the disk write operation.
> Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed for sure before the shared memory gets deleted.
> This means now that under normal circumstances, exiting programs like odbedit do NOT flush the ODB. This allows calling many “odbedit -c” in a row without the flush penalty. Nevertheless, the ODB then gets flushed by other clients at the latest 60 seconds (or whatever the flush period is) after odbedit exits.
>
> Please note that ODB flushing has two purposes:
>
> 1) When all programs exit, we need a persistent storage for the ODB. In most experiments this only happens very seldom. Maybe at the end of a beam time period.
> 2) If the computer crashes, a recent version of the ODB is kept on disk to simplify recovery after the crash.
>
> Since crashes are not so frequent (during production periods we have maybe one hardware failure every few years), flushing the ODB too often does not make sense and just consumes resources. Flushing also does not help against corrupted ODBs, since the binary image will also get corrupted. So the only reason for periodic flushes is to ease recovery after a total crash. I put the default to 60 seconds, but if people are really paranoid they can decrease it to 10 seconds or so. Or increase it to 600 seconds if their system does not crash every week and disks are slow.
>
> I made a dedicated branch feature/periodic_odb_flush so people can test the new functionality. If there are no complaints within the next few days, I will merge that into develop.
>
> Stefan
2581	14 Aug 2023	Konstantin Olchanski	Bug Report	excessive logging of http requests
> Our default configuration of apache httpd logs every request.
> MIDAS custom web pages can easily make a huge number of RPC calls creating a
> huge log file and filling system disk to 100% capacity.

close but no cigar. mhttpd is not running and /var/log got filled to 100% capacity by http error messages. I do not see any apache facility to filter error messages, hmm...

-rw-r--r-- 1 root root 1864421376 Aug 14 12:53 ssl_error_log

[Sun Aug 13 23:53:12.416247 2023] [proxy:error] [pid 18608] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.416538 2023] [proxy:error] [pid 19686] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.416603 2023] [proxy:error] [pid 19681] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.416775 2023] [proxy:error] [pid 19588] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.417022 2023] [proxy:error] [pid 19311] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.421864 2023] [proxy:error] [pid 18620] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422051 2023] [proxy:error] [pid 19693] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422199 2023] [proxy:error] [pid 19673] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422222 2023] [proxy:error] [pid 18608] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422230 2023] [proxy:error] [pid 19657] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.422259 2023] [proxy:error] [pid 18633] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427513 2023] [proxy:error] [pid 19686] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427549 2023] [proxy:error] [pid 19681] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427645 2023] [proxy:error] [pid 19588] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427774 2023] [proxy:error] [pid 19693] AH00940: HTTP: disabled connection for (localhost)
[Sun Aug 13 23:53:12.427800 2023] [proxy:error] [pid 18620] AH00940: HTTP: disabled connection for (localhost)

K.O.
2582	15 Aug 2023	Konstantin Olchanski	Info	mlogger update
A bit of an update to the mlogger, in preparation for more cleanup when Stefan is here at TRIUMF.

1) fix overwrite of existing files if run number is reset (check for existing files was missing in the LZ4, BZ2 & co data path)
2) made output files read-only (midas, json and checksum files)
3) commented out the old code paths

Currently active per-channel ODB settings:

Active - enable or disable mlogger channel
Type - NOT USED
Filename - output filename template, %d are replaced by run number and subrun number, also pipe command for PIPE output
Format - NOT USED
Compression - NOT USED
ODB dump - enable/disable writing ODB dump to data file
ODB dump format - "json" is recommended for new experiments
Log messages - write log messages to output file, 0=off, -1=write all messages
Buffer - "SYSTEM" read events from this event buffer
EventID - "-1" for all events
Trigger Mask - "-1" for all events
Event Limit - stop run after so many events
Byte Limit - stop run after so many bytes
Subrun Byte limit - switch to next subrun file after writing so many bytes. actual file size is larger than subrun_byte_limit because of ODB dumps.
Tape Capacity - NOT USED
Subdir Format - if not empty, output file name is DIR/SUBDIR/FILENAME, "%" format things are expanded by strftime().
Current Filename - updated by mlogger, contains the currently written file name
Data checksum - checksum before compression, use CRC32C for maximum speed, SHA512 for maximum security.
File checksum - checksum after compression, CRC32C is good against accidental file corruption, SHA512 is cryptographically strong, good against purposeful tampering.
Compress - use "lz4" for maximum speed, bzip2 or pbzip2 for maximum compression. no compression and gzip are not recommended. (ZFS may apply lz4 compression to uncompressed data).
Output - "NULL" do not write anything, "FILE" write to disk, "FTP" write to FTP server, "ROOT" write via the mlogger ROOT writer (docs?), "PIPE" pipe data through an external command (i.e. for bzip2 compression).
Gzip compression - gzip compression flags (see gzip docs, 1=max speed, 9=max compression)
Bzip2 compression - if non-zero, bzip2 compression level (see "bzip2 -h", 1=max speed, 9=max compression)
Pbzip2 num cpu - number of CPUs used by parallel bzip2 compression, pbzip2 -p flag
Pbzip2 compression - if non-zero, pbzip2 compression level (see "pbzip2 -h", default is 9=max compression)
Pbzip2 options - any additional pbzip2 options, i.e. -l, -m, -p, etc.

Currently active /Logger options:

Data Dir - where to write all output files, if empty, cm_get_path() is used.
Message file date format - not used in mlogger
Message dir - not used in mlogger
Write data - if set to "no", midas file, runlog, etc will not be written.
ODB Dump - at run stop, save odb to disk
ODB Dump File - file name for "ODB Dump" save file. "%d" is replaced by run number. "json" format is recommended for new experiments.
ODB Last Dump File - at run start, save ODB to disk. "json" format is recommended for new experiments.
Auto restart - run stopped by time limit or event limit is automatically restarted
Auto restart delay - wait for so many seconds before restarting the run
Tape message - NOT USED
Run duration - stop the run after so many seconds
Next subrun - change from "no" to "yes" to force mlogger to open a new subrun file (should this be per-channel?)
Subrun duration - open new subrun file after so many seconds (should this be per-channel?)
History dir - not used in mlogger
Detached transition - "no" use the normal multithreaded transitions (recommended), "yes" use mtransition helper to stop and restart runs. sometimes fails because mtransition is not in the user $PATH or wrong version of mtransition is in the user $PATH.

K.O.
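To make the "Filename" and "Subdir Format" settings above more concrete, here is a small sketch of how such templates could be expanded into a DIR/SUBDIR/FILENAME path. It is an assumption-laden illustration, not the mlogger code: `expand_filename()`, `expand_subdir()` and `output_path()` are hypothetical helpers, and the template strings in the usage note are made-up examples.

```cpp
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: fills the "%d"-style fields of a filename template with
// the run number and the subrun number. The template is trusted here and must
// contain at most two int conversions.
std::string expand_filename(const std::string& tmpl, int run, int subrun)
{
   char buf[256];
   std::snprintf(buf, sizeof(buf), tmpl.c_str(), run, subrun);
   return buf;
}

// Hypothetical helper: expands a "Subdir Format" string with strftime().
// An empty format yields an empty string.
std::string expand_subdir(const std::string& fmt, const std::tm& when)
{
   char buf[256];
   std::size_t n = std::strftime(buf, sizeof(buf), fmt.c_str(), &when);
   return std::string(buf, n);
}

// Builds DIR/SUBDIR/FILENAME, or DIR/FILENAME if the subdir format is empty.
std::string output_path(const std::string& dir, const std::string& subdir_fmt,
                        const std::string& file_tmpl, int run, int subrun,
                        const std::tm& when)
{
   std::string path = dir;
   std::string subdir = expand_subdir(subdir_fmt, when);
   if (!subdir.empty())
      path += "/" + subdir;
   return path + "/" + expand_filename(file_tmpl, run, subrun);
}
```

For example, with a date in 2023, `output_path("/data", "%Y", "run%05d.mid", 7, 0, when)` gives `/data/2023/run00007.mid`.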
2583	16 Aug 2023	Konstantin Olchanski	Bug Report	midas wants to show notification?
I started to get web browser popups about "midas wants to show notifications, block/allow/x". is this a glitch or a new unannounced/undocumented feature? google chrome on macos. K.O.
2586	16 Aug 2023	Konstantin Olchanski	Bug Report	midas wants to show notification?
> This feature was asked by some people ...

"show notifications" popups are strongly associated with disreputable web sites (presumably to push spam), it was surprising to see it from midas.

K.O.
2587	16 Aug 2023	Konstantin Olchanski	Bug Report	excessive logging of http requests
> > Our default configuration of apache httpd logs every request.
> > MIDAS custom web pages can easily make a huge number of RPC calls creating a
> > huge log file and filling system disk to 100% capacity.

added "daily" to /etc/logrotate.d/httpd, default was "weekly", not often enough.

K.O.
2591	17 Aug 2023	Konstantin Olchanski	Bug Report	Error accessing history files
Confirmed. The error message is wrong. It is printed after a short read(), but a short read() does not set errno, and the errno reported by the error message is from some previous syscall. Corrected error message is already committed.

K.O.

> Tonight we got another error of that type after the update:
>
> 04:17 - [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read
> '/data2/history/mhf_1692128214_20230815_gassystem.dat', read() errno 2 (No such file or directory)
>
> This morning I looked at the file, and it was there:
>
> [meg@megon02 history]$ ls -alg mhf_1692128214_20230815_gassystem.dat
> -rw-rw-r--. 1 meg 4663228 Aug 17 08:50 mhf_1692128214_20230815_gassystem.dat
> [meg@megon02 history]$
>
> Stefan
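The distinction between a failed read() and a short read() can be illustrated with a small sketch. This is not the code from history_schema.cxx: `ReadResult` and `read_exact()` are hypothetical names, and the point is only that errno should be reported in the error case and never in the short-read case.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

enum class ReadResult { Ok, ShortRead, Error };

// Tries to read exactly "size" bytes from fd; "*got" receives the byte count.
// errno is meaningful only when ReadResult::Error is returned. On a short read
// (end of file before "size" bytes) read() does not touch errno, so whatever
// value it holds is left over from some earlier call.
ReadResult read_exact(int fd, void* buf, std::size_t size, std::size_t* got)
{
   char* p = static_cast<char*>(buf);
   std::size_t total = 0;
   while (total < size) {
      ssize_t rd = read(fd, p + total, size - total);
      if (rd < 0) {
         if (errno == EINTR)
            continue;            // interrupted by a signal, retry
         *got = total;
         return ReadResult::Error;
      }
      if (rd == 0)
         break;                  // end of file
      total += static_cast<std::size_t>(rd);
   }
   *got = total;
   return (total == size) ? ReadResult::Ok : ReadResult::ShortRead;
}
```

A caller would print strerror(errno) only for `ReadResult::Error`, and for `ReadResult::ShortRead` report the expected and the actual byte counts instead.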
2592	17 Aug 2023	Konstantin Olchanski	Bug Report	excessive logging of http requests
> > > Our default configuration of apache httpd logs every request.
> > > MIDAS custom web pages can easily make a huge number of RPC calls creating a
> > > huge log file and filling system disk to 100% capacity.
>
> added "daily" to /etc/logrotate.d/httpd, default was "weekly", not often enough.

this should fix it good, make /var/log bigger:

[root@mpmt-test ~]# df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdc2       52403200 52296356    106844 100% /
[root@mpmt-test ~]#
[root@mpmt-test ~]# xfs_growfs /
data blocks changed from 13107200 to 106367750
[root@mpmt-test ~]#
[root@mpmt-test ~]# df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdc2      425445400 52300264 373145136  13% /

K.O.
2602	14 Sep 2023	Konstantin Olchanski	Forum	Hide start and stop buttons
I believe the original "hide run start / stop" was added specifically for ND280 GSC MIDAS. I do not know why it was removed. "hide pause / resume" is still there. I will restore them.

Hiding the logger channel section should probably be automatic if there is no /logger/channels, I can check if it works and what happens if there is more than one logger channel.

K.O.
2614	03 Oct 2023	Konstantin Olchanski	Bug Fix	wrong array size after loading xml or json file
both the xml and the json decoders have a bug (fix pending).

loading saved odb from xml and json file did not truncate arrays in odb to the size of arrays in the file.

for example, if /example/double_array has size 20 in odb, but size 5 in xml or json file, after loading the file, array size is still 20.

this is unexpected: after loading an odb save file we expect odb to return to the same state as when the odb save file was created. we do not expect some arrays to have half of their elements restored from file and half of their elements left unchanged.

save and restore from .odb file does not have this problem.

I think this is a bug and I committed (but did not yet push) a fix for both xml and json odb decoders.

I ran into this problem while writing the new history panel editor, where deleting variables did not work because json rpc db_paste() was not truncating any arrays.

I am still finishing up the last few bits of the new history panel editor, and there is a bit of time to discuss and comment on this odb change before I push it to midas.

K.O.
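The size-20 versus size-5 example above can be spelled out in a few lines. This is only a toy model with `std::vector` standing in for an ODB array; `load_keep_size()` and `load_exact()` are hypothetical names, and the assumption that the old code grows an array when the file has more elements is mine, not from the post.

```cpp
#include <cstddef>
#include <vector>

// Old behaviour (as described above): elements present in the file overwrite
// the ODB values, but extra ODB elements are left in place, so the array
// is never truncated.
void load_keep_size(std::vector<double>& odb, const std::vector<double>& file)
{
   if (odb.size() < file.size())
      odb.resize(file.size());   // assumed: growing the array did work
   for (std::size_t i = 0; i < file.size(); i++)
      odb[i] = file[i];
}

// Fixed behaviour: after loading, the ODB array has exactly the size and the
// contents of the array in the file.
void load_exact(std::vector<double>& odb, const std::vector<double>& file)
{
   odb = file;
}
```

With an ODB array of 20 elements and a file array of 5, `load_keep_size()` leaves 20 elements (5 from the file, 15 stale), while `load_exact()` leaves 5.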
2615	06 Oct 2023	Konstantin Olchanski	Bug Report	Error accessing history files
> Still get the same error with the latest version:
> 3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read
> '/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)

I figured it out. I claim defense of temporary insanity and old age senility.

1) I added the "short read" check in one place, missed the second place
2) writes of history were meant to be atomic, and they are atomic in my head, but not in the midas code:

history_schema.cxx:HsFileSchema::write_event()
...
   status = write(s->writer_fd, &t, 4);
   if (status != 4) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(timestamp) errno %d (%s)", s->file_name.c_str(), errno, strerror(errno));
      return HS_FILE_ERROR;
   }

   status = write(s->writer_fd, data, expected_size);
   if (status != expected_size) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(%d) errno %d (%s)", s->file_name.c_str(), data_size, errno, strerror(errno));
      return HS_FILE_ERROR;
   }
...

that's not atomic, that's two separate writes. the history reader hits the history file between the two writes and gets a short read of the 4-byte timestamp instead of the full record size. that's the error message reported by mhttpd.

two fixes forthcoming:
a) check for short read in the 2nd place that I missed
b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

Overall, I am pretty happy that this is the only bug in the FILE history code found in N years, and it does not even cause data corruption...

K.O.
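Fix (b) above, two memcpy() into one buffer followed by a single write(), might look roughly like this. It is a sketch under assumptions and not the committed MIDAS change: `write_record()` is a hypothetical helper, the timestamp is taken to be a 4-byte value as in the quoted code, and the buffer is allocated per call here for brevity rather than preallocated.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unistd.h>
#include <vector>

// Hypothetical helper: writes one history record (4-byte timestamp followed by
// the data) with a single write() call, so that a concurrent reader is much
// less likely to see the timestamp without the data that belongs to it.
bool write_record(int fd, uint32_t timestamp, const void* data, std::size_t size)
{
   std::vector<char> buf(sizeof(timestamp) + size);
   std::memcpy(buf.data(), &timestamp, sizeof(timestamp));
   if (size > 0)
      std::memcpy(buf.data() + sizeof(timestamp), data, size);
   ssize_t wr = write(fd, buf.data(), buf.size());
   // a negative return is an error (errno is set), a smaller positive return
   // is a partial write; both are reported to the caller as failure
   return wr == static_cast<ssize_t>(buf.size());
}
```

The reader side still needs its own short-read check, since a single write() narrows the window but is not a guarantee on every filesystem.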
2616	06 Oct 2023	Konstantin Olchanski	Bug Report	Error accessing history files
> two fixes forthcoming:
> a) check for short read in the 2nd place that I missed
> b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

commit 713ec4a583365d57ffcd700ceeb09dcc14518295

K.O.
2617	06 Oct 2023	Konstantin Olchanski	Info	default midas history switched to "FILE" and "PerVariable" history
We are very happy with the "FILE" implementation of MIDAS history and it is time to make it the default for new experiments.

This history driver works best if "per variable" history is also enabled. (SQL history already only works in "per-variable" mode).

commit 676051b3024965bd8a04da112965a141d5f61a39

K.O.
2618	06 Oct 2023	Konstantin Olchanski	Info	new history panel editor
the new history panel editor has been activated. it is meant to work the same as the old editor, with some improvements to the history variables selection page.

this new version is written in html+javascript and it will be easier to improve, update and maintain compared to the old version written in c++.

the old history panel editor is still usable and accessible by pressing the "edit in old editor" button.

please report any problems, quirks and suggested improvements in this thread or in the bitbucket bug reports.

K.O.
2625	13 Nov 2023	Konstantin Olchanski	Forum	mlogger does not HAVE_ROOT
> I am setting up Midas (v2.1) for a new experiment. We want to save the data in the ROOT format. We installed ROOT from source (v6.28/06), and ROOTSYS is set. When we compile Midas, it says that it found ROOT. We set up a second logger channel where we set the filename to run%05d.root, the format to ROOT, and the output to ROOT. Nevertheless, when starting a run, the logger writes the error that "channel '1' requested ROOT output, but mlogger is built without HAVE_ROOT". From the CMake file, I would assume that it is set automatically if ROOT is found. Do you have any idea why the mlogger does not find ROOT or save the data in the ROOT format?

when you build midas using "make cmake", it prints information about packages that it finds (or does not). please post this here. it would be even more helpful if you post the whole output of "make cmake" (make cmake >& make.log, post make.log here as attachment).

historically, this problem has been a major annoyance over the years: mlogger would not find ROOT when needed, would find the wrong ROOT when not needed, or ROOT at run time would be different from ROOT at build time. "cmake" has been of no help in improving on this, only made all debugging more difficult.

K.O.
2627	14 Nov 2023	Konstantin Olchanski	Forum	mlogger does not HAVE_ROOT
> Finally, make sure you start "rmlogger" and not "mlogger". Only "rmlogger" contains the ROOT binding.

Stefan is right. I forgot this. As a solution to our troubles, mlogger is built without root support. use rmlogger instead.

K.O.
2661	27 Dec 2023	Konstantin Olchanski	Forum	MidasWiki updated to 1.39.6
MidasWiki was updated to current mediawiki LTS 1.39.6 supported until Nov 2025, see https://www.mediawiki.org/wiki/Version_lifecycle

as a downside, after this update, I see large amounts of "account request" spam, something that did not exist before. I suspect new mediawiki phones home to subscribe itself to some "please spam me" list.

if you want a user account on MidasWiki, please email me or Stefan directly, we will make it happen.

K.O.
2662	29 Dec 2023	Konstantin Olchanski	Bug Report	Compilation error on RPi
> git pull
> git submodule update

confirmed. just ran into this myself. I think "make" should warn about out of date git modules. Also check that the build git version is tagged with "-dirty".

K.O.
2663	02 Jan 2024	Konstantin Olchanski	Forum	midas.triumf.ca alias moved to daq00.triumf.ca
the DNS alias for midas.triumf.ca moved from old ladd00.triumf.ca to new daq00.triumf.ca. same as before it redirects to the MidasWiki and to the midas forum (elog) that moved from ladd00 to daq00 quite some time ago. if you see any anomalies in accessing them (broken links, bad https certificates), please report them to this forum or to me directly at olchansk@triumf.ca. K.O.

