ID |
Date |
Author |
Topic |
Subject |
2580
|
09 Aug 2023 |
Konstantin Olchanski | Bug Fix | Stefan's improved ODB flush to disk |
This is an important improvement, should have a post of it's own. K.O.
> > > RFE filed:
> > > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-
periodically
> >
> > Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-
should-be-saved-to-disk-periodically
> >
> > Stefan
>
> Stefan's comments from the closed bug report:
>
> Ok I implemented some periodic flushing. Here is what I did:
>
> Created
>
> /System/Flush/Flush period : TID_UINT32 /System/Flush/Last flush : TID_UINT32
>
> which control the flushing to disk. The default value for “Flush period” is 60
seconds or one minute.
>
> All clients call db_flush_database() through their cm_yield() function
> db_flush_database() checks the “Last flush” and only flushes the ODB when the
period has expired. This test is
> done inside the ODB semaphore so that we don’t get a race condigiton
> If the period has expired, db_flush_database() calls ss_shm_flush()
> ss_shm_flush() tries to allocate a buffer of the shared memory. If the
allocation is not successful (out of
> memory), ss_shm_flush() writes directly to the binary file as before.
> If the allocation is successful, ss_shm_flush() copies the share memory to a
buffer and passes this buffer to a
> dedicated thread which writes the buffer to the binary file. This causes
ss_shm_flush() to return immediately and
> not block the calling program during the disk write operation.
> Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed
for sure before the shared memory
> gets deleted.
> This means now that under normal circumstances, exiting programs like odbedit
do NOT flush the ODB. This allows to
> call many “odbedit -c” in a row without the flush penalty. Nevertheless, the
ODB then gets flushed by other
> clients latest 60 seconds (or whatever the flush period is) after odbedit
exits.
>
> Please note that ODB flushing has two purposes:
>
> When all programs exit, we need a persistent storage for the ODB. In most
experiments this only happens very
> seldom. Maybe at the end of a beam time period.
> If the computer crashes, a recent version of the ODB is kept on disk to
simplify recovery after the crash.
> Since crashes are not so often (during production periods we have maybe one
hardware failure every few years) the
> flushing of the ODB too often does not make sense and just consumes resources.
Flushing does also not help from
> corrupted ODBs, since the binary image will also get corrupted. So the only
reason for periodic flushes is to ease
> recovery after a total crash. I put the default to 60 seconds, but if people
are really paranoid they can decrease
> it to 10 seconds or so. Or increase it to 600 seconds if their system does not
crash every week and disks are
> slow.
>
> I made a dedicated branch feature/periodic_odb_flush so people can test the
new functionality. If there are no
> complaints within the next few days, I will merge that into develop.
>
> Stefan |
2614
|
03 Oct 2023 |
Konstantin Olchanski | Bug Fix | wrong array size after loading xml or json file |
both the xml and the json decoders have a bug (fix pending). loading saved odb
from xml and json file did not truncate arrays in odb to the size of arrays in
the file. for example, if /example/double_array has size 20 in odb, but size 5
in xml or json file, after loading the file, array size is still 20.
this is unexpected: after loading an odb save file we expect odb to return to
same state as when odb save file was created. we do not expect some arrays to
have half of their elements restored from file and half their elements left
unchanged.
save and restore from .odb file does not have this problem.
I think this is a bug and I committed (but did not yet push) a fix for both xml
and json odb decoder.
I have run this problem while writing the new history panel editor, where
deleting variables did not work because json rpc db_paste() was not truncating
any arrays.
I am still finishing up the last few bits of the new history panel editor, and
there is a bit of time to discuss and comment this odb change before I push it
to midas.
K.O. |
2703
|
05 Feb 2024 |
Ben Smith | Bug Fix | string --> int64 conversion in the python interface ? |
> The symptoms are consistent with a string --> int64 conversion not happening
> where it is needed.
Thanks for the report Pasha. Indeed I was missing a conversion in one place. Fixed now!
Ben |
2710
|
13 Feb 2024 |
Konstantin Olchanski | Bug Fix | string --> int64 conversion in the python interface ? |
> > The symptoms are consistent with a string --> int64 conversion not happening
> > where it is needed.
>
> Thanks for the report Pasha. Indeed I was missing a conversion in one place. Fixed now!
>
Are we running these tests as part of the nightly build on bitbucket? They would be part of
the "make test" target. Correct python dependancies may need to be added to the bitbucket OS
image in bitbucket-pipelines.yml. (This is a PITA to get right).
K.O. |
2711
|
14 Feb 2024 |
Konstantin Olchanski | Bug Fix | added ubuntu-22 to nightly build on bitbucket, now need python! |
> Are we running these tests as part of the nightly build on bitbucket? They would be part of
> the "make test" target. Correct python dependancies may need to be added to the bitbucket OS
> image in bitbucket-pipelines.yml. (This is a PITA to get right).
I added ubuntu-22 to the nightly builds.
but I notice the build says "no python" and I am not sure what packages I need to install for
midas python to work.
Ben, can you help me with this?
https://bitbucket.org/tmidas/midas/pipelines/results/1106/steps/%7B9ef2cf97-bd9f-4fd3-9ca2-9c6aa5e20828%7D
K.O. |
2792
|
26 Jul 2024 |
Lukas Gerritzen | Bug Fix | strlcpy and strlcat added to glibc 2.38 |
A year ago, these two function were included in glibc. If trying to compile midas with a recent version of
Ubuntu or Fedora, one gets errors like this:
/usr/include/string.h:506:15: error: declaration of ‘size_t strlcpy(char*, const char*, size_t) noexcept’ has a
different exception specifier
506 | extern size_t strlcpy (char *__restrict __dest,
| ^~~~~~~
In file included from /home/luk/midas/src/midas.cxx:14:
/home/luk/midas/include/midas.h:2190:17: note: from previous declaration ‘size_t strlcpy(char*, const
char*, size_t)’
My proposed solution is a check in midas.h around line 248:
#if (__GLIBC__ > 2) || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 38)
#ifndef HAVE_STRLCPY
#define HAVE_STRLCPY 1
#endif
#endif
|
2793
|
26 Jul 2024 |
Stefan Ritt | Bug Fix | strlcpy and strlcat added to glibc 2.38 |
Good catch. I added your code to the current develop branch of MIDAS.
Stefan |
2839
|
12 Sep 2024 |
Konstantin Olchanski | Bug Fix | bitbucket builds repaired |
bitbucket builds work again, also added ubuntu-24 and almalinux-9.
two problems fixed:
- cmake file in examples/experiment was replaced by a non-working version
- unannounced change of strlcpy() to mstrlcpy() broke "make remoteonly"
P.S. I should also fix the rootana and the roody bitbucket builds.
K.O. |
2840
|
13 Sep 2024 |
Konstantin Olchanski | Bug Fix | rootana bitbucket build fixed |
rootana bitbucket build is fixed, only a few minor build problems. I am using the
root official docker image (which turned out to not work right out of the box
becuase of missing libvdt-dev package). K.O. |
2842
|
13 Sep 2024 |
Konstantin Olchanski | Bug Fix | mstrcpy, was: strlcpy and strlcat added to glibc 2.38 |
for the record, as ultimate solution, strlcpy() and strlcat() were wholesale
replaced by mstrlcpy() and mstrlcat(). this should fix "missing strlcpy()"
problem for good and make midas more consistent across all platforms (including
non-linux, non-unix). on my side, I continue replacing these function with proper
std::string operations. K.O. |
2967
|
20 Mar 2025 |
Konstantin Olchanski | Bug Fix | bitbucket builds fixed |
bitbucket automatic builds were broken after mfe.cxx started printing some additional messages added in commit
https://bitbucket.org/tmidas/midas/commits/0ae08cd3b96ebd8e4f57bfe00dd45527d82d7a38
this is now fixed. to check if your changes will break automatic builds, before final push, please do:
make clean
make mini -j
make cmake -j
make test
K.O. |
2988
|
21 Mar 2025 |
Stefan Ritt | Bug Fix | bitbucket builds fixed |
> bitbucket automatic builds were broken after mfe.cxx started printing some additional messages added in commit
> https://bitbucket.org/tmidas/midas/commits/0ae08cd3b96ebd8e4f57bfe00dd45527d82d7a38
>
> this is now fixed. to check if your changes will break automatic builds, before final push, please do:
>
> make clean
> make mini -j
> make cmake -j
> make test
Unfortunately we will break the automatic build each time a program outputs one different character, which even might happen if we add a line of code and
a cm_msg() gets produced with a different line number. Is there a standard way to update testexpt.example (like "make testexpt" or so). Should be trigger
the update of testexpt.example before each commit via a hook?
Stefan |
2992
|
21 Mar 2025 |
Konstantin Olchanski | Bug Fix | bitbucket builds fixed |
> > bitbucket automatic builds
>
> Unfortunately we will break the automatic build each time a program outputs one different character, which even might happen if we add a line of code and
> a cm_msg() gets produced with a different line number. Is there a standard way to update testexpt.example (like "make testexpt" or so). Should be trigger
> the update of testexpt.example before each commit via a hook?
>
Actually line numbers are not logged by messages printed from "make test", so moving code around does not break the test.
Changing what programs output does break the test and this is intentional - somebody must look at confirm
that program output was changed on purpose or because a bug was introduced (or fixed).
Most "make test" things work this way - run programs, compare output to what is expected. Discrepancies are flagged for human examination.
K.O. |
3008
|
28 Mar 2025 |
Konstantin Olchanski | Bug Fix | midas cmake update |
MIDAS git tag midas-2025-01-a introduced an incompatible change to "include midas-targets.cmake". Instead of "midas" one now has to
say "midas::midas", as updated below. K.O.
>
> #
> # CMakeLists.txt for alpha-g frontends
> #
>
> cmake_minimum_required(VERSION 3.12)
> project(agdaq_frontends)
>
> include($ENV{MIDASSYS}/lib/midas-targets.cmake)
>
> add_compile_options("-O2")
> add_compile_options("-g")
> #add_compile_options("-std=c++11")
> add_compile_options(-Wall -Wformat=2 -Wno-format-nonliteral -Wno-strict-aliasing -Wuninitialized -Wno-unused-function)
> add_compile_options("-DTMFE_REV0")
> add_compile_options("-DOS_LINUX")
>
> add_executable(feevb feevb.cxx TsSync.cxx)
> target_link_libraries(feevb midas::midas)
>
> add_executable(fectrl fectrl.cxx GrifComm.cxx EsperComm.cxx JsonTo.cxx KOtcp.cxx $ENV{MIDASSYS}/src/tmfe_rev0.cxx)
> target_link_libraries(fectrl midas::midas)
>
> #end |
3010
|
28 Mar 2025 |
Konstantin Olchanski | Bug Fix | manalyzer -R8082 --jsroot |
When processing MIDAS files offline, JSROOT did not work, -Rxxx worked, http
connection would open, but would not serve any histograms. This should now be
fixed.
In addition, normally, after processing all input MIDAS files, manalyzer would
exit, JSROOT would abruptly stop. To look at final results one had to open the
ROOT files using some other method (roody, TBrowser, mjsroot, etc).
I now added a command line switch "--jsroot", if supplied, after processing all
input MIDAS files, manalyzer will keep running in the JSROOT server mode (same as
mjsroot).
"manalyzer -R8082 --jsroot run*.mid.lz4" now does something useful: open
http://localhost:8082 (or ssh tunnel or mhttpd proxy per my mjsroot message) and
watch histograms fill in real time, after analysis finishes, keep looking at the
final results until bored. stop manalyzer using Ctrl-C. (we should add a "Stop
JSROOT" botton to the JSROOT main page).
MIDAS commit 1d0d6448c3ec4ffd225b8d2030fe13e379fcd007
K.O. |
3011
|
30 Mar 2025 |
Konstantin Olchanski | Bug Fix | manalyzer improvements |
updated manalyzer:
- similar to --jsroot switch, in online mode, the ROOT output file remains open after run is stopped. Previously, after run was
stopped, all histograms & etc would disappear from JSROOT, making it hard to look at the full collected and analyzed data.
- there was a buglet in the multithreading code, if some module cannot analyze flow events as fast as we can read data from disk,
the flow event queue of the first module thread would grow and grow and grow infinitely, potentially consume lots of RAM. This is
because control of queue size for the first module thread was disabled to avoid a deadlock. I now added the queue size check to the
main event loop (both offline mode and online mode) and this problem should now be fixed.
- also adjusted the default queue size from 100 to 1000 and queue-full wait sleep time from 100 us to 10 us.
- another buglet was in the flow event processing. per the README, module EndRun() should not generate flow events (instead, they
should be generated in PreEndRun()). Previously this was not enforced, now there is an error message about this and the offending
flow events are deleted. (they were not being processed anyway).
K.O. |
3017
|
01 Apr 2025 |
Konstantin Olchanski | Bug Fix | ODB and event buffer - release semaphore before abort() and core dump |
There is a long standing problem with ODB and event buffers. If they detect an
internal data inconsistency and cannot continue running, they call abort() to
dump core and stop.
Problem is in some code paths, they do this while holding the ODB or event
buffer semaphore. (Linux kernel automatically releases SYSV semaphores after
core dump is finished and program holding them is stopped).
If core dump takes longer than 10 seconds (for whatever reason, but we see this
often enough), all other programs that wait for ODB or event buffer access, will
also timeout and also crash (with core dump). Result is a core dump storm, at
the end all MIDAS programs are crashed. (Luckily recovery is easy, simply
restart everything).
Now I realize that in many situation, we do not need to hold the semaphore while
dumping core - the content of ODB and event buffer shared memories is not
important for debugging the crash - and it is safe to release the semaphore
before calling abort().
This is now implemented for ODB and event buffers. Hopefully core dump storms
will not happen again.
commit 96369c29deba1752fd3d25bed53e6594773d7e1a
release ODB semaphore before calling abort() to dump core. if core dump takes
longer than 10 sec all other midas programs will timeout and crash.
commit 2506406813f1e7581572f0d5721d3761b7c8e8dd
unlock event buffer before calling abort() in bm_validate_client_index_locked(),
refactor bm_get_my_client_locked()
K.O. |
3034
|
05 May 2025 |
Konstantin Olchanski | Bug Fix | Bug fix in SQL history |
A bug was introduced to the SQL history in 2022 that made renaming of variable names not work. This is now fixed.
break commit:
54bbc9ed5d65d8409e8c9fe60b024e99c9f34a85
fix commit:
159d8d3912c8c92da7d6d674321c8a26b7ba68d4
P.S.
This problem was caused by an unfortunate design of the c++ class system. If I want to add more data to an existing
class, I write this:
class old_class {
int i,j,k;
}
class bigger_class: public old_class {
int additional_variable;
}
But if I have this:
struct x { int i,j; }
class y {
std::vector<x> array_of_x;
}
and I want to add "k" to "x", c++ has not way to do this. history code has this workaround:
class bigger_y: public y
{
std::vector<int> array_of_k;
}
int bigger_y:foo(int n) {
printf("%d %d %d\", array_of_x[n].i, array_of_x[n].j, array_of_k[n]);
}
problem is that it is not obvious that "array_of_x" and "array_of_k" are connected
and they can easily get out of sync (if elements are added or removed). this is the
bug that happened in the history code. I now added assert(array_of_x.size()==array_of_k.size())
to offer at least some protection going forward.
P.S. As final solution I think I want to completely separate file history and sql history code,
they have more things different than common.
K.O. |
1
|
06 Jun 2003 |
Pierre-André Amaudruz | | Welcome |
Dear Midas users,
As you certainly aware, ELOG (Electronic Logbook) has been written
by Stefan Ritt and its functionality is part of the Midas package too.
This web site using Elog is replacing the W-Agora Forum previously setup.
You will need to register to this forum in order to gain Write access and
possible Email notification.
We would like to encourage you to post your questions or comments at
this Midas Elog site instead of using private Email to the authors as your
remarks are surely of interest to the other users too.
|
145
|
12 Jun 2003 |
Pierre-André Amaudruz | | Tape handling |
- remove ss_tape_get_blockn from lazylogger.c
- add ss_tape_get_blockn to system.c
- add ss_tape_get_blockn prototype into midas.h
- fix buffer size for "dir" in mtape.c
- add block# for "dir" in mtape if command successful.
- handle TID_STRUCT bank type by display as 8bit in ybos.c (mdump) |