ELOG Midas

Midas DAQ System, Page 117 of 152

3028 Entries

ID	Date	Author	Topic	Subject
2343	14 Feb 2022	jago aartsen	Bug Fix	ODBINC/Sequencer Issue
> > I noticed that "Jacob Thorne" in the forum had the same issue as us in November last
> > year. Indeed we have not installed any later versions of MIDAS since then so we will
> > double check we have the latest version.
>
> As you see from my reply to Jacob, the bug has been fixed in midas since then, so just
> update.
>
> Stefan

We have tried updating using both:

git submodule update --init --recursive

and:

git pull --recurse-submodules

But the error still persists. Is there another way to update which we are missing?

Cheers
Jago
2344	15 Feb 2022	Stefan Ritt	Bug Fix	ODBINC/Sequencer Issue
> But the error still persists. Is there another way to update which we are missing?

The bug was definitively fixed in this modification:

https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2

You probably forgot to compile/install correctly after your pull. If you start "odbedit" and do a "ver", you see which git revision you are currently running. Make sure to get this output:

MIDAS version: 2.1
GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch develop
ODB version: 3

Stefan
Draft	15 Feb 2022	jago aartsen	Bug Fix	ODBINC/Sequencer Issue
> > But the error still persists. Is there another way to update which we are missing?
>
> The bug was definitively fixed in this modification:
>
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
>
> You probably forgot to compile/install correctly after your pull. If you start "odbedit" and do
> a "ver", you see which git revision you are currently running. Make sure to get this output:
>
> MIDAS version: 2.1
> GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch develop
> ODB version: 3
>
> Stefan

Hey Stefan,

We are running the GIT revision midas-2020-08-a-509-g585faa96:

[local:mu3eMSci:S]/>ver
MIDAS version: 2.1
GIT revision: Tue Feb 15 16:31:07 2022 +0000 - midas-2020-08-a-521-ge43ea7c5 on branch develop
ODB version: 3

which is still giving the error unfortunately.
2346	16 Feb 2022	jago aartsen	Bug Fix	ODBINC/Sequencer Issue
> > But the error still persists. Is there another way to update which we are missing?
>
> The bug was definitively fixed in this modification:
>
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
>
> You probably forgot to compile/install correctly after your pull. If you start "odbedit" and do
> a "ver", you see which git revision you are currently running. Make sure to get this output:
>
> MIDAS version: 2.1
> GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch develop
> ODB version: 3
>
> Stefan

We were having some problems compiling but have got it sorted now. Thanks for your help :)

Jago
2347	16 Feb 2022	Marius Koeppel	Bug Report	Writting MIDAS Events via FPGAs
I just came back to this and started to use the dummy frontend. Unfortunately, I have a problem during run cycles:

Starting the frontend and starting a run works fine -> seeing events with mdump and also on the web GUI.
But when I stop the run and try to start the next run, the frontend sends no events anymore.
It gets stuck at line 221 (if (status == DB_TIMEOUT)).
I tried to reduce nEvents to 1, which helped in terms of DB_TIMEOUT, but I still don't get any events after a stop / start cycle -> no events in mdump and no events counting up on the web GUI.
If I kill the frontend in the terminal (ctrl+c) and restart it while the run is still running, it starts to send events again.

Cheers,
Marius
2348	23 Feb 2022	Stefan Ritt	Info	Midas slow control event generation switched to 32-bit banks
The midas slow control system class drivers automatically read their equipment and generate events containing midas banks. So far these have been 16-bit banks using bk_init(). But now more and more experiments use a large number of channels, so the 16-bit address space is exceeded. Until last week, there was not even a check for this, leading to unpredictable crashes. Therefore I switched the bank generation in the drivers generic.cxx, hv.cxx and multi.cxx to 32-bit banks via bk_init32(). This should in principle be transparent, since the midas bank functions automatically detect the bank type during reading. But I thought I'd let everybody know just in case.

Stefan
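A minimal sketch of why a 16-bit size limits the number of channels per bank, assuming the limit comes from a 16-bit size field. This is a standalone illustration, not MIDAS code: payload_fits and wrapped_size16 are hypothetical helpers, and the channel counts are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of MIDAS: does a payload of n_channels
// values, each value_size bytes long, fit into a size field of type SizeField?
template <typename SizeField>
bool payload_fits(std::size_t n_channels, std::size_t value_size) {
    return n_channels * value_size <= std::numeric_limits<SizeField>::max();
}

// What a 16-bit size field would hold if an oversized payload were stored
// without any check: only the low 16 bits of the byte count survive.
std::uint16_t wrapped_size16(std::size_t n_bytes) {
    return static_cast<std::uint16_t>(n_bytes);
}
```

With 4-byte values, 16000 channels (64000 bytes) still fit a 16-bit size, while 20000 channels (80000 bytes) do not and would silently wrap to a much smaller number, which is the kind of unchecked overflow that can corrupt an event.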
2349	03 Mar 2022	Stefan Ritt	Bug Report	Writting MIDAS Events via FPGAs
> Starting the frontend and starting a run works fine -> seeing events with mdump and also on the web GUI.
> But when I stop the run and try to start the next run the frontend is sending no events anymore.
> It gets stuck at line 221 (if (status == DB_TIMEOUT)).
> I tried to reduce the nEvents to 1 which helped in terms of DB_TIMEOUT but still I don't get any events after I did a stop / start cycle -> no events in mdump and no events counting up at the web GUI.
> If I kill the frontend in the terminal (ctrl+c) and restart it, while the run is still running, it starts to send events again.

This problem has (likely) been fixed in the current version. Please pull develop and try again. It was a recursive call to the event collection routine, which is only triggered if you send events faster than the logger can digest, so not many people see it.

Best,
Stefan
2350	03 Mar 2022	Konstantin Olchanski	Info	zlib required, lz4 internal
as of commit 8eb18e4ae9c57a8a802219b90d4dc218eb8fdefb, the gzip compression library is required, not optional. this fixes midas and manalyzer mis-build if the system gzip library is accidentally not installed. (is there any situation where gzip library is not installed on purpose?)

midas internal lz4 compression library was renamed to mlz4 to avoid collision against system lz4 library (where present). lz4 files from midasio are now used, lz4 files in midas/include and midas/src are removed.

I see that on recent versions of ubuntu we could switch to the system version of the lz4 library. however, on centos-7 systems it is usually not present and it still is a supported and widely used platform, so we stay with the midas-internal library for now.

K.O.
2351	03 Mar 2022	Konstantin Olchanski	Info	manalyzer updated
manalyzer was updated to latest version. mostly multi-threading improvements from Joseph and myself. K.O.
2352	07 Mar 2022	Marius Koeppel	Bug Report	Writting MIDAS Events via FPGAs
> This problem has (likely) been fixed in the current version. Please pull develop and try again. Was a recursive call to the event collection routine which is only triggered if you send events faster than
> the logger can digest, so not many people see it.

I just pulled the current version (d945fa9) but the problem as explained in 2347 stays the same.

Best,
Marius
2353	10 Mar 2022	Gennaro Tortone	Bug Report	Python ODB watch
Hi,

I have an issue with ODB watch in the MIDAS Python library. I wrote a simple frontend that reads/writes FPGA registers through ODB keys (simplified version at the link below):

https://gist.github.com/gtortone/cd035a9ac4ea7a78ea9cd931e80e2c75

Everything works fine, but there is a boolean array in Settings (Enable ADC sampling) that I need to "toggle" (19 bits to 0, then 19 bits to 1). This operation is handled by detailed_settings_changed_func, which writes the value of the toggled bit to the FPGA.

The issue is that if I quickly toggle the boolean array in odbedit:

set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 0
set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 1

I see the following list of callbacks in the Python script:

detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1 ***
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1

It seems that the second write operation "overlaps" the first one: from the line marked *** onwards, the callbacks for the first write already report the value set by the second write.

The same behavior is not observed using a 'watch' in odbedit.

I can overcome this problem by using the register value as the ODB key and avoiding the boolean array, but I report this issue as a "possible" bug/limitation of the Python ODB watch.

Cheers,
Gennaro
2354	15 Mar 2022	Konstantin Olchanski	Bug Fix	mhttpd ipv6 bind should be fixed now
Something changed after my initial implementation of ipv6 in mhttpd and listening to ipv6 http/https connections was broken.

It turns out, I do not need to listen to both ipv4 and ipv6 sockets, it is sufficient to listen to just ipv6. ipv4 connections will also magically work. see linux kernel "bindv6only" sysctl setting: https://sysctl-explorer.net/net/ipv6/bindv6only/

The specific bug in mhttpd was to bind to ipv4 socket first, subsequent bind() to ipv6 socket fails with error "Address already in use", which is silent, not reported by the mongoose library. For reasons unknown, this does not happen to bind() to "localhost" aka ipv6 "::1".

Apparently other web servers (apache, nginx) are/were also affected by this problem. https://chrisjean.com/fix-nginx-emerg-bind-to-80-failed-98-address-already-in-use/

First fix was to bind to ipv6 first (success) and to ipv4 second (fails).

Second fix committed to git is to only listen to ipv6.

This works both on MacOS and on Linux. Linux reports the listener socket as "tcp6", MacOS reports the listener socket as "tcp46":

4ed0:javascript1 olchansk$ netstat -an | grep 808 | grep LISTEN
tcp46      0      0  *.8081                 *.*                    LISTEN
tcp6       0      0  ::1.8080               *.*                    LISTEN
tcp4       0      0  127.0.0.1.8080         *.*                    LISTEN
4ed0:javascript1 olchansk$

K.O.
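A minimal sketch of the "listen on ipv6 only" idea using plain POSIX sockets. This is not the mhttpd or mongoose code; make_any_address and open_dual_stack_listener are hypothetical names. The sketch clears IPV6_V6ONLY explicitly, so it does not depend on the system-wide "bindv6only" default.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Build the IPv6 wildcard address ("::") for the given TCP port.
sockaddr_in6 make_any_address(std::uint16_t port) {
    sockaddr_in6 addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin6_family = AF_INET6;
    addr.sin6_addr = in6addr_any;
    addr.sin6_port = htons(port);
    return addr;
}

// Open a single listening socket that accepts IPv6 connections and, through
// IPv4-mapped addresses, IPv4 connections as well.
// Returns the file descriptor, or -1 on any error.
int open_dual_stack_listener(std::uint16_t port) {
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int v6only = 0;  // 0 = also accept IPv4 on this socket
    int reuse = 1;   // allow quick restarts of the server
    sockaddr_in6 addr = make_any_address(port);
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &v6only, sizeof(v6only)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) < 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);  // report failure instead of failing silently
        return -1;
    }
    return fd;
}
```

Unlike the silent failure described above, every step here is checked, so a second bind() to an already used port shows up as a -1 return value.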
2355	16 Mar 2022	Stefan Ritt	Info	New midas sequencer version
A new version of the midas sequencer has been developed and is now available in the develop/seq_eval branch. Many thanks to Lewis Van Winkle and his TinyExpr library (https://codeplea.com/tinyexpr), which has now been integrated into the sequencer and allows arbitrary math expressions. Here is a complete list of new features:

* Math is now possible in all expressions, such as "x = $i*3 + sin($y*pi)^2", or in "ODBSET /Path/value[$i*2+1], 10"
* "SET <var>,<value>" can be written as "<var>=<value>", but the old syntax is still possible.
* There are new functions ODBCREATE and ODBDELETE to create and delete ODB keys, including arrays
* Variable arrays are now possible, like "a[5] = 0" and "MESSAGE $a[5]"

If the branch works for us in the next days and I don't get complaints from others, I will merge the branch into develop next week.

Stefan
2356	16 Mar 2022	Ben Smith	Bug Report	Python ODB watch
> It seems that the second write operation "overlaps" the first one...

Hi Gennaro,

In principle the same issue can happen in C++ code, but it is much less likely as the callbacks get executed more quickly (partly due to C++/python in general, and partly because the python code does some extra work to make the interface more user-friendly).

The C++ code at the end of this message adds a 100ms sleep to the callback and can result in output like this when you do quick edits of "Test[0-19]" in odbedit:

Element 1 is 0
Element 2 is 0
Element 3 is 0
Element 4 is 0
Element 5 is 0
Element 6 is 0
Element 7 is 1
Element 8 is 1
Element 9 is 1
etc...

I agree that this can be a really nasty source of bugs if you need to react to every change. I'll add a warning to the python docstrings, but I can't think of a way to make this more robust at the midas level - I think we'd need some sort of ODB "snapshot" system...

#include <cstdio>
#include <string>
#include "midas.h"

void watch_fn(HNDLE hDB, HNDLE hKey, int index, void* info)
{
   // Read the array element that changed and print it
   DWORD data = 0;
   INT buf_size = sizeof(data);
   db_get_data_index(hDB, hKey, &data, &buf_size, index, TID_DWORD);
   printf("Element %d is %u\n", index, data);
   // Artificial delay so that callbacks lag behind quick edits
   ss_sleep(100);
}

int main()
{
   HNDLE hDB, hClient, hTestKey;

   std::string host, expt;
   cm_get_environment(&host, &expt);
   cm_connect_experiment(host.c_str(), expt.c_str(), "test_odb", nullptr);
   cm_get_experiment_database(&hDB, &hClient);

   // Create a 20-element DWORD array "Test" and watch it
   static const DWORD numValues = 20;
   DWORD data[numValues] = {};
   db_set_value(hDB, 0, "Test", data, sizeof(DWORD) * numValues, numValues, TID_DWORD);
   db_find_key(hDB, 0, "Test", &hTestKey);
   db_watch(hDB, hTestKey, watch_fn, nullptr);

   printf("Press any key to exit loop...\n");
   while (!ss_kbhit()) {
      cm_yield(1);
   }

   db_unwatch_all();
   db_delete_key(hDB, hTestKey, FALSE);
   cm_disconnect_experiment();
   return 0;
}
2357	21 Mar 2022	Stefan Ritt	Bug Report	Python ODB watch
What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are two approaches to solve it:

1) Write values one-by-one to the ODB, but do not trigger a watch update. In the sequencer, this can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer and the last paragraph to the right of the ODBSET command). You use notify=0 for all set commands except the last one, where you use notify=1. In the C++ API, you can use db_set_data_index1(), which has this notify flag as the last parameter.

2) You add intelligence to your front-end. If you get a watch update, you do not apply it directly to the hardware, but put it into a FIFO. Once you do not get any more updates for a certain period (1s is a good value), you empty the FIFO and apply all settings at once.

Both methods have been used at PSI successfully, although 1) is much easier to implement, especially if you use the midas sequencer.

Stefan
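Approach 2) can be sketched as a small FIFO with a quiet period. This is a standalone illustration under stated assumptions, not MIDAS code: Update and UpdateFifo are hypothetical names, and the time is passed in as a plain number of seconds so the logic can be exercised without a real clock.

```cpp
#include <vector>

// One pending change: which array element, and its new value (hypothetical type)
struct Update {
    int index;
    int value;
};

// Hypothetical helper: queues updates from the watch callback and hands them
// out as one batch once no new update arrived for quiet_seconds.
class UpdateFifo {
public:
    explicit UpdateFifo(double quiet_seconds)
        : quiet_(quiet_seconds), last_push_(0.0) {}

    // Call from the watch callback instead of touching the hardware.
    void push(Update u, double now) {
        fifo_.push_back(u);
        last_push_ = now;
    }

    // Call periodically from the main loop. Returns all queued updates in
    // arrival order if the quiet period has elapsed, otherwise nothing.
    std::vector<Update> drain_if_quiet(double now) {
        std::vector<Update> batch;
        if (!fifo_.empty() && now - last_push_ >= quiet_) {
            batch.swap(fifo_);  // hand out the batch and leave the FIFO empty
        }
        return batch;
    }

private:
    double quiet_;
    double last_push_;
    std::vector<Update> fifo_;
};
```

In a frontend, push() would be called from the watch callback and drain_if_quiet() from the periodic loop, with the returned batch applied to the hardware in one go.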
2358	22 Mar 2022	Stefan Ritt	Info	New midas sequencer version
After several days of testing in various experiments, the new sequencer has been merged into the develop branch. One more feature was added: the path to the ODB can now contain variables, which are substituted with their values. Instead of writing

ODBSET /Equipment/XYZ/Setting/1/Switch, 1
ODBSET /Equipment/XYZ/Setting/2/Switch, 1
ODBSET /Equipment/XYZ/Setting/3/Switch, 1

one can now write

LOOP i, 3
   ODBSET /Equipment/XYZ/Setting/$i/Switch, 1
ENDLOOP

Of course it is not possible for me to test every possible script. So if you have issues with the new sequencer, please don't hesitate to report them back to me.

Best,
Stefan
2359	22 Mar 2022	Konstantin Olchanski	Bug Fix	fix for event buffer corruption in bm_flush_cache()
multithreaded frontends have an unusual event buffer corruption if the write cache is enabled. For a long time now I had to disable the write cache on all multithreaded frontends in alpha-g, I was hitting this bug quite often. (somehow I do not see this problem reported on bitbucket!)

last week I reworked the multithread locking of event buffers, in hope that this bug will turn up, but nope, all mutexes and locking look okay, except for a number of unrelated problems (races against bm_close_buffer() were the most troublesome to fix).

but finally found the trouble.

first, some background.

because multiprocess locking is expensive, frontends that generate a large number of small events can use the write cache to reduce this overhead. instead of locking the shared memory event buffer for each event, events are accumulated in the write cache, and periodic calls to bm_flush_cache() flush them to shared memory. For best effect, one should increase the size of the write cache until the lock rate is around 10/second.

it turns out introduction of multithreading broke bm_flush_cache(). it does this:

- int ask_free = pbuf->wp; // how much data we have in the write cache now
- call bm_wait_for_free_space(ask_free); // ensure we have this much free shared memory space
- copy pbuf->wp worth of events to shared memory

looks okay at first sight. this is what happens to trigger the bug:

- int ask_free = pbuf->wp; // ok
- call bm_wait_for_free_space(ask_free); // ok, but if shared memory is full, it will go to sleep waiting for free space
- in the meantime, another thread calls bm_send_event(), this adds more data to the write cache, moves pbuf->wp
- bm_wait_for_free_space() eventually returns
- copy pbuf->wp worth of data to shared memory

KABOOM! shared memory corruption! we just overwrote some unlucky event in shared memory: we only have "ask_free" free bytes available, but pbuf->wp moved and now has more data, and it does not fit, and there is no check against it.

of course in the single threaded world this bug did not exist, there was no other thread to call bm_send_event() while bm_flush_cache() is sleeping.

the obvious fix is to ask for more free space if cached data does not fit.

this is now implemented on the branch feature/buffer_mutex. after a bit more testing I will merge it into develop.

so that's it? not so fast. there was more going on.

as described, the bug will only happen when the shared memory event buffer is full (i.e. rarely or never).

It turns out the old version of thread locking code was defective and permitted a race between bm_send_event() and bm_send_event() in another thread:

thread 1: while (1) { bm_send_event(very small event); }

thread 2:
-> bm_send_event(very big event)
-> no space in the cache for the very big event, call bm_flush_cache()
-> bm_flush_cache() asks bm_wait_for_free_space() to make space for cached data
-> this was done with write cache mutex released (mistake!)
-> at the same time bm_send_event(very small event) added 1 more small event to the cache
-> back in bm_flush_cache() write cache mutex is locked correctly, we copy cached data to shared memory and again KABOOM because we now have more data than we asked free space for.

So in the original implementation, corruption was possible even when the shared memory event buffer was pretty much empty.

The reworked locking code closed that loophole - bm_flush_cache() is now called with write cache locked, and bm_send_event() from another thread cannot confuse things, unless shared memory buffer is full and we go to sleep inside bm_wait_for_free_space(). And this is now fixed, too.

K.O.
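The fix "ask for more free space if cached data does not fit" can be sketched as a re-check loop. This is a simplified single-threaded model, not the actual bm_flush_cache() code: WriteCache and flush_cache are hypothetical names, and the wait_for_space callback stands in for bm_wait_for_free_space(), during which another thread may add data to the cache.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the write cache: wp is the number of cached bytes.
struct WriteCache {
    std::size_t wp = 0;
};

// wait_for_space(n) models a call that blocks until n bytes are free in
// shared memory. While it blocks, the cache may grow (cache.wp increases).
// Returns the number of bytes copied, which never exceeds the reserved space.
std::size_t flush_cache(WriteCache& cache,
                        const std::function<void(std::size_t)>& wait_for_space) {
    std::size_t reserved = 0;
    // Keep asking until the reserved space covers everything in the cache.
    // The buggy version asked once and then copied whatever cache.wp was.
    while (reserved < cache.wp) {
        reserved = cache.wp;
        wait_for_space(reserved);
    }
    std::size_t copied = cache.wp;  // here copied <= reserved always holds
    cache.wp = 0;                   // cache is empty after the flush
    return copied;
}
```

If the cache grows from 100 to 150 bytes during the first wait, the loop asks again for 150 bytes before copying, instead of copying 150 bytes into 100 bytes of reserved space.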
2360	22 Mar 2022	Stefan Ritt	Bug Fix	fix for event buffer corruption in bm_flush_cache()
Thanks Konstantin for your detailed description. I wondered why we never saw this problem at PSI. Here is the reason:

In multi-threaded environments, we never call bm_send_event() directly from all threads (since in the old days nothing was thread safe in midas). Instead, we use a collector thread which gets all events via the rb_xxx functions from the individual readout threads. This is well integrated into the mfe.cxx framework. Look at examples/mtfe/mtfe.cxx. Each thread does (simplified):

while (true) {
   do {
      status = rb_get_wp(&pevent);
   } while (status == DB_TIMEOUT);

   bm_compose_event_threadsafe(pevent, ..., &serial_number);
   bk_init32(pevent+1);
   ... fill event ...
   bk_close(pevent);

   rb_increment_wp(sizeof(EVENT_HEADER) + pevent->data_size);
}

The framework now collects all these events in receive_trigger_event(), which runs in the main thread:

for (i=0 ; i<n_thread ; i++) {
   rb_get_rp(i, pevent);
   if (pevent->serial_number == prev_serial+1)
      break;
}
prev_serial = pevent->serial_number;
rpc_send_event(pevent);
rb_increment_rp(sizeof(EVENT_HEADER) + pevent->data_size);

This code ensures that all events are in the right sequence (before, the serial numbers were mixed up) and that all events are sent only from a single thread, so the write buffer can be used effectively without complicated multi-thread locks.

This solution has worked nicely at PSI for many years. Maybe you should put some thought into using it in your tmfe framework in alpha-g as well, instead of struggling with all your locks.

Stefan
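The in-sequence collection step can be sketched in isolation. This is a standalone model, not the mfe.cxx code: Event, RingBuffer and next_in_sequence are hypothetical names, and each per-thread ring buffer is modelled as a std::deque holding only the serial number.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical event: only the serial number matters for ordering
struct Event {
    std::uint32_t serial_number;
};

// Hypothetical stand-in for the ring buffer of one readout thread
typedef std::deque<Event> RingBuffer;

// Look at the read position of every ring buffer and take the event whose
// serial number directly follows prev_serial. Returns true and fills `out`
// if such an event was found, false if it has not arrived yet.
bool next_in_sequence(std::vector<RingBuffer>& buffers,
                      std::uint32_t prev_serial, Event& out) {
    for (RingBuffer& rb : buffers) {
        if (!rb.empty() && rb.front().serial_number == prev_serial + 1) {
            out = rb.front();
            rb.pop_front();  // advance the read position of that buffer
            return true;
        }
    }
    return false;
}
```

The single collector calls this in a loop and forwards each returned event, so events leave in serial-number order no matter which readout thread produced them.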
2361	23 Mar 2022	Ivo Schulthess	Bug Fix	fix for event buffer corruption in bm_flush_cache()
Thanks for the investigation. Back in 2020, we had some issues of losing data between the system buffer and the logger writing them to disk (https://daq00.triumf.ca/elog-midas/Midas/1966). This was polled equipment but we had a multithreaded FE running at the same time. Could this be related to the same problem? Best, Ivo
2362	23 Mar 2022	Konstantin Olchanski	Bug Fix	mhttpd ipv6 bind should be fixed now
> Something changed after my initial implementation of ipv6 in mhttpd
> and listening to ipv6 http/https connections was broken.

Reporting that mhttpd ipv6 works at CERN. The hostnames for ipv6 connections come back as alphacpc05.ipv6.cern.ch instead of alphacpc05.cern.ch, so both are added to the http "insecure port" whitelist.

K.O.
