ELOG Midas

Midas DAQ System, Page 113 of 137

2739 Entries

ID	Date	Author	Topic	Subject
812	24 Jun 2012	Konstantin Olchanski	Info	midas vme benchmarks
> > > Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using
> > > 2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a
> > > continuous link speed of 83 MB/sec.
> >
> > [with ...] the PSI-built DRS4 board, where we implemented the 2eVME protocol in the Virtex II FPGA.

This is an interesting hardware benchmark. Do you also have benchmarks of the MIDAS system using the DRS4 (measurements of end-to-end data rates, maximum event rate, maximum trigger rate, any tuning of the frontend program and of the MIDAS experiment to achieve those rates, etc.)?

K.O.
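The two-parameter model in the quoted message (125 us fixed overhead per DMA transfer, 83 MB/sec continuous link speed) can be written down directly. A minimal C sketch, assuming 1 MB = 10^6 bytes; the function names are illustrative only and are not part of MIDAS or the SIS driver:

```c
#include <assert.h>

/* Simple two-parameter model of block (DMA) transfer speed: every transfer
 * pays a fixed setup overhead, then moves data at the continuous link speed.
 * Units: bytes, seconds, bytes per second. */
double dma_effective_rate(double block_bytes, double overhead_s, double link_Bps)
{
    assert(block_bytes > 0.0 && overhead_s >= 0.0 && link_Bps > 0.0);
    double t = overhead_s + block_bytes / link_Bps; /* total time per transfer */
    return block_bytes / t;
}

/* Block size at which the effective rate is exactly half the link speed:
 * there the overhead equals the pure transfer time, so size = overhead * speed. */
double dma_half_speed_block(double overhead_s, double link_Bps)
{
    assert(overhead_s >= 0.0 && link_Bps > 0.0);
    return overhead_s * link_Bps;
}
```

With these numbers, a block of about 10 kB (125 us x 83 MB/sec = 10375 bytes) runs at only half the link speed, while 1 MB blocks reach about 82 MB/sec.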
813	24 Jun 2012	Konstantin Olchanski	Info	midas vme benchmarks
> > I am recording here the results from a test VME system using two VF48 waveform digitizers

(I now have 4 VF48 waveform digitizers, so the event rates are half of those reported before. The data rate is up to 51 M/s: the event size has doubled while the per-event overhead is the same, so the effective data rate goes up.)

This message demonstrates the effects of tuning the MIDAS system for high-rate data taking. Attached is the history plot of the event rate counters, which shows the real-time performance of the MIDAS system in more detail than the average event rate reported on the MIDAS status page. For an ideal real-time system, the event rate should be constant, without any drop-outs.

Seen on the plot:

run 75: the periodic dropouts in the event rate correspond to the lazylogger writing data into HADOOP HDFS. Clearly the host computer cannot keep up with both data taking and data archiving at the same time (see the output of "top" "with HDFS" and "without HDFS" below).

run 76: SYSTEM buffer size increased from 100 Mbytes to 300 Mbytes. Maybe there is an improvement.

run 77-78: "event_buffer_size" inside the multithreaded (EQ_MULTITHREAD) VME frontend increased from 100 Mbytes to 300 Mbytes (6 seconds of data at 50 M/s). Much better, yes?

Conclusion: for improved real-time performance, there should be sufficient buffering between the VME frontend readout thread and the mlogger data compression thread. For the benchmark hardware at 50 M/s, 4 seconds of buffer space (100M in the SYSTEM buffer and 100M in the frontend) is not enough; 12 seconds of buffer space (300+300) is much better. (Or buy a faster backend computer.)

P.S. The HDFS data rate as measured by lazylogger is around 20 M/s for CDH3 HADOOP and around 30 M/s for CDH4 HADOOP.

P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.

K.O.

----

"top" output during normal data taking; notice that mlogger data compression consumes 99% CPU at the 51 M/s data rate:
top - 08:55:22 up 72 days, 17:00,  5 users,  load average: 2.47, 2.32, 2.27
Tasks: 206 total,   2 running, 204 sleeping,   0 stopped,   0 zombie
Cpu(s): 52.2%us,  6.1%sy,  0.0%ni, 34.4%id,  0.8%wa,  0.1%hi,  6.2%si,  0.0%st
Mem:   3925556k total,  3064928k used,   860628k free,     3788k buffers
Swap: 32766900k total,   200704k used, 32566196k free,  2061048k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5826 trinat    20   0  437m 291m 287m R 97.6  7.6 636:39.63 mlogger
27617 trinat    20   0  310m 288m 288m S 24.6  7.5   6:59.28 mserver
 1806 ganglia   20   0  415m  62m 1488 S  0.9  1.6 668:43.55 gmond

---

"top" output during lazylogger/HDFS activity. Observe the high CPU use by lazylogger and fuse_dfs (the HADOOP HDFS client). Observe that CPU use adds up to 167% out of 200% available.

top - 08:57:16 up 72 days, 17:01,  5 users,  load average: 2.65, 2.35, 2.29
Tasks: 206 total,   2 running, 204 sleeping,   0 stopped,   0 zombie
Cpu(s): 57.6%us, 23.1%sy,  0.0%ni,  8.1%id,  0.0%wa,  0.4%hi, 10.7%si,  0.0%st
Mem:   3925556k total,  3642136k used,   283420k free,     4316k buffers
Swap: 32766900k total,   200692k used, 32566208k free,  2597752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5826 trinat    20   0  437m 291m 287m R 68.7  7.6 638:24.07 mlogger
23450 root      20   0 1849m 200m 4472 S 64.4  5.2  75:35.64 fuse_dfs
27617 trinat    20   0  310m 288m 288m S 18.5  7.5   7:22.06 mserver
26723 trinat    20   0 38720  11m 1172 S 17.9  0.3  22:37.38 lazylogger
 7268 trinat    20   0 1007m  35m 4004 D  1.3  0.9 187:14.52 nautilus
 1097 root      20   0     0    0    0 S  0.8  0.0 101:45.55 md3_raid1
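The buffer arithmetic in the conclusion above is simply capacity divided by data rate. A minimal C sketch; the function names are hypothetical (they are not MIDAS calls), and the estimate assumes the mlogger stops draining the buffers completely during a stall and ignores event headers:

```c
#include <assert.h>

/* Seconds of data taking that the buffers between the frontend readout
 * thread and the mlogger can absorb if the mlogger stops draining them
 * (e.g. while the backend CPU is busy with lazylogger/HDFS traffic). */
double buffer_seconds(double system_buffer_bytes, double frontend_buffer_bytes,
                      double data_rate_Bps)
{
    assert(system_buffer_bytes >= 0.0 && frontend_buffer_bytes >= 0.0);
    assert(data_rate_Bps > 0.0);
    return (system_buffer_bytes + frontend_buffer_bytes) / data_rate_Bps;
}

/* Buffer size needed on each side, split evenly between the SYSTEM buffer
 * and the frontend event buffer, to cover target_s seconds of data. */
double buffer_per_side(double target_s, double data_rate_Bps)
{
    assert(target_s >= 0.0 && data_rate_Bps > 0.0);
    return target_s * data_rate_Bps / 2.0;
}
```

With 100+100 Mbytes at 50 M/s this gives 4 seconds, and with 300+300 Mbytes it gives 12 seconds, matching the numbers quoted above.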
814	25 Jun 2012	Stefan Ritt	Info	midas vme benchmarks
> P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.

An important aspect of optimizing your system is to keep the network traffic under control. I use GBit Ethernet between FE and BE, and make sure the switch can accommodate all accumulated network traffic through its backplane. This way I do not have any TCP retransmits, which kill you: if a single low-level Ethernet packet is lost due to a collision, the TCP stack retransmits it. Depending on the local settings, this can happen only after a timeout of one (!) second, which already punches a hole in your data rate. On the MSCB system I actually use UDP packets, where I schedule the retransmit myself. For a LAN, a 10-100 ms timeout is enough. The one second is optimized for a WAN (like between two continents), where it is fine, but it is not what you want on a LAN system.

Also make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic. I found that this also helps a lot.

- Stefan
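To put numbers on the retransmit argument: while the network is stalled waiting for a retransmit, the frontend keeps producing data, so the buffers must hold data rate x stall time. A rough C sketch using the 50 MB/sec and roughly 135 events/sec figures from the benchmark above; the one-second and 50 ms stalls are the two cases mentioned in this message, and the helper names are hypothetical:

```c
#include <assert.h>

/* Data that piles up in the buffers while the network is stalled
 * waiting for a retransmit. */
double stall_bytes(double data_rate_Bps, double stall_s)
{
    assert(data_rate_Bps >= 0.0 && stall_s >= 0.0);
    return data_rate_Bps * stall_s;
}

/* Number of events that arrive during the stall at a given event rate. */
double stall_events(double event_rate_Hz, double stall_s)
{
    assert(event_rate_Hz >= 0.0 && stall_s >= 0.0);
    return event_rate_Hz * stall_s;
}
```

A one-second TCP retransmit timeout at 50 MB/sec means 50 MB (about 135 events) waiting in buffers; with a 50 ms application-level timeout it is 2.5 MB.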
815	25 Jun 2012	Konstantin Olchanski	Info	midas vme benchmarks
> > P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.
>
> An important aspect of optimizing your system is to keep the network traffic under control. I use GBit Ethernet
> between FE and BE, and make sure the switch can accommodate all accumulated network traffic through its backplane.
> This way I do not have any TCP retransmits, which kill you: if a single low-level Ethernet packet is lost due to a
> collision, the TCP stack retransmits it. Depending on the local settings, this can happen only after a timeout of
> one (!) second, which already punches a hole in your data rate. On the MSCB system I actually use UDP packets, where
> I schedule the retransmit myself. For a LAN, a 10-100 ms timeout is enough. The one second is optimized for a WAN
> (like between two continents), where it is fine, but it is not what you want on a LAN system.
>
> Also make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic.
> I found that this also helps a lot.

In typical applications at TRIUMF we do not set up a private network for the data traffic: data from VME to the backend computer and data from the backend computer to DCACHE all go through the TRIUMF network. This is justified by the required data rates. The highest data rate experiment running right now is PIENU, running at about 10 M/s sustained, nominally April through December (this is 20% of the data rate of the present benchmark). The next highest data rate experiment is T2K/ND280 in Japan, running at about 20 M/s (neutrino beam; the data rate is dominated by calibration events). All other experiments at TRIUMF run at lower data rates (low intensity light ion beams), but we are planning for an experiment that will run at 300 M/s sustained over 1 week of scheduled beam time.

But we do have the technical capability to separate data traffic from the TRIUMF network: the VME processors and the backend computers all have dual GigE NICs.
(I did not say so, but obviously the present benchmark, at 50 M/s from VME to the backend and 20-30 M/s from the backend to HDFS, runs over a GigE network.)

(I am not monitoring the TCP loss and retransmit rates at the present time.)

(The network switch between VME and the backend is a "cheapest available" rack-mountable 8-port GigE switch. The network between the backend and the HDFS nodes is mostly Nortel 48-port GigE edge switches with single-GigE uplinks to the core router.)

K.O.
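A quick way to check these rates against the network: Gigabit Ethernet carries at most 10^9 bit/s = 125 MB/sec in each direction (it is full duplex), so the 50 M/s benchmark uses about 40% of a link, while a 300 M/s experiment cannot fit on a single GigE link at all. A C sketch that ignores protocol overhead (real TCP payload rates are somewhat lower); the function names are made up for illustration:

```c
#include <assert.h>

/* Raw GigE line rate in bytes per second (10^9 bit/s / 8). Usable TCP
 * payload is lower because of framing and headers, so these are upper bounds. */
#define GIGE_BYTES_PER_SEC 125000000L

/* Fraction of one GigE link (one direction) used by a sustained stream. */
double gige_utilization(double data_rate_Bps)
{
    assert(data_rate_Bps >= 0.0);
    return data_rate_Bps / (double)GIGE_BYTES_PER_SEC;
}

/* Minimum number of GigE links for a sustained rate (ceiling division);
 * a real setup needs headroom on top of this. */
long gige_links_needed(long data_rate_Bps)
{
    assert(data_rate_Bps >= 0);
    return (data_rate_Bps + GIGE_BYTES_PER_SEC - 1) / GIGE_BYTES_PER_SEC;
}
```

Because the link is full duplex, data arriving from VME and data leaving for HDFS travel in opposite directions and do not compete for the same 125 MB/sec, even if they share one NIC.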
816	26 Jun 2012	Konstantin Olchanski	Info	midas vme benchmarks
> > > I am recording here the results from a test VME system using four VF48 waveform digitizers

Now we look at the details of the event readout, or if you want, the real-time properties of the MIDAS multithreaded VME frontend program.

The benchmark system includes a TRIUMF-made VME-NIMIO32 VME trigger module, which records the time of the trigger and provides a 20 MHz timestamp register. The frontend program is instrumented to save the trigger time and readout timing data into a special "trigger" bank ("VTR0"). The ROOTANA-based MIDAS analyzer is used to analyze this data and to make these plots.

Timing data is recorded like this:

NIM trigger signal ---> latched into the IO32 trigger time register (VTR0 "trigger time")
...
int read_event(pevent, etc)
{
   VTR0 "trigger time"       = io32->latched_trigger_time();
   VTR0 "readout start time" = io32->timestamp();
   read the VF48 data
   io32->release_busy();
   VTR0 "readout end time"   = io32->timestamp();
}

From the VTR0 time data, we compute these values:

1) "trigger latency" = "readout start time" - "trigger time" --- the time it takes us to "see" the trigger
2) "readout time" = "readout end time" - "readout start time" --- the time it takes to read the VF48 data
3) "busy time" = "readout end time" - "trigger time" --- the time during which the "DAQ busy" trigger veto is active

Also computed is:

4) "time between events" = "trigger time" - "time of previous trigger"

And we plot them on the attached graphs:

1) "trigger latency" - we see the average trigger latency is 5 usec, with hardly any events taking more than 10 usec (notice the log Y scale!). Also notice that there are 35 events that took longer than 100 usec (0.7% out of 5000 events). So how "real time" is this?
For "hard real time" the trigger latency should never exceed some maximum, which is determined by formal analysis or experimentally (in which case it will carry an experimental error bar - "response time is always less than X usec with probability 99.9...%" - the better system will have smaller X and more nines). Since I did not record the maximum latency, I can only claim that the "response time is always less than 1 sec, I am pretty sure of it". For "soft real time" systems, such as subatomic particle physics DAQ systems, one is permitted to exceed that maximum response time, but "not too often". Such systems are characterized by the quantities derived from the present plot (mean response time, frequency of exceeding some deadlines, etc). The quality of a soft real time system is usually judged by non-DAQ criteria (i.e. if the DAQ for the T2K/ND280 experiment does not respond within 20 msec, a neutrino beam spill an be lost and the experiment is required to report the number of lost spills to the weekly facility management meeting). Can the trigger latency be improved by using interrupts instead of polling? Remember that on most hardware, the VME and PCI bus access time is around 1 usec and trigger latency of 5-10 usec corresponds to roughly 5-10 reads of a PCI or VME register. So there is not much room for speed up. Consider that an interrupt handler has to perform at least 2-3 PCI register reads (to determine the source of the interrupt and to clear the interrupt condition), it has to wake up the right process and do a rather slow CPU context switch, maybe do a cross-CPU interrupt (if VME interrupts are routed to the wrong CPU core). All this takes time. Then the Linux kernel interrupt latency comes into play. All this is overhead absent in pure- polling implementations. (Yes, burning a CPU core to poll for data is wasteful, but is there any other use for this CPU core? 
With a dual-core CPU, the 1st core polls for data, the 2nd core runs mfe.c, the TCP/IP stack and the ethernet transmitter.) 2) "readout time" - between 7 and 8 msec, corresponding to the 50 Mbytes/sec VME block transfer rate. No events taking more than 10 msec. (Could claim hard real time performance here). 3) "busy time" - for the simple benchmark system it is a boring sum of plots (1) and (2). The mean busy time ("dead time") goes straight into the formula for computing cross-sections (if that is what you do). 4) "time between events" - provides an independent measurement of dead time - one can see that no event takes less than 7 msec to process and 27 events took longer than 10 msec (0.65% out of 4154 events). If the trigger were cosmic rays instead of a pulser, this plot would also measure the cosmic ray event rate - one would see the exponential shape of the Poisson distribution (linear on Log scale, with the slope being the cosmic event rate). K.O.
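The four derived quantities above are plain timestamp differences. A minimal C sketch of that analysis step; the vtr0_times struct is a hypothetical layout (the real VTR0 bank format is not given in this thread), and it assumes a 32-bit counter of the 20 MHz IO32 clock (50 ns per tick). The live_fraction() helper uses the standard non-paralyzable dead-time model, one common way the mean busy time enters a cross-section calculation; it is an assumption here, not necessarily the formula any particular analysis uses:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event timing record: raw ticks of the 20 MHz IO32 clock. */
typedef struct {
    uint32_t trigger_time;       /* latched when the NIM trigger arrived    */
    uint32_t readout_start_time; /* read at the start of read_event()       */
    uint32_t readout_end_time;   /* read after the busy signal is released  */
} vtr0_times;

#define TICK_US 0.05 /* 20 MHz clock: 0.05 usec per tick */

/* Unsigned subtraction gives the right tick count even if the 32-bit counter
 * wrapped between the two readings (valid for gaps below ~214 s). */
static double ticks_to_us(uint32_t later, uint32_t earlier)
{
    return (double)(uint32_t)(later - earlier) * TICK_US;
}

double trigger_latency_us(const vtr0_times *t)
{
    return ticks_to_us(t->readout_start_time, t->trigger_time);
}

double readout_time_us(const vtr0_times *t)
{
    return ticks_to_us(t->readout_end_time, t->readout_start_time);
}

double busy_time_us(const vtr0_times *t)
{
    return ticks_to_us(t->readout_end_time, t->trigger_time);
}

double time_between_us(const vtr0_times *t, const vtr0_times *prev)
{
    return ticks_to_us(t->trigger_time, prev->trigger_time);
}

/* Fraction of values above a threshold, e.g. latency > 100 usec. */
double fraction_above(const double *v, size_t n, double threshold)
{
    size_t count = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (v[i] > threshold)
            count++;
    return (double)count / (double)n;
}

/* Live-time fraction, non-paralyzable model: each accepted event vetoes
 * the trigger for mean_busy_s seconds. */
double live_fraction(double accepted_rate_Hz, double mean_busy_s)
{
    assert(accepted_rate_Hz >= 0.0 && mean_busy_s >= 0.0);
    double live = 1.0 - accepted_rate_Hz * mean_busy_s;
    assert(live >= 0.0); /* rate * busy time cannot exceed 1 */
    return live;
}
```

For example, 35 latencies above 100 usec out of 5000 events gives fraction_above() = 0.007, the 0.7% quoted above.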
817	26 Jun 2012	Konstantin Olchanski	Info	midas vme benchmarks
> > > > I am recording here the results from a test VME system using four VF48 waveform digitizers

Last message in this series. After all the tuning, I reduce the trigger rate from 120 Hz to 100 Hz to see what happens when the backend computer is not overloaded and has some spare capacity.

event rate: 100 Hz (down from 120 Hz)
data rate: 37 Mbytes/sec (down from 50 M/s)
mlogger CPU use: 65% (down from 99%)

Attached:

1) trigger rate event plot: now the rate is a solid 100 Hz without dropouts
2) CPU and network plots from ganglia: the spikes are the lazylogger saving mid.gz files to HDFS storage
3) time structure plots:
a) trigger latency: mean 5 us, most below 10 us, 59 events (0.046%) longer than 100 us; 7000 us is the longest latency observed (bottom left graph)
b) readout time is 7000-8000 us (same as before; the VME data rate is independent of the trigger rate)
c) busy time: mean 7.2 ms, 12 events (0.0094%) longer than 10 ms; the longest busy time ever observed is 17 ms (bottom middle graph)
d) time between events is 10 ms (100 Hz pulser trigger); a single trigger was missed about 10 times (spike at 20 ms, 0.0085%), and two or more consecutive triggers were never missed (no spikes at 30 ms, 40 ms, etc)

CPU use on the backend computer:

top - 16:30:59 up 75 days, 35 min,  6 users,  load average: 0.98, 0.99, 1.01
Tasks: 206 total,   3 running, 203 sleeping,   0 stopped,   0 zombie
Cpu(s): 39.3%us,  8.2%sy,  0.0%ni, 39.4%id,  5.7%wa,  0.3%hi,  7.2%si,  0.0%st
Mem:   3925556k total,  3404192k used,   521364k free,     8792k buffers
Swap: 32766900k total,   296304k used, 32470596k free,  2477268k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5826 trinat    20   0  441m 292m 287m R 65.8  7.6   2215:16 mlogger
26756 trinat    20   0  310m 288m 288m S 16.8  7.5  34:32.03 mserver
29005 olchansk  20   0  206m  39m  17m R 14.7  1.0  26:19.42 ana_vf48.exe
 7878 olchansk  20   0   99m 3988  740 S  7.7  0.1  27:06.34 sshd
29012 trinat    20   0  314m 288m 288m S  2.8  7.5   4:22.14 mserver
23317 root      20   0     0    0    0 S  1.4  0.0  24:21.52 flush-9:3

K.O.
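The missed-trigger count in item (d) can be read off the time-between-events data: with a 10 ms pulser, an interval near 20 ms means one trigger arrived while the DAQ was busy, near 30 ms means two, and so on. A minimal C sketch of that bookkeeping, assuming the intervals are already converted to milliseconds; the function name is hypothetical and this is not code from the ROOTANA analyzer used here:

```c
#include <assert.h>
#include <stddef.h>

/* Histogram time-between-events intervals from a periodic pulser trigger:
 * an interval of about k+1 pulser periods means k triggers were missed.
 * missed_hist[k] counts intervals with k missed triggers; intervals with
 * k >= max_k - 1 all land in the last bin. */
void count_missed_triggers(const double *interval_ms, size_t n, double period_ms,
                           unsigned long *missed_hist, size_t max_k)
{
    assert(period_ms > 0.0 && max_k > 0);
    for (size_t k = 0; k < max_k; k++)
        missed_hist[k] = 0;
    for (size_t i = 0; i < n; i++) {
        /* round to the nearest whole number of periods (intervals are positive) */
        long periods = (long)(interval_ms[i] / period_ms + 0.5);
        long missed = periods > 0 ? periods - 1 : 0;
        size_t bin = (size_t)missed < max_k ? (size_t)missed : max_k - 1;
        missed_hist[bin]++;
    }
}
```

Applied to the run above, bin 1 would hold the roughly 10 intervals near 20 ms, and bins 2 and higher would be empty.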
2583	16 Aug 2023	Konstantin Olchanski	Bug Report	midas wants to show notification?
I started to get web browser popups about "midas wants to show notifications, block/allow/x". is this a glitch or a new unannounced/undocumented feature? google chrome on macos. K.O.
2584	16 Aug 2023	Stefan Ritt	Bug Report	midas wants to show notification?
> I started to get web browser popups about "midas wants to show notifications,
> block/allow/x". is this a glitch or a new unannounced/undocumented feature?
> google chrome on macos. K.O.

https://bitbucket.org/tmidas/midas/commits/e101dea764c647211c560a68db7ecda1834198db

I did not consider this a significant feature to be announced here. Just a few lines of code. You can turn it on/off via the "Config" web page.

Stefan
2585	16 Aug 2023	Stefan Ritt	Bug Report	midas wants to show notification?
> > I started to get web browser popups about "midas wants to show notifications,
> > block/allow/x". is this a glitch or a new unannounced/undocumented feature?
> > google chrome on macos. K.O.
>
> https://bitbucket.org/tmidas/midas/commits/e101dea764c647211c560a68db7ecda1834198db
>
> I did not consider this a significant feature to be announced here. Just a few lines
> of code. You can turn it on/off via the "Config" web page.
>
> Stefan

Now, as I look at it again, I realize that the config check boxes had a bug. I fixed that, and now disabling should work correctly.

This feature was asked by some people who monitor an experiment with the browser window in the background and the sound off (large office). So desktop notifications are a good thing for them.

Stefan
2586	16 Aug 2023	Konstantin Olchanski	Bug Report	midas wants to show notification?
> This feature was asked by some people ...

"show notifications" popups are strongly associated with disreputable web sites (presumably to push spam), it was surprising to see it from midas.

K.O.
2589	17 Aug 2023	Stefan Ritt	Bug Report	midas wants to show notification?
> > This feature was asked by some people ...
>
> "show notifications" popups are strongly associated with disreputable web sites (presumably to
> push spam), it was surprising to see it from midas.
>
> K.O.

I agree. But unlike emails (where you get lots of spam as well), you can nicely blacklist/whitelist desktop notifications. I suppress all of them except the one for MIDAS. This allows me to watch our experiment without staring at the web page all the time.

The main question here is maybe whether desktop notifications should be on or off by default (for a fresh browser). While you can always change that via the mhttpd "Config" page, the default value is chosen by the system. I chose to set it to "on" so people can experience it, and then turn it off if they don't like it. With notifications off by default, most people would never notice this possibility. But I'm open to a discussion here.

Stefan
1142	20 Nov 2015	Konstantin Olchanski	Info	midas wiki doxygen documentation links
I updated the links on the midas wiki to the doxygen-generated documentation for MIDAS that you get after running "git clone midas; cd midas; make dox; firefox html/index.html".

The correct link is: https://daq.triumf.ca/~daqweb/doc/midas-devel/html/

This takes you to a daily/nightly generated snapshot of the midas develop branch and the generated documentation with full call graphs.

The previous links were deficient in different ways:
- referred to http://ladd00 instead of https://daq
- referred to the wrong path ~daqweb/doc/midas instead of ~daqweb/doc/midas-devel
- referred to the obsolete doxygen generator in midas/doc/html instead of midas/html

If wrong links are still present on the midas wiki, please let us know and we will fix them.

K.O.
982	14 Mar 2014	Konstantin Olchanski	Info	midas wiki updated to mediawiki 1.22.4
The midas wiki at https://midas.triumf.ca was updated to mediawiki 1.22.4 - the latest production version. If you see any problems, please report them to this elog. K.O.
1222	01 Dec 2016	Konstantin Olchanski	Info	midas wiki updated to mediawiki 1.27.1
midas wiki at https://midas.triumf.ca/MidasWiki/index.php/Main_Page was updated to MediaWiki version 1.27.1, the current MediaWiki LTS release. Everything should work as before, but if you see any problems or anomalies, please report them on this forum. K.O.
1538	03 Jun 2019	Konstantin Olchanski	Forum	midas wiki updated to mediawiki 1.27.5
the midas wiki was updated to the latest LTS point release 1.27.5. Also, an installation error was fixed that prevented confirmation of new accounts (git checkout REL1_28 instead of REL1_27, resulting in a version mismatch).

Support for MediaWiki LTS release 1.27 ends this summer. The next LTS release series is 1.31, see https://en.wikipedia.org/wiki/MediaWiki_version_history

This version requires PHP version 7 or newer, which comes standard with Ubuntu LTS 18.04 and el8 (RHEL8), but not with el6 (SL6) and el7 (CentOS-7). I guess we shall start planning this upgrade and the move of the wiki to a new host machine.

K.O.
1544	07 Jun 2019	Konstantin Olchanski	Forum	midas wiki updated to mediawiki 1.27.7
the midas wiki was updated to the latest LTS point release 1.27.7, the latest (last?) security update. mediawiki series 1.27 is now officially EOL, see https://lists.wikimedia.org/pipermail/mediawiki-announce/2019-June/000231.html they recommend that all users upgrade to the current LTS series 1.31. for us it means moving the wiki from the present el6 (SL6) computer to a more up-to-date platform (el8 or ubuntu LTS 18.04). K.O.
1113	16 Sep 2015	Konstantin Olchanski	Info	midas wiki upgraded
The midas wiki at https://midas.triumf.ca has been upgraded to mediawiki version 1.25.2 (current production version). If you see any problems, please report them on this forum. K.O.
1513	28 Mar 2019	Konstantin Olchanski	Release	midas-2019-03-f
the midas release 2019-03 is ready for general use.

main changes from previous releases (midas-2017-10, midas-2018-12 and midas-2019-02):
- change to the midas URL scheme
- removal of cm_watchdog()
- rewrite of event buffer code (and fix of a hard-to-trigger event buffer corruption bug)
- fully thread safe odb and event buffer code (except for rpc_send_event())
- corrected compatibility problems wrt older versions of midas when serving custom web pages via odb /custom/path

To obtain this release, either checkout the top of branch feature/midas-2019-03 (recommended) or checkout the tag midas-2019-03-f.

K.O.
1530	22 May 2019	Konstantin Olchanski	Release	midas-2019-03-g
> the midas release 2019-03 is ready for general use.

first ever bug fix release on a git release branch.

fixed a crash if a frontend built against this midas is connected to mserver from old (pre-db_watch) midas (size mismatch of MSG_ODB message).

to use this update:

# recommended:
git pull
git checkout feature/midas-2019-03
git pull
make ...

# or checkout "detached HEAD"
git pull
git checkout midas-2019-03-g
make ...

odbedit "ver" should report:

GIT revision: Wed May 22 07:35:11 2019 -0700 - midas-2019-03-g on branch feature/midas-2019-03

K.O.

P.S. Thanks for finding this bug go to Greg Hackman on TIGRESS and EMMA experiments at TRIUMF.

K.O.
1543	06 Jun 2019	Konstantin Olchanski	Release	midas-2019-03-h
> > the midas release 2019-03 is ready for general use.

A bug fix update for midas-2019-03:
- fix broken expand_env() in mhttpd
- fix "Invalid name passed to db_create_key: should not be an empty string" in midas.log when loading the MIDAS status page if one of the alarms has an empty class name

odbedit "ver" should report:

Thu Jun 6 18:02:14 2019 -0700 - midas-2019-03-h on branch feature/midas-2019-03

K.O.

ELOG V3.1.4-2e1708b5