ELOG Midas

Midas DAQ System, Page 114 of 152

3027 Entries

ID	Date	Author	Topic	Subject
2270	19 Aug 2021	Konstantin Olchanski	Bug Report	select() FD_SETSIZE overrun
I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is mysteriously misbehaving during run start and stop.

The problem turns out to be with the select() system call. The corresponding FD_SET(), FD_ISSET() & co operate on an array of fixed size FD_SETSIZE, value 1024 in my case. But the socket number is 1409, so we overrun the FD_SET() array. Ouch.

I see that all uses of select() in midas have no protection against this. (we should probably move away from select() to the newer poll() or whatever it is).

Why does mlogger open so many file descriptors? The usual, scaling problems in the history. The old midas history does not reuse file descriptors, so it opens the same 3 history files (.hst, .idx, etc) for each history event. The new FILE history opens just one file per history event. But if the number of events is bigger than 1024, we run into the same trouble.

(BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 on some other machines, see "limit" or "ulimit -a").

K.O.
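A minimal sketch of the two usual protections against this overrun, assuming a POSIX system: a range check before any FD_SET()/FD_ISSET(), and a poll()-based wait that has no FD_SETSIZE limit. The function names fd_fits_in_fd_set() and wait_readable() are hypothetical and are not part of midas.

```cpp
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

// True if fd can be passed safely to FD_SET()/FD_ISSET().
// Descriptors outside [0, FD_SETSIZE) would index past the end of fd_set.
// (hypothetical helper, not a midas function)
bool fd_fits_in_fd_set(int fd)
{
   return fd >= 0 && fd < FD_SETSIZE;
}

// Wait until fd is readable using poll(), which takes the descriptor
// number as a plain int and is not limited by FD_SETSIZE.
// Returns true if data is ready, false on timeout or error.
// (hypothetical helper, not a midas function)
bool wait_readable(int fd, int timeout_ms)
{
   struct pollfd pfd;
   pfd.fd = fd;
   pfd.events = POLLIN;
   pfd.revents = 0;
   int n = poll(&pfd, 1, timeout_ms);
   return n > 0 && (pfd.revents & POLLIN) != 0;
}
```

With FD_SETSIZE equal to 1024, fd_fits_in_fd_set(1409) is false, so a caller could fall back to wait_readable() or report an error instead of overrunning the array.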
2274	06 Sep 2021	Konstantin Olchanski	Forum	mhttpd crash
> [mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
> Can anybody hint me what is going wrong here?
> The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.

This is my code. I am the culprit.

I had a bit of discussion about this with Stefan. Bottom line is something is rotten in the multithreading code inside mhttpd and under conditions unknown, it sends the wrong data into the wrong socket. This causes midas web pages to be really confused (RPC replies processed as CSS file, HTML code processed as RPC replies, a mess), this wrong data is cached by the browser, so restarting mhttpd does not fix the web pages. So a mess.

I find this is impossible to replicate, and so cannot debug it, cannot fix it.

Best I was able to do is to add a check for socket numbers, and thankfully it catches the condition before web browser caches become poisoned. So, broken web pages replaced by mhttpd crash.

This situation reinforces my opinion that multi-threading and C++ classes "do not mix" (like H2 and O2 do not mix). If you write a multithreaded C++ program and it works, good for you, if there is a malfunction, good luck with it, C++ just does not have any built-in support for debugging typical multithreading problems.

I think others have come to the same conclusion and invented all these new "safe" programming languages, like Rust and Go.

Back to your troubles.

1) If you see a way to replicate this crash, or some way to reliably cause the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that and I wish to fix this problem very much.

2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps are useless for debugging the present problem. There is just no way to examine the state of each thread and of each http request using gdb by hand.

3) this abort() causes linux to write a core dump, this takes a long time and I think it causes other MIDAS programs to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps" to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable the "wrong socket" check, but most likely you will not like the result).

4) run mhttpd inside a script: "while (1) { start mhttpd; sleep 1 sec; rinse, repeat; }" (run mhttpd without "-D", yes?)

In other news, the mongoose web server library has a new version available, they again changed their multithreading scheme (I think it is an improvement). If I update mhttpd to this new version, it is very likely the code with the "wrong socket" bug will be deleted. (with new bugs added to replace old bugs, of course).

K.O.
2284	11 Oct 2021	Konstantin Olchanski	Forum	test
test, no email. K.O.
2285	11 Oct 2021	Konstantin Olchanski	Forum	test
> test, no email. K.O.

test reply, no email. K.O.
2286	11 Oct 2021	Konstantin Olchanski	Forum	test
> > test, no email. K.O.
>
> test reply, no email. K.O.

test attachment, no email. K.O.
Attachment 1: image.png

2287	11 Oct 2021	Konstantin Olchanski	Forum	test
> > > test, no email. K.O.
> >
> > test reply, no email. K.O.
>
> test attachment, no email. K.O.

test email. K.O.
2288	11 Oct 2021	Konstantin Olchanski	Forum	midas forum updated, moved
The midas forum software (elogd) was updated to the latest version and moved from our old server (ladd00.triumf.ca) to our new server (daq00.triumf.ca).

The following URLs should work:
https://daq00.triumf.ca/elog-midas/Midas/ (new URL)
https://midas.triumf.ca/elog/Midas/ (old URL, redirects to daq00)
https://midas.triumf.ca/forum (link from midas wiki)

The configuration on the old server ladd00.triumf.ca is quite tangled between several virtual hosts and several DNS CNAMEs. I think I got all the redirects correct and all old URLs and links in old emails & etc still work.

If you see something wrong, please reply to this message here or email me directly.

K.O.
2311	26 Jan 2022	Konstantin Olchanski	Forum	.gz files
> I adapted our analyzer to compile against the manalyzer included in the midas repo.
> TMReadEvent: error: short read 0 instead of -1193512213

I think this problem is fixed in the latest version of midasio and manalyzer, but this update was not pulled into midas yet. (Canada is in the middle of a covid wave since December).

What happens is you do not have the gzip library installed on your computer and your analyzer is built without support for gzip.

The fix is done the hard way, the gzip library is no longer optional, but required.

You do not say what linux you use, so I cannot give exact instructions, but for:

ubuntu: apt -y install libz-dev
centos7: installed by default
centos8: installed by default
debian11/raspbian: same as ubuntu

K.O.
2312	26 Jan 2022	Konstantin Olchanski	Forum	Device driver for modbus
> Dear all, does anyone have an example of for a device driver using modbus or modbus tcp to communicate with a device and willing to share it? Thanks.

I have not seen any modbus devices recently, so all my code and examples are quite old.

Basic modbus/tcp communication driver is in the midas repo:

daq00:midas$ find . | grep -i modbus
./drivers/divers/ModbusTcp.cxx
./drivers/divers/ModbusTcp.h
daq00:midas$

This driver worked for communication to a modbus PLC (T2K/ND280/TPC experiment in Japan).

An example program to use this driver and test modbus communication is here:
https://bitbucket.org/expalpha/agdaq/src/master/src/modbus.cxx

Because at the end, we do not have any modbus devices in any recent experiment, I do not have any example of using this driver in the midas frontend. Sorry.

K.O.
2313	26 Jan 2022	Konstantin Olchanski	Bug Report	Writting MIDAS Events via FPGAs
> today I did not get the data into MIDAS.

Any error messages printed by the frontend? any error message in midas.log? core dumps? crashes?

I do not understand what you mean by "did not get the data into midas". You create events and send them to a midas event buffer and you do not see them there? With mdump? Do you see this both connected locally and connected remotely through the mserver?

BTW, I see you are using the mfe.c frontend. Event data handling in mfe.c frontends is quite convoluted and impossible to straighten out. I recommend that you use the tmfe c++ frontend instead. Event data handling is much simplified and is easier to debug compared to the mfe.c frontend. There are examples in the midas repository and there are tutorials for converting frontends from mfe.c to tmfe posted in this forum here.

BTW, the commit you refer to only changed some html files, could not have affected your data.

K.O.
2314	26 Jan 2022	Konstantin Olchanski	Bug Report	some frontend kicked by cm_periodic_tasks
> The problem is that eventually some of frontend closed with message
> :19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer
> 'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist

This message means what it says. A client was registered with the SYSMSG buffer and this client had pid 9789. At some point some other client (rootana, in this case) checked it and process pid 9789 was no longer running. (it then proceeded to remove the registration).

There are 2 possibilities:
- simplest: your frontend has crashed. best to debug this by running it inside gdb, wait for the crash.
- unlikely: reported pid is bogus, real pid of your frontend is different, the client registration in SYSMSG is corrupted. this would indicate massive corruption of midas shared memory buffers, not impossible if your frontend misbehaves and writes to random memory addresses. ODB has protection against this (normally turned off, easy to enable, set ODB "/experiment/protect odb" to yes), shared memory buffers do not have protection against this (should be added?).

Do this. When you start your frontend, write down its pid, when you see the crash message, confirm pid number printed is the same. As additional test, run your frontend inside gdb, after it crashes, you can print the stack trace, etc.

>
> in the meantime mserver loggging :
> mserver started interactively
> mserver will listen on TCP port 1175
> double free or corruption (!prev)
> double free or corruption (!prev)
> free(): invalid next size (normal)
> double free or corruption (!prev)
>

Are these "double free" messages coming from the mserver or from your frontend? (i.e. you run them in different terminals, not all in the same terminal?).

If messages are coming from the mserver, this confirms possibility (1), except that for frontends connected remotely, the pid is the pid of the mserver, and what we see are crashes of mserver, not crashes of your frontend. These are much harder to debug.

You will need to enable core dumps (ODB /Experiment/Enable core dumps set to "y"), confirm that core dumps work (i.e. "killall -SEGV mserver", observe core files are created in the directory where you started the mserver), reproduce the crash, run "gdb mserver core.NNNN", run "bt" to print the stack trace, post the stack trace here (or email to me directly).

>
> I can find some correlation between number of events/event size produced by
> frontend, cause its failed when its become big enough.
>

There is no limit on event size or event rate in midas, you should not see any crash regardless of what you do. (there is a limit of event size, because an event has to fit inside an event buffer and event buffer size is limited to 2 GB).

Obviously you hit a bug in mserver that makes it crash. Let's debug it.

One thing to try is set the write cache size to zero and see if your crash goes away. I see some indication of something rotten in the event buffer code if write cache is enabled. This is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion where odb settings have no effect depending on value of "equipment_common_overwrite").

>
> frontend scheme is like this:
>

Best if you use the tmfe c++ frontend, event data handling is much simpler and we do not have to debug the convoluted old code in mfe.c.

K.O.

>
> poll event time set to 0;
>
> poll_event{
> //if buffer not transferred return (continue cutting the main buffer)
> //read main buffer from hardware
> //buffer not transfered
> }
>
> read event{
> // cut the main buffer to subevents (cut one event from main buffer) return;
> //if (last subevent) {buffer transfered ;return}
> }
>
> What is strange to me that 2 frontends (1 per remote pc) causing this.
>
> Also, I'm executing one FEcode with -i # flag , put setting eventid in
> frontend_init , and using SYSTEM buffer for all.
>
> Is there something I'm missing?
> Thanks.
> A.
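The kind of check behind the "process pid 9789 does not exist" message can be sketched with the standard POSIX kill(pid, 0) idiom. This illustrates the general technique only and is not claimed to be the exact code in cm_periodic_tasks; process_exists() is a hypothetical name.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if a process with this pid currently exists.
// kill() with signal 0 performs the existence and permission checks
// but does not deliver any signal.
// (hypothetical helper, not a midas function)
bool process_exists(pid_t pid)
{
   if (pid <= 0)
      return false; // 0 and negative values address process groups, not one process
   if (kill(pid, 0) == 0)
      return true;
   // EPERM: the process exists but belongs to another user.
   // ESRCH: no such process.
   return errno == EPERM;
}
```

Comparing the pid written down at frontend start against the pid in the log message, as suggested above, tells you whether such a check removed the right client.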
2315	26 Jan 2022	Konstantin Olchanski	Bug Report	Off-by-one in sequencer documentation
> > 3 LOOP n,4
> > 4 MESSAGE $n,1
> > 5 ENDLOOP
>
> Indeed you're right. The loop variable runs from 1...n. I fixed that in the documentation.

Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.

K.O.
2316	26 Jan 2022	Konstantin Olchanski	Info	MityCAMAC Login
For those curious about CAMAC controllers, this one was built around 2014 to replace the aging CAMAC A1/A2 controllers (parallel and serial) in the TRIUMF cyclotron controls system (around 50 CAMAC crates). It implements the main and the auxiliary controller mode (single width and double width modules).

The design predates Altera Cyclone-5 SoC and has separate ARM processor (TI 335x) and Cyclone-4 FPGA connected by GPMC bus. ARM processor boots Linux kernel and CentOS-7 userland from an SD card, FPGA boots from its own EPCS flash.

User program running on the ARM processor (i.e. a MIDAS frontend) initiates CAMAC operations, FPGA executes them. Quite simple.

K.O.
2318	26 Jan 2022	Konstantin Olchanski	Forum	Issue in data writing speed
Francesco, when you say "writing an event is slow", do you mean it in the frontend or in the output data file?

Stefan is quite right about the data file, it can take seconds between generating an event in the frontend and seeing it written to the data file. (if compression buffers are too big, an event can sit there forever, until pushed out by next events or by run stop).

But maybe you see this on the frontend side. What you are looking at is "real time" performance of the frontend and of the linux kernel.

The mfe.c frontend has many problems with real time performance, it can stall and take a long time between calls to read_event(), for many reasons. There are ways around that, but it is simpler to switch to the tmfe c++ frontend that was designed for good real time performance.

In the tmfe frontend, if you use the polled equipment and enable the poll thread, your frontend will be limited only by the linux kernel real time performance (i.e. on a single-core CPU, other programs will delay execution of your frontend and you will see it as long delays (usec, millisec) between calls to your read_event()).

Next limit to real time performance (common to mfe.c and tmfe frontends) is the writing of event data to the midas shared event buffer. One has to lock the shared memory semaphore and this has to wait until other users of the event buffer finish their reading or writing and unlock it. Arbitrary amount of time (usec, millisec, sec) can pass. (there are also problems with "fairness" of the linux semaphores, a different story, again).

Making things more interesting, midas event buffers implement a write cache (default size 100 kbytes), events smaller than the cache are quickly accumulated (no need to lock the shared memory semaphore), then flushed to shared memory when cache is full. This is done to reduce the number of shared memory semaphore locks per event, in the case of very high rate of very small events.

Solution to all this is to use 2 threads: read the data from hardware in one thread and write the data to midas in a different thread. Between the threads would be an event fifo (circular buffer in mfe.c, std::deque<EVENT> in tmfe c++ frontends).

For remote connected frontends, things are a bit different. Event data is written directly into the TCP socket and as long as socket buffers are big enough, there are no real-time delays, unless SYSTEM buffer is very congested and mserver does not read the TCP socket quickly enough. So depending on event size, data rate and tcp socket buffer size, the extra 2nd thread may not be necessary and poll thread real time performance may be good enough.

I hope this clarifies the situation somewhat.

K.O.

> Dear all,
> I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera).
> I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz)
> writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back
> into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a sort time,
> but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and
> the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout
> of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you
> explain this behavior? Is there any way to improve it?
>
> Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion
> to improve it, it would be really appreciated.
>
> Thank you very much,
> Francesco
>
> const char* pSrc = (const char*)bufframe.buf;
>
> for(int y = 0; y < bufframe.height; y++ ){
>
>    //Copy one row
>    const unsigned short* pDst = (const unsigned short*)pSrc;
>
>    //go through the row
>    for(int x = 0; x < bufframe.width; x++ ){
>
>       WORD tmpData = *pDst++;
>
>       *pdata++ = tmpData;
>
>    }
>
>    pSrc += bufframe.rowbytes;
>
> }
>
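The 2-thread scheme described in the reply above (one thread reads the hardware, another writes to midas, with an event fifo between them) can be sketched as follows. This is a minimal illustration using only the C++ standard library and is not the tmfe implementation. Event, EventFifo and run_pipeline() are hypothetical names, the "hardware" is faked, and the fifo is unbounded, which a real frontend would probably want to cap.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical event type: one readout worth of 16-bit samples.
using Event = std::vector<uint16_t>;

// Minimal thread-safe fifo between a readout thread and a writer thread.
class EventFifo {
public:
   void Push(Event e)
   {
      {
         std::lock_guard<std::mutex> lock(fMutex);
         fQueue.push_back(std::move(e));
      }
      fCond.notify_one();
   }

   // Blocks until an event is available or Close() was called.
   // Returns false once the fifo is closed and empty.
   bool Pop(Event* e)
   {
      std::unique_lock<std::mutex> lock(fMutex);
      fCond.wait(lock, [this] { return !fQueue.empty() || fClosed; });
      if (fQueue.empty())
         return false;
      *e = std::move(fQueue.front());
      fQueue.pop_front();
      return true;
   }

   void Close()
   {
      {
         std::lock_guard<std::mutex> lock(fMutex);
         fClosed = true;
      }
      fCond.notify_all();
   }

private:
   std::mutex fMutex;
   std::condition_variable fCond;
   std::deque<Event> fQueue;
   bool fClosed = false;
};

// Runs a fake readout loop in the calling thread and a writer thread that
// drains the fifo. Returns the number of events the writer consumed.
int run_pipeline(int nevents)
{
   EventFifo fifo;
   int consumed = 0;
   std::thread writer([&] {
      Event e;
      while (fifo.Pop(&e))
         consumed++; // a real frontend would send the event to midas here
   });
   for (int i = 0; i < nevents; i++)
      fifo.Push(Event(1024, (uint16_t)i)); // stand-in for reading the hardware
   fifo.Close();
   writer.join(); // after join() it is safe to read "consumed"
   return consumed;
}
```

The point of the design is that Push() holds the lock only long enough to append to the deque, so the readout side is never blocked by a slow write on the other side.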
2320	26 Jan 2022	Konstantin Olchanski	Forum	Issue in data writing speed
> Francesco, when you say "writing an event is slow", do you mean it in the frontend
> or in the output data file?

Another explanation just occurred to me. We do not know your event size and we do not know the size of your SYSTEM buffer. But if you have an unlucky combination, this can happen:

Consider event size is 6 Mbytes, buffer size is 8 Mbytes, enough space for only 1 event. First event is written quickly (buffer is empty). Second event will be delayed, there is not enough free space in the buffer, we have to wait for mlogger to finish reading the first event.

Same thing happens if event size is 3 Mbytes, the first 2 events will write quickly, writing the 3rd event will be delayed until mlogger does its thing.

The mlogger reads the SYSTEM buffer "fast" and "quickly", but it can be delayed for a number of reasons, i.e. handling a history event, a delay writing to disk, a delay writing to network connected storage, etc.

In general, it is best to size the SYSTEM buffer to hold about 1 second worth of data (of average size, average rate). If your event size is 4 Mbytes, and you record them at 10/sec, SYSTEM buffer should be at least 40 Mbytes big. (this is set in ODB /Experiment/Buffer Sizes). (MIDAS event buffer size is limited to 2 GBytes).

K.O.
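The sizing rule and the two unlucky combinations above work out as in this small sketch. The function names are hypothetical, and the arithmetic ignores event headers and any other per-event overhead inside the buffer, so treat the results as lower bounds.

```cpp
#include <cstdint>

// Suggested minimum buffer size: about "seconds" worth of data at the
// average event size and average event rate.
// (hypothetical helper, not a midas function)
uint64_t min_buffer_bytes(uint64_t event_bytes, double events_per_sec, double seconds = 1.0)
{
   return (uint64_t)(event_bytes * events_per_sec * seconds);
}

// Number of whole events that fit into the buffer before the writer has to
// wait for a reader to free some space (overheads ignored).
// (hypothetical helper, not a midas function)
uint64_t events_that_fit(uint64_t buffer_bytes, uint64_t event_bytes)
{
   if (event_bytes == 0)
      return 0;
   return buffer_bytes / event_bytes;
}
```

For example, min_buffer_bytes(4000000, 10.0) gives 40000000, events_that_fit(8000000, 6000000) gives 1 and events_that_fit(8000000, 3000000) gives 2, matching the "every 2nd event" and "every 3rd event" stalls described above.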
2321	26 Jan 2022	Konstantin Olchanski	Bug Report	Off-by-one in sequencer documentation
> > Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.
>
> for (i=1 ; i<=10 ; i++); ;-)

Similar code made big news just recently: (scroll down to the example main() program)
https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034

I forget if the FORTRAN rules were "loop once" or "never loop" or if it was different between Fortran-4, fortran-77, DEC extensions and IBM extension, or if it was a compiler switch.

We should check that we do something reasonable with such loops to zero:

LOOP n,0
MESSAGE $n,1
ENDLOOP

P.S. Yup. "man g77" option "-fonetrip".

K.O.
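The two loop conventions in question ("never loop" versus "loop once" when the count is zero) can be shown side by side. This sketch only illustrates the difference and says nothing about what the midas sequencer actually does with "LOOP n,0"; the function names are hypothetical.

```cpp
// Number of times the loop body runs for a loop from 1 to n when the
// condition is tested before the first pass ("never loop" for n = 0).
int count_zero_trip(int n)
{
   int count = 0;
   for (int i = 1; i <= n; i++)
      count++;
   return count;
}

// Same loop with the condition tested after the body, so the body always
// runs at least once (the behaviour selected by g77 "-fonetrip").
int count_one_trip(int n)
{
   int count = 0;
   int i = 1;
   do {
      count++;
      i++;
   } while (i <= n);
   return count;
}
```

Both functions agree for n >= 1 and differ only for n <= 0, which is exactly the case worth testing in the sequencer.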
2322	26 Jan 2022	Konstantin Olchanski	Bug Report	Writting MIDAS Events via FPGAs
> > Any error messages printed by the frontend? any error message in midas.log? core dumps? crashes?
> > I do not understand what you mean by "did not get the data into midas". You create events
> > and send them to a midas event buffer and you do not see them there? With mdump?
> > Do you see this both connected locally and connected remotely through the mserver?
>
> I simply don't see the event counter counting up and I also don't see them using mdump. No logs, no dumps and no crashes - every is quite. I only tested it locally.
>

If you are connected locally (no mserver), I want to know the value returned by bm_send_event(). Simplest if you edit mfe.c and everywhere it calls bm_send_event() and rpc_send_event(), print the returned value. It would be very interesting to see if bm_send_event() returns 1 (SUCCESS), but the event vanishes without a trace.

Before you do that, try something simpler: Run "mdump -s -d", it will print some event buffer internals. Watch to see if any data pointers change when you send your events ("wp", "rp", etc).

If nothing changes at all, then we are not sending anything (fault is in your code or in mfe.c).

If you see "wp" counting up, then we definitely write your events into the buffer and mdump & mlogger should see them.

But there is some funny logic for event_id and trigger_mask and it is worth checking their values. For a good test, set event_id=1 and trigger_mask=0x1. There might be trouble if either is set to zero.

K.O.
2323	26 Jan 2022	Konstantin Olchanski	Bug Report	Unknown Error 319 from client
> I'm trying to run MIDAS using a frontend code/client named 'fetiglab'. Run stops
> after 2/3sec with an error saying "Unknown error 319 from client 'fetiglab' on
> localhost".

actually run never starts.

> 11:46:32 [fetiglab,ERROR] [odb.cxx:11268:db_get_record,ERROR] struct size
> mismatch for "/" (expected size: 1, size in ODB: 41920)

this is the error that causes run start to fail. for reasons unknown your frontend is trying to do a db_get_record() from "/" (ODB root top directory). if this is an mfe.c frontend, I do not think I have ever seen it do something like this. so, a puzzle.

K.O.
2324	26 Jan 2022	Konstantin Olchanski	Forum	mhttpd error
> > Enable IPv6 y
>
> Probably the IPv6 problem, see here elog:2269
>
> I asked to turn off IPv6 by default, or at least mention this in the documentation,
> but unfortunately nothing happened.

But IPv4 and IPv6 code is completely separate, if IPv6 bind fails, IPv4 should still work. This is all very strange.

It does not help that the OP does not say in which way things do not work, "the server is not accessible from other machines" is not an error message reported by any browser, and we do not know what URL he is using to access mhttpd - http: or https:

Also he is enabling the "insecure" port 8081, I am pretty sure the documentation is pretty clear, either use the secure https port or the insecure port, but not both at the same time.

In any case, I see the current version of mongoose has removed support for password files, so all this stuff will likely become reworked and at the end mhttpd will only listen to localhost ports. To make it "accessible to other machines", one will have to use the apache https proxy. (or mtcpproxy from midas).

K.O.
2329	07 Feb 2022	Konstantin Olchanski	Forum	MidasWiki moved from ladd00 to daq00.triumf.ca and updated to MediaWiki 1.35
MidasWiki moved from ladd00 (obsolete SL6) to daq00.triumf.ca (Ubuntu LTS 20.04) and updated from obsolete MediaWiki LTS 1.27.7 to MediaWiki LTS 1.35, supported until mid-2023, see https://www.mediawiki.org/wiki/Version_lifecycle

Old URL https://midas.triumf.ca and https://midas.triumf.ca/MidasWiki/... redirect to new URL https://daq00.triumf.ca/MidasWiki/index.php/Main_Page

All old links and bookmarks should continue to work (via redirect).

To report problems with this MediaWiki instance and to request any changes in configuration or installed extensions, please reply to this message here.

K.O.

ELOG V3.1.4-2e1708b5