ELOG Midas

Midas DAQ System, Page 72 of 150

2994 Entries

ID	Date	Author	Topic	Subject
166	13 Oct 2004	Konstantin Olchanski	Bug Report	TWIST upgrade bombed...
> > The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> > crashes during shared memory data buffer accesses. I am looking into it and I
> > will add information as I figure things out. K.O.
>
> Since 1.9.5 the EventBuilder has been modified. Please consult the documentation
> where the new mevb scheme is explained.
> The mevb has been tested successfully with up to 16 frontends (15 different CPUs).
> Data rates at the EventBuilder were measured at about 50MB/s without the
> logger and ~30MB/s with the logger.

It turns out that TWIST uses a private mevb.c. We will consider upgrading to the standard one.

K.O.
168	14 Oct 2004	Konstantin Olchanski	Bug Report	lazylogger complains about zero-size files
With latest midas, I see this:

  Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists file run17567.ybs doesn't exists
  Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists file run17567.ybs doesn't exists

The file run17567.ybs has size zero:

  -rw-r--r-- 1 twistonl users 950272 Oct 13 19:29 /twist/data_onl/current/run17565.ybs
  -rw-r--r-- 1 twistonl users 950272 Oct 13 19:45 /twist/data_onl/current/run17566.ybs
  -rw-r--r-- 1 twistonl users 0 Oct 13 20:00 /twist/data_onl/current/run17567.ybs
  -rw-r--r-- 1 twistonl users 983040 Oct 13 20:03 /twist/data_onl/current/run17568.ybs
  -rw-r--r-- 1 twistonl users 950272 Oct 13 20:26 /twist/data_onl/current/run17569.ybs

I am not sure how to fix this lazylogger logic. Please help.

K.O.
169	14 Oct 2004	Konstantin Olchanski	Bug Report	TWIST upgrade bombed...
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.

On second try, it looks like we are in business- the first try did not work because of two mistakes:

1) I did not delete all old .SHM files (.ODB.SHM, .SYSTEM.SHM, .YBUF1.SHM, .YBUF2.SHM). I deleted ODB.SHM, so odb worked, but forgot about the data buffers SYSTEM.SHM & co and ended up with segmentation faults and core dumps in the buffer management code caused by a mismatch of the old-midas buffers and new-midas code.

2) While debugging these core dumps, I made an error in my test code, so even after I deleted the old data buffers, things still did not work. Talk about over-debugging a problem...

K.O.
170	22 Oct 2004	Konstantin Olchanski	Bug Fix	mhttpd message colouring
I committed a fix to the mhttpd logic that decides which messages should be shown in "red" colour. Before, any message containing square brackets and colons anywhere would be highlighted in red. Now only messages matching the pattern [...:...] are highlighted. The decision logic was moved into a function message_red().

K.O.
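The tightened rule can be sketched roughly as below. This is an illustration only, not the committed message_red(): the helper name has_bracketed_colon_tag is hypothetical, and it assumes that "matching [...:...]" means a colon sitting between an opening bracket and its closing bracket.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the real message_red(): returns true when the
   text contains a '[' ... ':' ... ']' group, i.e. a colon that sits
   between an opening bracket and the next closing bracket. */
bool has_bracketed_colon_tag(const char *msg)
{
    const char *open = msg;

    while ((open = strchr(open, '[')) != NULL) {
        const char *close = strchr(open + 1, ']');
        if (close == NULL)
            return false;               /* no closing bracket left */

        /* search only the characters strictly between '[' and ']' */
        if (memchr(open + 1, ':', (size_t)(close - open - 1)) != NULL)
            return true;

        open = close + 1;               /* try the next bracket group */
    }
    return false;
}
```

With this rule a line such as "value[3] is set: ok" is no longer flagged, because its colon lies outside the brackets, while a source tag like [lazylogger.c:1717:Lazy] still is.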
177	14 Dec 2004	Konstantin Olchanski	Forum	use of assert in mhttpd
> We've had mhttpd aborting regularly since upgrading from midas-1.9.3. This
> happens during elog queries, and is due to an elog file that was incorrectly
> modified by hand.

(Sorry for the delayed reply; for reasons unknown, I did not get an email notice when this was posted.)

Yes, I agree, error handling in the midas elog code is insufficient (note the missing error checks for the read() and lseek() system calls). Anything but "perfect" elog files would cause funny errors and malfunctions.

> The modification to the file occurred 6 months ago.
> el_retrieve(midas.c:15683) now has several assert statements, one of which
> aborts the program on reading the bad entry.

I added those to fix problems with "broken last NN days" and with infinite looping in the elog code that we observed in TWIST. You are welcome to replace the assert() statements with proper error handling. I used to have some code that could report the filename of the bad elog file. Can we also report the exact file location for broken files? Please send me the diff, I will commit it to midas cvs.

> Why is assert used, instead of an error return from the function (if
> necessary), and maybe an error message in the log file? Assert statements are
> often removed, using NDEBUG, for normal use.

I use assert() in several ways:

0) I want a core dump each time X happens. (This is the only reasonable action when facing memory/stack corruption. The problems in the elog code were stack corruption.)

1) "I am too lazy to write proper error handling code", so I just crash and burn. This includes the case where "proper error handling" would be "too invasive".

2) The error is too bad (or too deep) and there is no reasonable way to recover. Print an error message and dump core (for later analysis). I sometimes use "cm_msg(); abort()". (assert is "printf("error"); abort()")

Please refer to the literature for philosophic discussions on uses of assert() (Argh! Stefan will have my head again!), but I will mention that I find "abort() early, abort() often" very effective. BTW, this technique is heavily used in the Linux kernel (oops(), bug(), panic()) with some good effect, too.

> The problem elog entry had one character removed, so end-of-file came before
> the end of the message. This could probably occur without the file being
> altered, if the disk containing the elog fills.

Yes, I think you are right. In TWIST, we have seen disk-full conditions break both elog and history.

K.O.
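For the "proper error handling" alternative, a minimal sketch of a checked read might look like the following. It is not taken from midas.c: the helper name read_exact is hypothetical, and it assumes the caller knows how many bytes the entry should contain, so that an early end-of-file (the truncated-entry case discussed above) comes back as an error return instead of tripping an assert().

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `len` bytes from `fd` into `buf`.
   Returns 0 on success, -1 on an I/O error or when end-of-file
   arrives before `len` bytes were read (a truncated entry). */
int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;          /* real I/O error */
        }
        if (n == 0)
            return -1;          /* EOF before the entry was complete */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The caller can then log the file name and offset and skip the bad entry, or still abort() deliberately if a core dump is what is wanted.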
178	14 Dec 2004	Konstantin Olchanski	Info	Commit local TWIST modifications
I am committing MIDAS modifications accumulated during the last few months of running TWIST:

1) system.c::ss_shm_open(): fail if trying to map a file that is smaller than we expect.

2) midas.c::bm_lock_buffer(), el_submit(), el_delete_message(): do not wait for mutexes forever, use a 5 minute timeout. If we can't get the lock, cm_msg()/abort().

The above helps in dealing with complete midas freezes. I also have code to keep track of "who locked the mutex and is still holding it?!?" but it is way too ugly to commit. I wish we had a "lockedByPid" entry for all lockable objects.

K.O.
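The idea behind item 1 can be sketched as follows. This is not the code committed to ss_shm_open(): the function name check_backing_file_size is hypothetical, and it assumes the shared memory is backed by an ordinary file whose expected size is known before it is mapped.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical check, run before mapping: returns 0 when the file behind
   `fd` holds at least `expected_size` bytes, -1 when it is smaller or
   when fstat() itself fails. */
int check_backing_file_size(int fd, off_t expected_size)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;              /* cannot even query the file */
    if (st.st_size < expected_size)
        return -1;              /* stale or truncated backing file */
    return 0;
}
```

Refusing to map a too-small file turns a later crash on access past the end of the file into an immediate, explainable failure.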
179	14 Dec 2004	Konstantin Olchanski	Info	Commit local TWIST modifications
> I am committing MIDAS modifications accumulated during the last few months of running TWIST:

More:

- mfe.c: in error messages "cannot find statistics record", also print the name of the record we are looking for.
- mlogger.c: in warning message "Write operation took N ms", report the name of the offending data stream.
- system.c: do not chdir("/") in ss_daemon_init()- it prevents us from ever getting core dumps from midas daemons. The old behaviour is trivially restored by "cd /" before starting the daemon, or by "limit coredumpsize 0".
- odb.c: db_validate_db(): detect and break infinite looping on free list corruption.

K.O.
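One common way to break such a loop is to cap the number of steps, sketched below. How db_validate_db() actually does it is not stated here, so treat this as an assumption: the struct free_block layout and the count_free_blocks name are hypothetical, and the real ODB free list lives in shared memory and may not use plain pointers.

```c
#include <stddef.h>

/* Hypothetical free-list node, for illustration only. */
struct free_block {
    size_t size;
    struct free_block *next;
};

/* Walk the list but give up after `max_blocks` steps. Returns the number
   of blocks visited, or -1 when the limit is reached, which suggests a
   cycle introduced by corruption. */
long count_free_blocks(const struct free_block *head, long max_blocks)
{
    long n = 0;

    for (const struct free_block *b = head; b != NULL; b = b->next) {
        if (n >= max_blocks)
            return -1;          /* more steps than blocks can exist */
        n++;
    }
    return n;
}
```

A sensible limit is the largest number of blocks that could fit in the database, since a healthy list can never be longer than that.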
180	14 Dec 2004	Konstantin Olchanski	Info	mhttpd: Commit local TWIST modifications
> > I am committing MIDAS modifications accumulated...

mhttpd changes:

- Renee's improvements on http transaction logging.
- Implement "minimum" and "maximum" clamping for history graphs. Unfortunately there is no GUI code for changing the "minimum" and "maximum" settings, other than directly frobbing the odb.
- When making history graphs, detect NaNs in the history data.
- (The status page code for the TWIST event builder, precursor of the standard event builder, stays uncommitted.)

K.O.
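The clamping and NaN handling can be pictured with a small sketch like this one. It is not the mhttpd code: the function name clamp_history is hypothetical, and leaving NaN entries untouched while counting them is an assumption about what a plotting routine might want.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: clamp plotted values into [lo, hi] in place and
   return how many NaN entries were seen. NaNs are left untouched so the
   caller can decide whether to skip or flag them. */
size_t clamp_history(double *v, size_t n, double lo, double hi)
{
    size_t nan_count = 0;

    for (size_t i = 0; i < n; i++) {
        if (isnan(v[i])) {
            nan_count++;        /* comparisons with NaN are always false */
            continue;
        }
        if (v[i] < lo)
            v[i] = lo;
        else if (v[i] > hi)
            v[i] = hi;
    }
    return nan_count;
}
```

The explicit isnan() test matters because a NaN passes silently through both the "< lo" and "> hi" comparisons and would otherwise reach the axis scaling code unchanged.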
186	16 Dec 2004	Konstantin Olchanski	Info	"cd /" in ss_daemon_init(), was- Commit local TWIST modifications
> > - system.c: do not chdir("/") in ss_daemon_init()- it prevents us from ever
> > getting core dumps from midas daemons.
>
> The chdir("/") is from one of the unix text books. They say you HAVE to do it. If you start a
> daemon on an NFS file system, you cannot unmount that file system as long as the daemon is
> running.

Right, I remember this NFS problem from a while back. This problem does not exist in the current crop of Linux systems (since Red Hat 7.3 at least)- they either kill off all user programs or use "umount -f" and "umount -l". "umount -l" works in any case to unmount a "busy" filesystem.

For systems where the NFS problem does still exist, one should do this: "mlogger -D" becomes "(cd /; mlogger -D)".

So I suspect that the "cd /" advice from the unix programming book is no longer as necessary as it used to be. (Perhaps better advice would have been to "cd /tmp", so we could still get core dumps from non-root daemons.)

K.O.
191	20 Jan 2005	Konstantin Olchanski	Suggestion	HOWTO create ROOT objects in the MIDAS analyzer
With recent changes to mana.c, creation of user ROOT objects in the MIDAS analyser has changed. Here is the new example code for creating ROOT objects that are visible in ROODY and are saved into the histogram file.

1) In the "global" context (outside of any function):

  #include <TH1D.h>
  #include <TProfile.h>
  static TH1D* gMyHist1 = 0;
  static TProfile* gMyHist2 = 0;

2) In the analyzer "init" or "begin run" method, create the histograms:

  //extern TFolder *gManaHistosFolder; // from midas.h
  gMyHist1 = new TH1D("gMyHist1",...);
  gMyHist2 = new TProfile("gMyHist2",...);
  gManaHistosFolder->Add(gMyHist1);
  gManaHistosFolder->Add(gMyHist2);

  (note: this will produce a warning about "possible memory leak")

3) In the per-event method, fill the histograms:

  gMyHist1->Fill(x);
  gMyHist2->Fill(x,y);

4) In the Makefile, where you compile the frontend, add "-DUSE_ROOT" right after "-I$(ROOTSYS)/include".

K.O.
192	20 Jan 2005	Konstantin Olchanski	Bug Report	Persistency problem with h1_book() & co
The current h1_book() macros (and the previous example analyzer code) have an odd persistency problem: for example, the user wants to change some histogram limits, edits the h1_book() calls, rebuilds and restarts the analyzer, starts a new run, and observes that all histograms are filled using the old limits; his changes "did not take". The user panics, I get paged during the Holy Lunch Hour, everybody is unhappy.

This is what I think happens:

1) analyzer starts
2) LoadRootHistograms() loads old histograms from file
3) user code calls h1_book()
4) h1_book template in midas.h does this (roughly):

  hist = (TH1X *) gManaHistosFolder->FindObjectAny(name);
  if (hist == NULL) {
     hist = new TH1X(name, title, bins, min, max);
  }

5) since the histogram already exists (loaded from the file, with the old limits), the TH1X constructor is not called at all, and the new histogram limits are utterly ignored.

A possible solution is to unconditionally create the ROOT objects, like I do in the example code posted at http://dasdevpc.triumf.ca:9080/Midas/191. That code produces an annoying warning from ROOT about possible memory leaks. This could be fixed by adding a two-liner to "find and delete" the object before it is created, tripling the number of user code lines per histogram (find & delete, then create). Highly ugly.

The midas.h macros (h1_book & co) can be fixed by adding checks for histogram limits and such, but I would much prefer a generic solution/convention that would work for arbitrary ROOT objects without MIDAS-specific wrappers (think TProfile, TGraph, etc...).

Any suggestions?

K.O.
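To make the two behaviours concrete without pulling in ROOT, here is a toy model in plain C. Everything in it is a stand-in: struct hist, the registry array, and the functions book_find_or_create and book_replace are hypothetical and only mimic the "find, else create" logic and the "find and delete, then create" alternative described above.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a histogram; not a ROOT class. */
struct hist {
    char name[32];
    int bins;
    double min, max;
};

#define MAX_HISTS 16
static struct hist *registry[MAX_HISTS];    /* stand-in for the histogram folder */

/* Return the registry slot holding `name`, or -1 if it is not booked. */
int find_slot(const char *name)
{
    for (int i = 0; i < MAX_HISTS; i++)
        if (registry[i] != NULL && strcmp(registry[i]->name, name) == 0)
            return i;
    return -1;
}

/* Find-or-create: an existing object wins and the new limits are ignored. */
struct hist *book_find_or_create(const char *name, int bins, double min, double max)
{
    int i = find_slot(name);
    if (i >= 0)
        return registry[i];                 /* old limits survive here */

    for (i = 0; i < MAX_HISTS; i++) {
        if (registry[i] == NULL) {
            struct hist *h = calloc(1, sizeof *h);
            if (h == NULL)
                return NULL;
            strncpy(h->name, name, sizeof h->name - 1);
            h->bins = bins;
            h->min = min;
            h->max = max;
            registry[i] = h;
            return h;
        }
    }
    return NULL;                            /* registry full */
}

/* Find and delete, then create: the new limits always take effect. */
struct hist *book_replace(const char *name, int bins, double min, double max)
{
    int i = find_slot(name);
    if (i >= 0) {
        free(registry[i]);
        registry[i] = NULL;
    }
    return book_find_or_create(name, bins, min, max);
}
```

Booking "h1" with max 10, then again with max 20, leaves the limit at 10 with book_find_or_create and changes it to 20 with book_replace, at the cost of discarding whatever the old object had accumulated.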
200	25 Feb 2005	Konstantin Olchanski	Bug Fix	fixed: double free in FORMAT_MIDAS ybos.c causing lazylogger crashes
We stumbled upon and fixed a "double free" bug in src/ybos.c causing crashes in lazylogger writing .mid files in the FORMAT_MIDAS format (why does it use ybos.c? Pierre says- for generic file i/o). Why this code had ever worked before remains a mystery. K.O.
204	31 Mar 2005	Konstantin Olchanski	Info	ODB dump format switched to XML
> > All the XML functionality is implemented in the new mxml.c/h library
>
> mxml.c/h ... I separated its CVS tree.
>
> The midas Makefile has been adjusted accordingly.

Looks like the midas mxml Makefile bits did not make it to CVS. The current Makefile, revision 1.67, does not have them, and building midas from cvs sources fails because it does not find mxml.h and mxml.c.

K.O.
207	21 Apr 2005	Konstantin Olchanski	Bug Report	pointers and segfault in yb_any_file_rclose
> I'm getting segfaults in yb_any_file_rclose (closing a file opened with
> yb_any_file_ropen with type MIDAS).
>
> I think there are bugs with freeing from uninitialized pointers my.pmagta,
> my.pyh, and my.pylrl (which are only set when opening a YBOS file). These
> should be set to NULL in yb_any_file_ropen (case MIDAS). Likewise, the MIDAS
> format pointers my.pmp and my.pmrd should be NULLed for YBOS opens.
>
> It might be wise to also initialize the pointers in the "my" structure to null.

Do you see this crash even after my fix to (another?) double free?

K.O.
208	21 Apr 2005	Konstantin Olchanski	Suggestion	Correct MIDASSYS setting?
Current MIDAS versions nag me about setting the environment variable MIDASSYS to the "midas installation directory", but I do not have one, so what should I set MIDASSYS to?

I check out MIDAS from cvs into /home/olchansk/daq/midas, build it there, and run it from there. I never do "make install" (I am not "root" on every machine; I am not the only MIDAS user on every machine).

What should I set MIDASSYS to?

K.O.
211	05 May 2005	Konstantin Olchanski	Bug Fix	fix: minor bit rot in the example experiment
I fixed some minor bit rot in the example experiment: a few minor Makefile problems, make the analyzer use the current histogram creation macros, etc. I also added startup and shutdown scripts. These will be documented as we work through them with our Summer student. K.O.
212	02 Aug 2005	Konstantin Olchanski	Bug Fix	fix odb corruption when running analyzer for the first time
I have been plagued by ODB corruption when I run the analyzer for the first time after setting up a new experiment. Some time ago, I traced this to mana.c::book_ttree() and now I have found and fixed the bug; the fix is now committed to midas cvs.

In book_ttree(), db_find("/Analyzer/Bank switches") was returning an error and setting hkey to zero. Then we called db_open_record() with hkey==0, which caused ODB corruption later on. The normal db_validate_hkey() did not catch this because it considers hkey==0 to be valid (when most likely it is not).

K.O.
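The general shape of the fix, checking the lookup status before the handle is used, can be sketched with stand-in functions. None of this is MIDAS API: lookup_key, open_record, open_record_by_path, the key_handle type and the ST_* codes are all hypothetical, and the stand-in open_record rejects a zero handle, which is what one would wish the validation did.

```c
#include <string.h>

typedef int key_handle;         /* hypothetical stand-in for a key handle */
enum { ST_OK = 1, ST_NO_KEY = 2, ST_INVALID_HANDLE = 3 };

/* Stand-in lookup: knows a single path; on failure it zeroes the handle. */
int lookup_key(const char *path, key_handle *hkey)
{
    if (strcmp(path, "/Analyzer/Known") == 0) {
        *hkey = 42;
        return ST_OK;
    }
    *hkey = 0;
    return ST_NO_KEY;
}

/* Stand-in for opening a record: treats handle 0 as invalid. */
int open_record(key_handle hkey)
{
    if (hkey == 0)
        return ST_INVALID_HANDLE;
    return ST_OK;
}

/* Caller pattern: never pass the handle on after a failed lookup. */
int open_record_by_path(const char *path)
{
    key_handle hkey = 0;
    int status = lookup_key(path, &hkey);

    if (status != ST_OK)
        return status;          /* hkey is meaningless here */
    return open_record(hkey);
}
```

Either check alone would have stopped the corruption; having both means a missing key is reported at the call site and a zero handle is still refused further down.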
213	18 Aug 2005	Konstantin Olchanski	Info	midas Makefile changes
Minor Makefile changes:

- add "-m32" gcc flag to force 32-bit compilation on 64-bit Linux.
- do not link ybos.o into lazylogger and mdump.

K.O.
214	18 Aug 2005	Konstantin Olchanski	Info	CAMAC register_cnaf_callback()
Some time ago, the "remote CAMAC" functionality in mfe.c was made conditional on HAVE_CAMAC. This flag is not set by default, so remote camac calls silently do not work unless midas is compiled in a special way. I am too lazy to compile midas differently depending on what hardware I use, so I split register_cnaf_callback() into a separate file and made it easy to call directly from the user front end. I left the HAVE_CAMAC bits in mfe.c so people who use that would see no change.

Affected files:

- Makefile (add cnaf_callback.o)
- midas.h (add void register_cnaf_callback(int debug);)
- mfe.c (move the rpc code to cnaf_callback.c, call register_cnaf_callback())
- cnaf_callback.c (new file)

K.O.
215	18 Aug 2005	Konstantin Olchanski	Info	minor changes to run transition code
Minor changes to the run transition code:

- improve debug messages
- fail the transition if we cannot connect to one of the clients

K.O.
