ELOG Midas

Midas DAQ System, Page 11 of 152

ID	Date	Author	Topic	Subject
321	11 Jan 2007	Steve Hardy	Forum	Shared memory problems
Thanks for your help. I tried again and it got me back to the initial problem I had. The frontend will start, and the analyzer starts (it complains about there not being a last.root, but other than that it's fine), and then when starting mlogger, I get:

[odb.c:860:db_validate_db] Warning: database corruption, first_free_key 0x0001A4 04
[odb.c:3666:db_get_key] invalid key handle
[midas.c:1970:cm_check_client] cannot delete client info
[odb.c:3666:db_get_key] invalid key handle
[midas.c:1970:cm_check_client] cannot delete client info
[odb.c:3666:db_get_key] invalid key handle

And it continues to shoot out error messages about invalid key handles until I kill it. Then trying to start the frontend again fails until I remove the .ODB.SHM file.

Any other ideas?

> > Hello,
> >
> > Just did a fresh install of MIDAS from the SVN repository under CentOS and
> > everything compiles fine, but when I go to run the frontend (using dio), I get
> > the following error message:
> >
> > Connect to experiment ...[odb.c:868:db_open_database] Different database format:
> > Shared memory is 14, program is 2
> > [midas.c:1763:cm_connect_experiment1] cannot open database
> >
> >
> > Any ideas on what the problem could be, or how to fix it?
>
> You have an old .ODB.SHM from a previous version in your directory (note the '.' in
> front, so you need a 'ls -alg' to see it). Delete that file and try again.
2458	22 Feb 2023	Stefano Piacentini	Info	connection to a MySQL server: retry procedure in the Logger
Dear all,

we are experiencing a connection problem to the MySQL server that we use to log information. Is there an option to retry the I/O on the MySQL server multiple times?

The error we are experiencing is the following (hiding the IP address):

[Logger,ERROR] [mlogger.cxx:2455:write_runlog_sql,ERROR] Failed to connect to database: Error: Can't connect to MySQL server on 'xxx.xxx.xxx.xxx:6033' (110)

Then the logger stops and must be restarted. This eventually happens only during the BOR or the EOR.

Best,
Stefano.
2464	07 Mar 2023	Stefano Piacentini	Info	connection to a MySQL server: retry procedure in the Logger
> > Dear all,
> >
> > we are experiencing a connection problem to the MySQL server that we use to log information. Is there an
> > option to retry the I/O on the MySQL server multiple times?
> >
> > The error we are experiencing is the following (hiding the IP address):
> >
> > [Logger,ERROR] [mlogger.cxx:2455:write_runlog_sql,ERROR] Failed to connect to database: Error: Can't
> > connect to MySQL server on 'xxx.xxx.xxx.xxx:6033' (110)
> >
> > Then the logger stops and must be restarted. This eventually happens only during the BOR or the EOR.
>
> What would you propose? If the connection does not work, most likely the server is down or busy. If we retry,
> the connection still might not work. If we retry many times, people will complain that the run start or stop
> takes very long. If we then just continue (without stopping the logger), the MySQL database will miss important
> information and the runs probably cannot be analyzed later. So I believe it's better to really stop the logger
> so that people become aware that there is a problem and fix the source, rather than curing the symptoms.
>
> In the MEG experiment at PSI we run the logger with a MySQL database and we never see any connection issue,
> except when the MySQL server is in maintenance (once a year), but usually we don't take data then. Since we
> use the same logger code, it cannot be a problem there. So I would try to fix the problem on the MySQL side.
>
> Best,
> Stefan

Dear Stefan,

a possible solution could be to define the number of retries as a parameter that is 0 by default, together with a wait time between two subsequent tries. This would leave the decision on how to handle a failed connection to the user. In our case, for example, we would prefer not to stop the acquisition when the connection to the external SQL server fails. In addition, we have other software that, with a retry procedure, doesn’t fail: with 1 retry and a sleep time of 0.5 s we already recover 100% of the faults.
Anyway, we implemented a local database, which is a mirror of the external one, and the problems disappeared. Thanks, Stefano.
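The retry scheme proposed in this thread (a retry count that defaults to 0, plus a wait time between tries) can be sketched as follows. This is only an illustration, not the mlogger code: connect_with_retry(), the two callback types and the flaky_server test stand-in are hypothetical names. In a MIDAS program the sleep callback could presumably be a thin wrapper around ss_sleep().

```c
#include <stddef.h>

/* Hypothetical callback: attempts one connection, returns 0 on success
   or a non-zero error code (for example 110 for a timeout). */
typedef int (*connect_fn)(void *ctx);

/* Hypothetical callback: sleeps for the given number of milliseconds. */
typedef void (*sleep_fn)(int millisec);

/* Try once, then retry up to max_retries more times, waiting wait_ms
   between attempts. With max_retries == 0 there is exactly one attempt,
   which matches the current "fail immediately" behaviour.
   Returns 0 on success, otherwise the error code of the last attempt. */
int connect_with_retry(connect_fn try_connect, void *ctx,
                       int max_retries, int wait_ms, sleep_fn do_sleep)
{
    int status = -1;

    for (int attempt = 0; attempt <= max_retries; attempt++) {
        status = try_connect(ctx);
        if (status == 0)
            return 0;

        /* Only wait if another attempt follows. */
        if (attempt < max_retries && wait_ms > 0 && do_sleep != NULL)
            do_sleep(wait_ms);
    }
    return status;
}

/* Stand-in for an unreliable server: fails a fixed number of times,
   then accepts the connection. */
struct flaky_server {
    int failures_left;
    int calls;
};

int flaky_connect(void *ctx)
{
    struct flaky_server *server = ctx;

    server->calls++;
    if (server->failures_left > 0) {
        server->failures_left--;
        return 110;
    }
    return 0;
}
```

With one retry, a server that fails once is reached on the second call; with the default of zero retries the first error is returned unchanged, so the decision to stop the logger stays with the caller.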
144	17 Jun 2003	Stefan Ritt		example experiment makefile for NT
I have added ROOT support to midas\examples\experiment\makefile.nt. To compile the example experiment under Windows, one needs to:

1) Install a version of ROOT
2) Define the ROOTSYS environment variable
3) Invoke "nmake -f makefile.nt" in the midas\examples\experiment directory

Please note that in the current release 3.05 of ROOT, sockets are not yet working under Windows, so the histogram server built into the analyzer cannot be accessed. It is however possible to output the analyzed data into a .root file and visualize it with the ROOT browser, like

analyzer -i run00001.mid -o run00001.root
131	13 Oct 2003	Stefan Ritt		Array overruns in mhttpd.c::submit_elog()
> > While adding new functionality to submit_elog() (add the message text to the
> > outgoing email), I noticed that the email text is being stored into an array
> > of size 256, mail_text[256], without any checks for array overrun. This
> > cannot be good. How should this be corrected?
> > K.O.
>
> Similar problem exists in midas.c::el_submit(). The array "message[10000]" is
> easy to overrun by submitting a long elog message.
>
> K.O.

The whole elog functionality in mhttpd will be replaced (sometime) by the standalone ELOG package, linked against mhttpd. The ELOG functionality is much richer and does not contain the mentioned problems, which were fixed there some time ago. For the time being it might however be worth fixing the mentioned problems, but without spending too much time on it.
126	13 Oct 2003	Stefan Ritt		mhttpd: add Elog text to outgoing email.
> around to implement it, until now. I also added assert() traps for the most
> common array overruns in the Elog code.

In addition to the assert() one should use strlcat() and strlcpy() all over the code to avoid buffer overruns. The ELOG standalone code does that already properly.

- Stefan
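As an illustration of why strlcpy() and strlcat() help with fixed buffers such as mail_text[256], here is a minimal sketch of their semantics. The names sketch_strlcpy() and sketch_strlcat() are made up for this example so that they do not clash with a system-provided version; this is not the code used in ELOG or MIDAS.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, writing at most size bytes including the
   terminating zero. Returns strlen(src), so a return value >= size
   tells the caller that the string was truncated. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}

/* Append src to the string in dst, where size is the full size of the
   dst buffer. Returns the length the combined string would have had,
   so a return value >= size again signals truncation. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t dst_len = 0;

    /* Find the end of dst without reading past the buffer. */
    while (dst_len < size && dst[dst_len] != '\0')
        dst_len++;

    /* dst is not terminated within size bytes: nothing can be appended. */
    if (dst_len == size)
        return size + strlen(src);

    return dst_len + sketch_strlcpy(dst + dst_len, src, size - dst_len);
}
```

Unlike strcpy() and strcat(), both functions take the full buffer size and always leave a terminated string, so a long elog message is truncated instead of overwriting the memory behind the array.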
128	13 Oct 2003	Stefan Ritt		mhttpd: add Elog text to outgoing email.
> > > around to implement it, until now. I also added assert() traps for the most
> > > common array overruns in the Elog code.
> >
> > In addition to the assert() one should use strlcat() and strlcpy() all over
> > the code to avoid buffer overruns. The ELOG standalone code does that already
> > properly.
> >
> > - Stefan
>
> Yes, the original authors should have used strlcat(). Now that I uncovered this source of mhttpd
> memory corruption, maybe some volunteer will fix it up properly.
>
> K.O.

I am the original author and will fix all that once I have merged mhttpd and elog. Due to my current task list, this will probably happen in November.

- Stefan
124	15 Oct 2003	Stefan Ritt		test
> > test
>
> test
>
> test
>
> another test
>
> K.O.

I got the two email notifications, if you have tried that...
121	28 Oct 2003	Stefan Ritt		Updated thread functions
> ss_thread_create now returns the thread ID on success, and zero on failure.
> Previously returned SS_SUCCESS or SS_NO_THREAD. User must now test the
> return value to determine result.
>
> ss_thread_kill added to kill the passed thread ID. Returns SS_SUCCESS or
> SS_NO_THREAD.
>
> Any thread creation must be verified now, and old code must be examined to
> ensure the return value is checked.

Thank you for that post. Internally, threads are not used in midas, so there should be no problem. Only experiments using threads explicitly should take care.
119	30 Oct 2003	Stefan Ritt		'umask' added to lazylogger for FTP connections
I had to add a 'umask' option to the loggers (lazy and mlogger) for the new PSI archive. One can now put a filename into the settings like:

archive,21,user,pw,dir,run%05d.mid,026

where the optional last parameter is used for a "umask 026" command sent to the FTP server just after the connection has been established. This changes the mode bits of the newly transferred file. We needed that so that the files are group readable, since several people from one group want to read the data. I committed mlogger.c and ybos.c; the latter contains the FTP code (which should actually go into lazylogger.c instead of ybos.c).
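To make the effect of the optional last field concrete, here is a small sketch that extracts it from a settings string of the form shown above and computes the resulting mode bits. Both function names are hypothetical, and the base mode of 0666 for a newly created file is an assumption about the FTP server, not something stated in the post.

```c
#include <stdlib.h>
#include <string.h>

/* Return the umask given as the optional 7th comma-separated field
   (interpreted as octal), or -1 if the field is absent or invalid.
   Assumes that none of the earlier fields contains a comma. */
int parse_ftp_umask(const char *setting)
{
    const char *p = setting;
    int commas = 0;

    /* Skip the six mandatory fields. */
    while (commas < 6 && (p = strchr(p, ',')) != NULL) {
        p++;
        commas++;
    }
    if (p == NULL || *p == '\0')
        return -1;

    char *end;
    long value = strtol(p, &end, 8);
    if (*end != '\0' || value < 0 || value > 0777)
        return -1;

    return (int) value;
}

/* Mode bits of a new file, assuming the server creates files with
   0666 before applying the umask. */
int file_mode_after_umask(int umask_bits)
{
    return 0666 & ~umask_bits;
}
```

Under that assumption a umask of 026 yields mode 0640: read/write for the owner, read-only for the group and no access for others, which is the group-readable result described in the post.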
118	30 Oct 2003	Stefan Ritt		Fixed several potential problems for ODB corruption
I just realized that db_set_value, db_set_data, db_set_num_values and db_merge_data do not check for num_values == 0. With such a parameter the ODB can become corrupted, since zero-length ODB entries are not allowed. I fixed the corresponding places in odb.c and committed the changes. Everyone with ODB corruption problems should update that code.
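The kind of guard described here can be sketched as a check that runs before anything is written. The function and the status codes below are hypothetical and only illustrate the idea; the actual fix lives in odb.c and may look different.

```c
/* Hypothetical status codes for this sketch only. */
enum {
    SKETCH_SUCCESS       = 1,
    SKETCH_INVALID_PARAM = 2
};

/* Reject a request that would create a zero-length entry. A setter
   would call this first and return early on failure, so the stored
   data is never touched with num_values == 0. Negative counts are
   rejected as well, which is an extra assumption of this sketch. */
int sketch_check_num_values(int num_values)
{
    if (num_values <= 0)
        return SKETCH_INVALID_PARAM;

    return SKETCH_SUCCESS;
}
```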
115	01 Nov 2003	Stefan Ritt		mana.c without ROOT and HBOOK
> Stephan, why did you prohibit building mana.c without ROOT and HBOOK
> support? I think such a configuration is valid and should be allowed.

Oops, sorry, my fault. I forgot that people use mana.c without ROOT and HBOOK. The reason I made the change was that people forgot the -DHAVE_HBOOK in their makefile. In that case, no HBOOK init is done in mana.c and the first histogram booking in the user code crashes HBOOK. So please take the #error statement out of mana.c (I'm away in two hours for one week), but think about preventing the above-mentioned problem.

I don't know any way for the makefile or mana.c to figure out if there is any HF1 call in the user code. Actually HF1 should return a "proper" error message rather than just crashing. One possibility is that we put an additional layer of macros on top of the histogram booking/filling. These macros are converted to their HBOOK or ROOT equivalents depending on HAVE_HBOOK/HAVE_ROOT. If neither is defined, the histogram booking macro can produce a runtime error. This has the additional advantage that users can switch from HBOOK to ROOT without changing their user code.
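A minimal sketch of such a layer is shown below. SKETCH_BOOK_1D, sketch_book_1d() and the status codes are invented for this example; the branches for HAVE_HBOOK and HAVE_ROOT only mark where the backend call would go and do not call HBOOK or ROOT.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch only. */
#define SKETCH_HIST_OK         0
#define SKETCH_HIST_NO_BACKEND 1

/* Book a one-dimensional histogram through whichever backend was
   selected at compile time. Without a backend the call reports a
   runtime error instead of crashing. */
int sketch_book_1d(int id, const char *title, int nbins, double lo, double hi)
{
#if defined(HAVE_HBOOK)
    /* A real layer would forward to the HBOOK booking call here. */
    (void) id; (void) title; (void) nbins; (void) lo; (void) hi;
    return SKETCH_HIST_OK;
#elif defined(HAVE_ROOT)
    /* A real layer would forward to the ROOT booking call here. */
    (void) id; (void) title; (void) nbins; (void) lo; (void) hi;
    return SKETCH_HIST_OK;
#else
    (void) nbins; (void) lo; (void) hi;
    fprintf(stderr,
            "histogram %d (\"%s\"): built without HAVE_HBOOK or HAVE_ROOT\n",
            id, title);
    return SKETCH_HIST_NO_BACKEND;
#endif
}

/* User code only sees the macro, so switching the backend is a matter
   of changing the compiler flags. */
#define SKETCH_BOOK_1D(id, title, nbins, lo, hi) \
    sketch_book_1d((id), (title), (nbins), (lo), (hi))
```

User code would then write something like SKETCH_BOOK_1D(1000, "ADC0", 100, 0.0, 4096.0) and check the return value, regardless of which histogram package is linked.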
112	01 Nov 2003	Stefan Ritt		Do not frob
> I found where we tickle the race condition in db_create_record().
>
> 1) in mhttpd.c, every time we show the status page, we call
> db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str));
> 2) internally db_create_record() deletes /RunInfo
> 3) other programs read "/runinfo/run number" while it is deleted do not
> check for the db_get_value() error code and happily get a zero run number.
>
> Stephan fixed the race condition, and now I committed an mhttpd.c change that
> only calls db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str)); if
> /runinfo does not exist. This seems to be redundant with a similar call in
> cm_connect_experiment1(), called each time a new client starts up.

The reason for the db_create_record() is the following: Assume that we change the /runinfo structure in the future by adding an additional variable. If we run a "new" mhttpd on an "old" experiment, the "runinfo" C structure does not match the ODB contents. The db_create_record() ensures that the ODB structure exactly matches the C structure. I agree with you that this can cause potential problems. But most of them should be fixed by the additional lock() I added recently, so other programs cannot read the run number while it is deleted.

One could think of checking the record size, and re-creating the runinfo if the ODB record size does not match the C record size. But this does not prevent the potential error that some variables are reversed in order. They are then mapped wrongly to the C runinfo structure.

I see that you are now working very hard on all possible checks for the run number. But I would not commit that and make it part of the distribution, since all experiments at PSI, for example, do not have this run number problem. Run it locally, determine the cause of your problem (the discovery of the race condition was already very good, I'm glad that you found it, it should make the system much more stable), and we'll fix it.
Putting ASSERT's is a good idea, I should have done it from the very beginning. But if you start now, please put it in all other 100000 places (;-)

I would not add a db_get_value_cannot_possibly_fail() into the standard distribution, because it probably cannot correct the initial problem and then will just go into an infinite loop. We should always tackle problems at their source.

If you cannot resolve your zero run number problem, do the following: There is a cm_msg(MDEBUG, ...) which only puts a message into the shared memory, but not in midas.log. This can be used for real-time debugging. Add those messages temporarily in db_get_value() etc. to see what is going on. As soon as the run number goes to zero, stop all processes immediately (for example by locking the database with db_lock_database), and then look backwards in the sysmsg buffer to see what happened before the run number went to zero.

- Stefan
108	01 Nov 2003	Stefan Ritt		more odb
> I added error checking to the places where we read "/runinfo/run number". In
> general, I do this:
> Affected files:
> src/lazylogger.c
> src/odbedit.c
> src/mlogger.c
> src/mfe.c
> src/odb.c
> src/mana.c
> src/midas.c
> src/mhttpd.c

Now YOU broke the system by editing all these files with something I consider temporary debugging code. A run number of zero is VALID. If I want to make sure a new experiment starts with run number #1, I put a run number of 0 into the ODB. On the first start the number is incremented by one, so the runs are numbered from one. So please remove those checks which prevent me from doing that.

Again, your "run number zero" problem is somehow specific to your environment, and I would not put all these tests into the distribution, because this can have side effects, like the one I described above.

- Stefan
110	14 Nov 2003	Stefan Ritt		more odb
Ok, I apologize. It's all ok. Thanks for clarifying.

Concerning the asserts, it would be nice to be able to disable them in release code. Under Windows, the assert() is actually a macro which expands to zero if NDEBUG is defined. I believe it's the same under Linux, but I don't know about VxWorks. So we have three options:

1) Keep asserts always. This might possibly slow down a DAQ system, but I'm not sure how much. Might be negligible.

2) Disable asserts by default (standard make). Only the "experts" can enable them in the makefile (by removing NDEBUG), since only they know what to do with the assertion messages.

3) Let the user decide on the standard installation. Maybe have two libraries, one debug, one no-debug. The debug one can even have the compiler optimization disabled, which makes debugging easier.

So what is your opinion (comments from others are welcome as well) of which way to go?
104	16 Nov 2003	Stefan Ritt		Phantom
I have seen the same behaviour and it annoys me, too. What I did in the past is a "cleanup" in ODBEdit which removes these open records. I have some code in cm_watchdog() which should take care of that. If a client is dead, it gets removed from the ODB, and its open records should get their notify_count decremented. So obviously this code has some bug. In the following week (now that I have some spare time) I plan to do the following:

- Exchange most db_create_record() calls by something better. Maybe db_check_record(..., correct_flag), which creates the record only if it does not exist at all, otherwise checks the structure. If correct_flag is TRUE, it corrects the structure (by calling db_create_record()), if it's FALSE it just returns an error code. This way one can decide from case to case which option is better. For /Runinfo, for example, the flag would be FALSE, maybe with a notification that the /Runinfo is different from the compiled-in structure, and that one has to recompile the application.

- Revisit the open record issue from dying frontends. I remember vaguely that I tried to kill a frontend (kill -9), waited until the watchdog cleaned up its entries, and it worked fine. So the problem is more how to reproduce the issue described in the previous elog entry.
98	17 Nov 2003	Stefan Ritt		Revised MVMESTD
Let me propose a revised scheme for midas standard VME calls (mvmestd.h). Pierre mentioned some limitations before, and I now also find some areas to improve. Right now, the vme_open() call retrieves a handle. For some interfaces (like SBS/Bit3), one has to obtain separate handles for different addressing modes A24D32/A32D32 and so on, which I find a bit troublesome. I would rather keep the handle internally, invisible to the user, and use ioctl() statements to change the address/data mode. So the API could look like:

vme_open()
    Deprecated, will be removed

vme_init(void)
    Standard initialization, open device(s), stores handles internally in a table

vme_exit(void)
    Deallocates any memory, close handles

vme_read(void *dst, DWORD vme_addr, DWORD size)
vme_write(void *src, DWORD vme_addr, DWORD size)

vme_ioctl(int request, int param)
    Request is one of:

    VME_IOCTL_CRATE_SET/GET
        Sets VME crate (in case several interfaces are plugged into a single PC, meaningless for embedded CPUs)
    VME_IOCTL_DEST_SET/GET
        VME_BUS/VME_RAM/VME_LM for VME bus, RAM in VME interface, or LM for local memory (used in Bit3 interface)
    VME_IOCTL_AMOD_SET/GET
        Sets/Retrieves VME AMOD (= VME_AMOD_xxx as currently defined in mvmestd.h)
    VME_IOCTL_DSIZE_SET/GET
        Sets/Retrieves VME data size (D8/D16/D32/D64)
    VME_IOCTL_DMA_SET/GET
        Enable/Disable DMA, should be independent of AMOD
    VME_IOCTL_INTR_ATTACH/DETACH/ENABLE/DISABLE
        Set VME interrupts
    VME_IOCTL_AUTO_INCR_SET/GET
        Set autoincrement of source pointer, can be disabled for FIFO readout

vme_mmap(void *ptr, DWORD vme_addr, DWORD size)
vme_unmap(void *ptr, DWORD size)
    Map/Unmap VME to local memory

vme_read2(void *dst, DWORD vme_addr, DWORD size, DWORD flags)
vme_write2(void *src, DWORD vme_addr, DWORD size, DWORD flags)
    With these functions one can directly specify the flags usually managed by vme_ioctl(). Useful for applications where, for example, the address modifier has to be different in each read/write operation.
Note that the vme_read/write functions do not have a VME handle any more, nor an address modifier. This is all accomplished with vme_ioctl() calls. Please have a look at this proposal, compare it with what you do currently in VME, and let me know if we should add/modify something. I volunteer to implement the API for the SBS/Bit3 617 and the Struck SIS1100/3100 interfaces, for VxWorks somebody at TRIUMF should take care.
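To show how the handle-free calling pattern could feel from user code, here is a small in-memory mock of part of the proposed API. It is only a sketch: the "bus" is a 256-byte array, the status codes and request numbers are made up, and vme_ioctl() takes an int pointer so that the GET requests can return a value, which is an assumption on top of the proposal.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t DWORD;

/* Status codes and request numbers are placeholders for this mock. */
enum { VME_ERROR = 0, VME_SUCCESS = 1 };
enum {
    VME_IOCTL_AMOD_SET,
    VME_IOCTL_AMOD_GET,
    VME_IOCTL_DSIZE_SET,
    VME_IOCTL_DSIZE_GET
};

#define MOCK_VME_SIZE 256

/* All state lives inside the library, invisible to the caller. */
static struct {
    int     is_open;
    int     amod;                 /* current address modifier */
    int     dsize;                /* current data size in bytes */
    uint8_t bus[MOCK_VME_SIZE];   /* stands in for the VME address space */
} vme_state;

int vme_init(void)
{
    memset(&vme_state, 0, sizeof(vme_state));
    vme_state.is_open = 1;
    vme_state.dsize = 4;
    return VME_SUCCESS;
}

int vme_exit(void)
{
    vme_state.is_open = 0;
    return VME_SUCCESS;
}

int vme_ioctl(int request, int *param)
{
    if (!vme_state.is_open || param == NULL)
        return VME_ERROR;

    switch (request) {
    case VME_IOCTL_AMOD_SET:  vme_state.amod = *param;  return VME_SUCCESS;
    case VME_IOCTL_AMOD_GET:  *param = vme_state.amod;  return VME_SUCCESS;
    case VME_IOCTL_DSIZE_SET: vme_state.dsize = *param; return VME_SUCCESS;
    case VME_IOCTL_DSIZE_GET: *param = vme_state.dsize; return VME_SUCCESS;
    default:                  return VME_ERROR;
    }
}

/* True if the mock is open and [vme_addr, vme_addr + size) fits. */
static int mock_range_ok(DWORD vme_addr, DWORD size)
{
    return vme_state.is_open && vme_addr <= MOCK_VME_SIZE &&
           size <= MOCK_VME_SIZE - vme_addr;
}

int vme_write(const void *src, DWORD vme_addr, DWORD size)
{
    if (!mock_range_ok(vme_addr, size))
        return VME_ERROR;
    memcpy(vme_state.bus + vme_addr, src, size);
    return VME_SUCCESS;
}

int vme_read(void *dst, DWORD vme_addr, DWORD size)
{
    if (!mock_range_ok(vme_addr, size))
        return VME_ERROR;
    memcpy(dst, vme_state.bus + vme_addr, size);
    return VME_SUCCESS;
}
```

A caller would run vme_init() once, select the address modifier with vme_ioctl(VME_IOCTL_AMOD_SET, &amod), and then issue vme_read()/vme_write() calls without passing a handle or a modifier each time.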
82	20 Nov 2003	Stefan Ritt		Implementation of db_check_record()
As Konstantin pointed out correctly, the db_create_record() call is pretty heavy since it copies whole structures around the ODB. Therefore, it should not be used frequently. It might be that several problems are caused by that, for example the "phantom" records reported in elog:40.

I have therefore implemented the function

db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, BOOL correct)

which takes an ASCII structure in the same way as db_create_record(), but only checks this ASCII structure against the ODB contents without writing anything to the ODB. If the record does not exist at all, it is created via db_create_record(). This is useful for example with the /Runinfo structure on a virgin ODB.

If the parameter "correct" is FALSE, the function returns DB_STRUCT_MISMATCH if the ODB contents are wrong (wrong order of variables, wrong name of variables, wrong type or array size). The calling function should then abort, since a subsequent db_open_record() would fail. Note that although abort() is useful, one should add cm_disconnect_experiment() just before the abort() in order to have the application "log out" from the ODB gracefully.

If the parameter "correct" is TRUE, the function db_create_record() is called internally to correct a mismatching record.

I have changed most calls of db_create_record() in mhttpd.c, mfe.c, mana.c and mlogger.c. Pierre, could you do the same for lazylogger.c?

I also started to put assert()'s everywhere and encourage everyone to follow. Under Windows, the assert()'s are removed automatically if compiling in "Release" mode.

So I committed many changes, did some quick tests, but am not 100% convinced that all the changes are good. So please use the new code cautiously, and let me know if there is any new problem. I also would like to get some feedback on whether the whole thing becomes more stable now.
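The decision logic described for db_check_record() can be modelled without an ODB at all. The sketch below uses a toy in-memory record and invented names (sk_field, sk_record, sk_check_record, SK_*); it does not parse the ASCII structure and is not the code in odb.c, it only mirrors the three cases: create if missing, accept if matching, and either report or correct a mismatch.

```c
#include <string.h>

/* Hypothetical status codes for this sketch only. */
enum { SK_SUCCESS = 1, SK_STRUCT_MISMATCH = 2, SK_INVALID_PARAM = 3 };

#define SK_MAX_FIELDS 8

typedef struct {
    const char *name;
    int type;
    int num_values;
} sk_field;

typedef struct {
    int exists;
    int n_fields;
    sk_field fields[SK_MAX_FIELDS];
} sk_record;

/* Names, types, array sizes and order must all agree. */
static int sk_same_layout(const sk_record *rec, const sk_field *expected, int n)
{
    if (rec->n_fields != n)
        return 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(rec->fields[i].name, expected[i].name) != 0 ||
            rec->fields[i].type != expected[i].type ||
            rec->fields[i].num_values != expected[i].num_values)
            return 0;
    }
    return 1;
}

/* The "heavy" operation: rewrite the whole record. */
static void sk_create_record(sk_record *rec, const sk_field *expected, int n)
{
    rec->exists = 1;
    rec->n_fields = n;
    for (int i = 0; i < n; i++)
        rec->fields[i] = expected[i];
}

int sk_check_record(sk_record *rec, const sk_field *expected, int n, int correct)
{
    if (n < 0 || n > SK_MAX_FIELDS)
        return SK_INVALID_PARAM;

    if (!rec->exists) {                 /* virgin record: create it */
        sk_create_record(rec, expected, n);
        return SK_SUCCESS;
    }
    if (sk_same_layout(rec, expected, n))
        return SK_SUCCESS;              /* nothing is written */
    if (!correct)
        return SK_STRUCT_MISMATCH;      /* caller decides what to do */

    sk_create_record(rec, expected, n); /* correct the mismatch */
    return SK_SUCCESS;
}
```

The point of the design is visible in the middle branch: in the common case of a matching record the check only reads, so frequent callers such as a status page no longer rewrite the record on every call.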
105	20 Nov 2003	Stefan Ritt		Phantom
I tried to reproduce the problem, but without success. So in case this happens again, one should debug the code in cm_watchdog() next to the line

/* decrement notify_count for open records and clear exclusive mode */
...

So if a killed client is removed from the ODB via the watchdog (or a "cleanup" is done in ODBEdit), the notify_count should be decreased and thus the "open records" should be closed.
93	20 Nov 2003	Stefan Ritt		cannot shutdown defunct clients
> 1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
> 2) "sh mhttpd" from odbedit ->
> [midas.c:5298:cm_shutdown] cannot connect to client mhttpd on host
> midtig01.triumf.ca, port 32853
> Client mhttpd not active
> 3) in odbedit: "cd /system/clients; rm xxxx"
> refuses to delete the key

Have you tried a "cleanup" in ODBEdit?

The "last_activity" is a 32-bit int, filled with milliseconds. So indeed it wraps around after about one month. So if all clients are stopped simultaneously the hard way (such that nobody's watchdog can clean any other client from the ODB), like with a power off, and you start the thing one month later, there might be a problem. I never tried that before. So next time do a cleanup. If that does not help, we should change last_activity from INT to DWORD. This way it's always positive and the wraparound does not hurt.
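The arithmetic behind the INT versus DWORD remark can be checked with a small sketch. A signed 32-bit millisecond counter turns negative after 2^31 ms (about 24.9 days), and an unsigned one wraps after 2^32 ms (about 49.7 days). The two functions and the 10 s timeout below are hypothetical and only illustrate one way a timeout test can go wrong; they are not taken from cm_watchdog().

```c
#include <stdint.h>

#define SKETCH_TIMEOUT_MS 10000

/* Unsigned difference: the subtraction is done modulo 2^32, so the
   elapsed time stays correct even if the counter wrapped once between
   last_activity and now. */
int timed_out_unsigned(uint32_t now, uint32_t last_activity)
{
    return (uint32_t) (now - last_activity) > SKETCH_TIMEOUT_MS;
}

/* Signed comparison: once the counter has crossed 2^31 - 1 it is read
   as a negative number, the difference becomes hugely negative and the
   client never appears to time out. The 64-bit arithmetic only avoids
   signed overflow inside this sketch. */
int timed_out_signed(int32_t now, int32_t last_activity)
{
    return (int64_t) now - (int64_t) last_activity > SKETCH_TIMEOUT_MS;
}
```

For a client last seen 1 s before the signed limit and checked 20 s later, the unsigned test reports a timeout while the signed one does not, which is consistent with a dead client that cannot be removed until a "cleanup" is done.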
