Back Midas Rome Roody Rootana
  Midas DAQ System, Page 23 of 146  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topic Subjectdown
  595   16 Jun 2009 Konstantin OlchanskiBug Reportmlogger duplicate event problem
> > We have seen on several daq systems this problem: we start a run and observe that the number of 
> > events written by mlogger to the output file is double the number of events actually collected. Upon 
> > inspection of the output file, we see that every event is written twice. Restarting the run usually fixes 
> > this problem.
> 
> mlogger.c fixed svn rev 4497. (from tr_start(), call tr_stop() if somehow it was not called already by end-run transition).

There is a new problem: after an unsuccessful run start, the next run start bombs with the error "output file runNNN.mid already exists". One way around this is to 
manually remove the useless data file, another is to bump up the run number. Better solution is to automatically erase the output file created by unsuccessful run 
starts.

K.O.
  2624   13 Nov 2023 Ivo SchulthessForummlogger does not HAVE_ROOT
Good evening, 

I am setting up Midas (v2.1) for a new experiment. We want to save the data in the ROOT format. We installed ROOT from source (v6.28/06), and ROOTSYS is set. When we compile Midas, it says that it found ROOT. We set up a second logger channel where we set the filename to run%05d.root, the format to ROOT, and the output to ROOT. Nevertheless, when starting a run, the logger writes the error that "channel '1' requested ROOT output, but mlogger is built without HAVE_ROOT". From the CMake file, I would assume that it is set automatically if ROOT is found. Do you have any idea why the mlogger does not find ROOT or save the data in the ROOT format?

Thanks in advance for your ideas, input, and help. 

Cheers,
Ivo
  2625   13 Nov 2023 Konstantin OlchanskiForummlogger does not HAVE_ROOT
> I am setting up Midas (v2.1) for a new experiment. We want to save the data in the ROOT format. We installed ROOT from source (v6.28/06), and ROOTSYS is set. When we compile Midas, it says that it found ROOT. We set up a second logger channel where we set the filename to run%05d.root, the format to ROOT, and the output to ROOT. Nevertheless, when starting a run, the logger writes the error that "channel '1' requested ROOT output, but mlogger is built without HAVE_ROOT". From the CMake file, I would assume that it is set automatically if ROOT is found. Do you have any idea why the mlogger does not find ROOT or save the data in the ROOT format?

when you build midas using "make cmake", it prints information about packages that it finds (or does not). please post this here. it would be even more helpful if you post the whole output of "make cmake" (make cmake >& make.log, post make.log here as attachment).

historically, this problem has been a major annoyance over the years, mlogger would not find ROOT when needed, will find the wrong ROOT when not needed or ROOT at run time will be different from ROOT at build time. "cmake" has been of no help in improving on this, only made all debugging more difficult.

K.O.
  2626   13 Nov 2023 Stefan RittForummlogger does not HAVE_ROOT
When you do "cmake .." in the build directory, you will see

-- MIDAS: Found ROOT version xxx in yyy

which will tell you that ROOT has been found. Then you should check if it has been turned off manually by doing\

ccmake ..

in the build directory. You will then see all the control variables. Make sure NO_ROOT is turned OFF (meaning ROOT is enabled).

Finally, make sure you start "rmlogger" and not "mlogger". Only "rmlogger" contains the ROOT binding.

Stefan
  2627   14 Nov 2023 Konstantin OlchanskiForummlogger does not HAVE_ROOT
> Finally, make sure you start "rmlogger" and not "mlogger". Only "rmlogger" contains the ROOT binding.

Stefan is right. I forgot this. As solution to our troubles, mlogger is built without root support. use rmlogger instead.

K.O.
  2628   14 Nov 2023 Ivo SchulthessForummlogger does not HAVE_ROOT
> Stefan is right. I forgot this. As solution to our troubles, mlogger is built without root support. use rmlogger instead.
> 
> K.O.

Thanks, Stefan and Konstantin, for your feedback. 

So I checked the cmake file, and already the existence of the rmlogger shows that HAVE_ROOT was set. It was really only the problem of not being aware of rmlogger. This now works, and it produces root files that are readable. 

However, we encountered a new problem not that it does not find a bank that is produced by a multi-threaded slow-control frontend. The logger triggers the error "mlogger.cxx:3328:root_book_bank,ERROR] received unknown bank 'MSRD' in event #8". After this, we get a segmentation violation, but I guess this is then coming from the error. If we run only the polled FE, it works fine. If we run the polled and the multi-threaded FE with only the logger saving mid files, it works fine as well. Are you aware of issues with multi-threaded slow-control frontends and saving their banks in root format?

Cheers,
Ivo
  2629   14 Nov 2023 Stefan RittForummlogger does not HAVE_ROOT
No, I'm not aware of this problem, but I suspect that your events somehow got corrupted. You can try the mdump utility
or the "Event Dump" web page to peek into your events, maybe you see an issue there. To give you more detailed information,
I would have to reproduce your problem, which is probably hard without your hardware.

Stefan
  2630   15 Nov 2023 Ivo SchulthessForummlogger does not HAVE_ROOT
> No, I'm not aware of this problem, but I suspect that your events somehow got corrupted. You can try the mdump utility
> or the "Event Dump" web page to peek into your events, maybe you see an issue there. To give you more detailed information,
> I would have to reproduce your problem, which is probably hard without your hardware.
> 
> Stefan

Hi Stefan,

So I did a few things:
- I checked with mdump online, the data stream looks good, and I can see the bank name properly
- I checked with mdump offline the .mid files, the banks are there, and the data look good
- I removed the creating of the bank MSRD in the class driver. This stopped the writing of the data in the midas/root file but kept the stream to the history files. In principle, this is a quick and dirty fix because we still have all the data in the history files. Do you see any bigger problem with that solution?
- I tried to run the multi-threaded slow-control frontend with the generic class driver (generic.cxx) and the nulldev device driver (nulldev.cxx). This produces the DMND and MSRD bank with and also produces the error in the with the logger when trying to save in the root format (received unknown bank "DMND" in event #8). This means it is not related to the devices (maybe some other part of my user code of course). 

Cheers,
Ivo
  43   31 Aug 2004 Konstantin Olchanski mlogger crash if using mserver.
Our users keep making a simple mistake- they set MIDAS_SERVER_HOST in their
environement. Most midas programs do not mind this- they go through the
mserver, inefficient but benign- except for the mlogger, which dumps core
about 10 seconds after starting. This mightily confuses the users-
everything works perfectly, except for the mlogger, (for most users) the
most obscure and magical part of midas. Obviously they can't take data
without the mlogger and they fail to correlate this crash with editing their
.cshrc file, so we get panic calls at midnight or whenever. And every time,
while debugging midas malfunctions, changes to .cshrc is absolutely the last
place we look for. Ouch!

As it turns out, mlogger does crash if it uses the mserver-
log_system_history() calls db_lock_database(), with a prompt crash because
the mlogger is not directly connected to any ODB (it's mserver is).

Obviously, running the mlogger via the mserver makes no sense, but we should
warn about this rather than dump core.

I propose this patch to src/mlogger.c::log_system_history():

-   db_lock_database(hDB);
-   db_notify_clients(hDB, hist_log[index].hKeyVar, FALSE);
-   db_unlock_database(hDB);
-
+   if (!rpc_is_remote())
+     {
+       db_lock_database(hDB);
+       db_notify_clients(hDB, hist_log[index].hKeyVar, FALSE);
+       db_unlock_database(hDB);
+     }
+   else
+     {
+       cm_msg(MERROR, "log_system_history", "Warning: mlogger is running
remotely via the mserver. This is an unsupported configuration. Please unset
MIDAS_SERVER_HOST and restart the mlogger");
+     }

K.O.
  44   07 Sep 2004 Stefan Ritt mlogger crash if using mserver.
I trapped myself into that problem recently so it's the right time to fix it (;-).

We have two options: 

a) Make the logger work remotely, even if it's suboptimal and 
b) Make the logger refuse to run remotely. 

I have no case where I need to run the logger remotely, so I would opt for b).
This would mean removing the "-h" command line switch and the evaluation of
MIDAS_SERVER_HOST, or just supplying an empty host string to
cm_connect_experiment().

Let me know if you agree, I can then remove the "-h" option. The patch you
suggested I would apply in addition.

- Stefan
  45   15 Sep 2004 Konstantin Olchanski mlogger crash if using mserver.
> I trapped myself into that problem recently so it's the right time to fix it (;-).
> We have two options: 
> a) Make the logger work remotely, even if it's suboptimal and 
> b) Make the logger refuse to run remotely.

After some discussion between Stefan, Pierre and myself, it was decided to disallow
running mlogger remotely via the mserver.

K.O.
  507   17 Oct 2008 Konstantin OlchanskiInfomlogger async transitions, etc
As we were looking into problems with starting and stopping runs in one of our
daq systems, we found that the mlogger does something differently compared to
mhttpd and odbedit. Starting and stopping runs from mhttpd and odbedit works
correctly, but runs restarted by the file size limit in mlogger would often have
problems.

It turns out that mlogger calls cm_transition() with the ASYNC flag, while
mhttpd and odbedit always use SYNC.

The best I can tell, the ASYNC flag tells cm_transition() to fire off the
end-run rpc calls to all clients all at once, without waiting for reply from the
previous client before calling the next one. This effectively defeats the
transition sequence numbers - higher-numbered clients are told to end-run before
the lower-numbered clients have finished their end-run processing.

Most of the time, transition sequence numbers do not matter - all frontends can
stop at the same time, only mlogger has to be the very last, and for transitions
initiated by the mlogger itself, this sequencing is preserved.

It turns out that for our system, correct sequencing of individual frontends is
important, for example, the frontend controlling the trigger system has to stop
first. As we are using correctly adjusted transition sequence numbers, the right
sequence is always done when runs are started/stopped from mhttpd and from
odbedit, but not for runs started/stopped by the mlogger.

So by changing mlogger to always do SYNC transitions, we fixed our sequencing
problem - now runs always start and stop correctly.

But then we ran into a deadlock between the mlogger and the event builder:

1) mlogger wants to stop the run
2a) mlogger stops reading the SYSTEM buffer
2b) mlogger starts cm_transition(SYNC)
3) rpc call to trigger frontend, trigger is blocked (no new events are
generated, but existing data is still flowing through the system)
4) other frontends are stopped (data still flowing)
5) data still flowing through the system, into the event builder, into the
SYSTEM buffer
6) SYSTEM buffer becomes 100% full (mlogger is not reading it, it is busy inside
cm_transition()), event builder is waiting for free space inside bm_send_event()
7) mlogger issues end-run rpc call to event builder
8) deadlock: mlogger is waiting for a reply from the event builder, the event
builder is waiting for free space in the SYSTEM buffer (not processing rpc
calls), mlogger is supposed to empty the SYSTEM buffer, but it is waiting for an
rpc reply instead.

In our particular case, the dead lock was easy to avoid by making the SYSTEM
buffer big enough to accommodate all in-flight data, but the problem remains in
the general case. I suspect mlogger uses ASYNC transactions exactly to avoid
this type of deadlock (mlogger used ASYNC transactions since svn revision 2, the
beginning of time).

Personally, I am not happy about the inconsistency of run sequencing between
mlogger and mhttpd/odbedit (hmm... should also check mfe.c, it also stops runs
based on event count limits, etc). I think it would be better if all programs
did the same exact thing when starting/stopping runs. When mlogger does
something different, we get surprising unexpected behaviour, best avoided.

One possible solution could be to add an odb variable "/logger/async
transitions", set to "false" by default - to be consistent with other programs.
Systems that benefit from the old ASYNC behaviour and do not care about exact
sequencing can set this flag to "true".

K.O.
  508   18 Oct 2008 Stefan RittInfomlogger async transitions, etc
> I suspect mlogger uses ASYNC transactions exactly to avoid
> this type of deadlock (mlogger used ASYNC transactions since svn revision 2, the
> beginning of time).

That's exactly the case. If you would have asked me, I would have told you 
immediately, but it is also good that you re-confirmed the deadlock behavior with 
the SYNC flag. I didn't check this for the last ten years or so.

Making the buffers bigger is only a partial solution. Assume that the disk gets 
slow for some reason, then any buffer will fill up and you get the dead lock.

The only real solution is to put the logic into a separate thread. So the thread 
does all the RPC communication with the clients, while the main logger thread logs 
data as usual in parallel. The problem is that the RPC layer is not yet completely 
tested to be thread safe. I put some mutex and you correctly realized that these 
are system wide, but you want a local mutex just for the logger process. You need 
also some basic communication between the "run stop thread" and the "logger main 
thread". Maybe Pierre remembers that once there was the problem that the logger did 
not know when all events "came down the pipe" and could close the file. He added 
some delay which helped most of the time. But if we would have some communication 
from the "run stop thread" telling the main thread that all programs except the 
logger have stopped the run, then the logger only has to empty the local system 
buffer and knows 100% that everything is done.

In the MEG experiment we have the same problem. We need a certain sequence 
(basically because we have 9 front-ends and one event builder, which has to be 
called after the front-ends). We realized quickly that the logger cannot stop the 
run, so we wrote a little tool "RunSubmit", which is a run sequence with scripting 
facility. So you write a XML file, telling RunSubmit to start 10 runs, each with 
5000 events. RunSubmit now watches the run statistics and stops the run. Since it's 
outside the logger process, there is no dead lock. Unfortunately RunSubmit was 
written by one of our students and contains some MEG specific code. Otherwise it 
could be committed to the distribution.

So I feel that a separate thread for run stop (and maybe even start) would be a 
good thing, but I'm not sure when I will have time to address this issue.

- Stefan
  906   14 Sep 2013 Konstantin OlchanskiInfomktime() and daylight savings time
I would like to share with you a silly problem with mktime() and daylight savings time (Summer 
time/Winter time) that I have run into while working on the mhttpd history query page.

I am implementing 1 hour granularity for the queries (was 1 day granularity) and somehow all my queries 
were off by 1 hour.

It turns out that the mktime() and localtime() functions for converting between time_t and normal time 
units (days, hours) are not exact inverses of each other.

Daylight savings time (DST) is to blame.

While localtime() always applies the current DST,  mktime() will return the wrong answer unless tm_isdst is 
set correctly.

For tm_isdst, the default value 0 is wrong 50% of the time in most locations as it means "DST off" (whether 
that's Summer time or Winter time depends on your location).

Today in Vancouver, BC, DST is in effect, and localtime(mktime()) is off by 1 hour.

If I were doing this in January, I would not see this problem.

"man mktime" talks about "tm_isdst" special value "-1" that is supposed to fix this. But the wording of 
"man mktime" on Linux and on MacOS is different (I am amused by the talk about "attempting to divine 
the DST setting"). Wording at http://pubs.opengroup.org/onlinepubs/007908799/xsh/mktime.html is 
different again. MS Windows (Visual Studio) documentation says different things for different versions.

So for mhttpd I use the following code. First mktime() gets the approximate time, a call to localtime() 
returns the DST setting in effect for that date, a second mktime() with the correct DST setting returns the 
correct time. (By "correct" I mean that localtime(mktime(t)) == t).

time_t mktime_with_dst(const struct tm* ptms)
{
   // this silly stuff is required to correctly handle daylight savings time (Summer time/Winter time)
   // when we fill "struct tm" from user input, we cannot know if daylight savings time is in effect
   // and we do not know how to initialize the value of tms->tm_isdst.
   // This can cause the output of mktime() to be off by one hour.
   // (Rules for daylight savings time are set by national and local govt and in some locations, changes 
yearly)
   // (There are no locations with 2 hour or half-hour daylight savings that I know of)
   // (Yes, "man mktime" talks about using "tms->tm_isdst = -1")
   //
   // We assume the user is using local time and we convert in two steps:
   //
   // first we convert "struct tm" to "time_t" using mktime() with unknown tm_isdst
   // second we convert "time_t" back to "struct tm" using localtime_r()
   // this fills "tm_isdst" with correct value from the system time zone database
   // then we reset all the time fields (except for sub-minute fields not affected by daylight savings)
   // and call mktime() again, now with the correct value of "tm_isdst".
   // K.O. 2013-09-14

   struct tm tms = *ptms;
   struct tm tms2;
   time_t t1 = mktime(&tms);
   localtime_r(&t1, &tms2);
   tms2.tm_year = ptms->tm_year;
   tms2.tm_mon  = ptms->tm_mon;
   tms2.tm_mday = ptms->tm_mday;
   tms2.tm_hour = ptms->tm_hour;
   tms2.tm_min  = ptms->tm_min;
   time_t t2 = mktime(&tms2);
   //printf("t1 %.0f, t2 %.0f, diff %d\n", (double)t1, (double)t2, (int)(t1-t2));
   return t2;
}

K.O.
  909   24 Sep 2013 Stefan RittInfomktime() and daylight savings time
I vaguely remember that I had a similar problem with ELOG. The solution was to call tzset() at the beginning of the program. The man page says that 
this function is called automatically by programs using time zones, but apparently it is not. Can you try that? There is also the TZ environment 
variable and /etc/localtime. I never understood the details, but playing with these things can influence mktime() and localtime().

/Stefan
  911   24 Sep 2013 Konstantin OlchanskiInfomktime() and daylight savings time
> I vaguely remember that I had a similar problem with ELOG. The solution was to call tzset() at the beginning of the program. The man page says that 
> this function is called automatically by programs using time zones, but apparently it is not. Can you try that? There is also the TZ environment 
> variable and /etc/localtime. I never understood the details, but playing with these things can influence mktime() and localtime().

I confirm that the timezone is set correctly - I do get the correct time eventually - so there is no missing call to tzet().

K.O.
  912   24 Sep 2013 Stefan RittInfomktime() and daylight savings time
> > I vaguely remember that I had a similar problem with ELOG. The solution was to call tzset() at the beginning of the program. The man page says that 
> > this function is called automatically by programs using time zones, but apparently it is not. Can you try that? There is also the TZ environment 
> > variable and /etc/localtime. I never understood the details, but playing with these things can influence mktime() and localtime().
> 
> I confirm that the timezone is set correctly - I do get the correct time eventually - so there is no missing call to tzet().
> 
> K.O.

tzset() not only sets the time zone, but also DST.
  913   24 Sep 2013 Stefan RittInfomktime() and daylight savings time
> > > I vaguely remember that I had a similar problem with ELOG. The solution was to call tzset() at the beginning of the program. The man page says that 
> > > this function is called automatically by programs using time zones, but apparently it is not. Can you try that? There is also the TZ environment 
> > > variable and /etc/localtime. I never understood the details, but playing with these things can influence mktime() and localtime().
> > 
> > I confirm that the timezone is set correctly - I do get the correct time eventually - so there is no missing call to tzet().
> > 
> > K.O.
> 
> tzset() not only sets the time zone, but also DST.

I found following code in elogd.c, maybe it helps:

/* workaround for wong timezone under MAX OSX */
long my_timezone()
{
#if defined(OS_MACOSX) || defined(__FreeBSD__) || defined(__OpenBSD__)
   time_t tp;
   time(&tp);
   return -localtime(&tp)->tm_gmtoff;
#else
   return timezone;
#endif
}



void get_rfc2822_date(char *date, int size, time_t ltime)
{
   time_t now;
   char buf[256];
   int offset;
   struct tm *ts;

   /* switch locale temporarily back to english to comply with RFC2822 date format */
   setlocale(LC_ALL, "C");

   if (ltime == 0)
      time(&now);
   else
      now = ltime;
   ts = localtime(&now);
   assert(ts);
   strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S", ts);
   offset = (-(int) my_timezone());
   if (ts->tm_isdst)
      offset += 3600;
   snprintf(date, size - 1, "%s %+03d%02d", buf, (int) (offset / 3600),
            (int) ((abs((int) offset) / 60) % 60));
}
  2610   26 Sep 2023 Stefan RittInfomjsonrpc_db_save / mjsonrpc_db_load have been dropped
The JavaScript function

mjsonrpc_db_save / mjonrpc_db_load

have been dropped from the API because they were not considered safe. Users 
should use now the new function

file_save_ascii()

and

file_load_ascii()

These function have the additional advantage that the file is not loaded 
directly into the ODB but goes into the JavaScript code in the browser, which 
can check or modify it before sending it to the ODB via mjsonrpc_db_paste(). 

Access of these functions is limited to <experiment>/userfiles/* where 
<experiment> is the normal MIDAS experiment directory defined by "exptab" or 
"MIDAS_DIR". This ensures that there is no access to e.g. system-level files. If 
you need to access a directory not under "userfile", us symbolic links.

These files can be combined with file_picker(), which lets you select files on 
the server interactively.

Stefan
  237   14 Dec 2005 Konstantin OlchanskiBug Reportmisc problems
I would like to document a few problems I ran into while setting up a new
experiment (two USB interfaces to Alice TPC electronics, plus maybe a USB
interface to CAMAC). I am using a midas cvs checkout from last October, so I am
not sure if these problems exist in the very latest code. I have fixes for all
of them and I will commit them after some more testing and after I figure out
how to commit into this new svn thingy.

- mxml: writing xml into an in-memory buffer probably produces invalid xml
because one of the mxml functions always writes "/>" into writer->fh, which is 0
for in-memory writers, so the "/>" tag goes to the console instead of the xml
data stream.

- hs_write_event() closes fd 0 (standard input), which confuses ss_getch(),
which makes mlogger not work (at least on my machine). I traced this down to the
history file file descriptors being initialized to zero and hs_write_event()
closing files without checking that it ever opened them.

- mevb: event builder did not work with a single frontend (a two-liner fix, once
Pierre showed me where to look. Why? My second TPC-USB interface did not yet
arrive and I wanted to test my frontend code. Yes, it had enough bugs to prevent
the event builder from working).

- mevb: consumes 100% CPU. Fix: add a delay in the main busy-loop.

- mlogger ROOT tree output does not work for data banks coming through the event
builder: mlogger looks for the bank definition under the event_id of mevb, in 
/equipment/evb/variables, which is empty, as the data banks are under
/equipment/frontendNN/variables. This may be hard to fix: bank "TPCA" may be
under "fe01", "TPCB" under "fe02" and mlogger knows nothing about any of this.
Fix: go back to .mid files.

K.O.
ELOG V3.1.4-2e1708b5