Back Midas Rome Roody Rootana
  Midas DAQ System, Page 24 of 150  Not logged in ELOG logo
ID Date Authordown Topic Subject
  850   20 Dec 2012 Stefan RittBug ReportMIDAS does not function correctly on F17
If is not so easy to get out of zlib how many bytes have been written actually. I used an undocumented function, 
which breaks down on 64-bit systems.

I now rewrote the code in mlogger.cxx to use lseek() to "measure" actually the output file and set the values 
correctly. I tried on a few systems but am not 100% sure if it works everywhere. Can you please double check?

The fix is in SVN revision 5347.

/Stefan
  853   09 Jan 2013 Stefan RittBug ReportOutputting ADC and TDC data into ROOT tree with the MIDAS SVN Revision:5347.
Dear Bill,

the Midas analyzer "mana.c" is currently not maintained. At PSI we use the ROME framework (which might be too complicated for a 
small experiment) and at TRIUMF the ROOTANA framework is used:

http://ladd00.triumf.ca/~olchansk/rootana/

You might be better off switching to that one.

Best regards,
Stefan
  857   01 Feb 2013 Stefan RittForumanalyzer cannot connect to the statistics database
> The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
> one you specified in /etc/exptab).
> This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
> fresh start.
> 
> Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
> odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
> "rm .[A-Z]*.SHM" 

Thanks Randolf for helping out, I was not in the office this week.

In addition of deleting the *SHM files, it's sometimes necessary to delete the shared memory. You do this with the 
command line tools

ipcs -m
ipcrm -m <shmid>


/Stefan
  858   06 Feb 2013 Stefan RittInfoCompression benchmarks
I redid the tests from Konstantin for our MEG experiment at PSI. The event structure is different, so it
is interesting how the two different experiments compare. We have an event size of 2.4 MB and a trigger
rate of ~10 Hz, so we produce a raw data rate of 24 MB/sec. A typical run contains 2000 events, so has a 
size of 5 GB. Here are the results:


cat                 : time   7.8s, size   4960156030   4960156030, comp   0%, rate 639M/s 639M/s

gzip -1             : time 147.2s, size   4960156030   2468073901, comp  50%, rate  33M/s  16M/s

pbzip2 -p1          : time 679.6s, size   4960156030   1738127829, comp  65%, rate   7M/s   2M/s (1 CPU)
pbzip2 -p8          : time  96.1s, size   4960156030   1738127829, comp  65%, rate  51M/s  18M/s (8 CPU)


As one can see, our compression ratio is poorer (due to the quasi random noise in our waveforms), but the
difference between gzip -1 and pbzip2 is larger (15% instead 10% for DEAP). The single CPU version of
pbzip cannot sustain our DAQ rate of 24 MB, but the parallel version can. Actually we have a somehow old
dual-core dual-CPU board 2.5 GHz Xenon box, and make 8 hyper-threading CPUs out of the total 4 cores.
Interestingly the compression rate scales with 7.3 for 8 virtual cores, so hyper-threading does its job.
So we take all our data with the pbzip2 compression. The additional 15% as compared with gzip does 
not sound much, but we produce raw 250 TB/year. So gzip gives us 132 TB/year and pbzip2 gives 
us 98 TB/year, and we save quite some disks.

Note that you can run bzip2 (as all the other methods) already now with the current logger, if you specify
an external compression program in the ODB using the pipe functionality:


local:MEG:S]/>cd Logger/Channels/0/Settings/
[local:MEG:S]Settings>ls
Active                          y
Type                            Disk
Filename                        |pbzip2>/megdata/run%06d.mid.bz2
Format                          MIDAS
Compression                     0
ODB dump                        y
Log messages                    0
Buffer                          SYSTEM
Event ID                        -1
Trigger mask                    -1
Event limit                     0
Byte limit                      0
Subrun Byte limit               0
Tape capacity                   0
Subdir format                   
Current filename                /megdata/run197090.mid.bz2
</pre>
  860   11 Feb 2013 Stefan RittForumsend_tcp error
> I am getting a series of errors from MIDAS that I do not understand, so I hope
> someone can help me figure this out.
> 
> I am attempting to run many frontends on one machine. I can run 8 with no
> problem, but if I try to add a 9th I get errors relating to send_tcp. 
> 
> I have tried adjusting the max event sizes and buffer sizes, but it has not
> resolved the problem. I also tried adjusting the data rates and the total data
> volume going through each frontend, but there was no change. And as far as I can
> tell I am not up against any hardware limits.
> 
> The errors are repeated continuously while a run is going. The three errors I
> get are:
> 
> 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
> 
> If you have any suggestions of how I can debug this, please let me know. Thanks!

Can you tell me

- why you need 9 frontends
- what kind of data your frontends produce
- how your event builder looks like and how you assemble the fragments
- what messages/errors you see when you run odbedit BEFORE the crash

/Stefan
  862   12 Feb 2013 Stefan RittForumsend_tcp error
Ok, now the picture is clearer. I have however no idea what the real problem is. The number of concurrent programs in midas is 64 as defined in midas.h (MAX_CLIENTS) so that should not be the problem. In our experiment we run 10 front-ends (but 
on 10 different machines) without problems. Other experiments used 27 front-ends.

The TCP error you see comes probably from the fact that the mserver side crashes or quits, then the socket gets broken. What you can try to debug this is to run mserver manually. Just remove mserver from inetd, and start it with "mserver -d" and 
watch what happens. Do you see any additional error messages. If the mserver segfaults, you should turn on core dumps and have a look there. Note that the mserver starts a child process on each incoming connection, so running mserver in gdb 
does not really help, since the child processes (which connect back to the front-ends) are not seen by gdb.

Have you tried to run the 9 front-ends on maybe two different PCs (5 and 4) to see if the problem is on the client side?


Best regards,
Stefan
  864   14 Feb 2013 Stefan RittInfoReview of github and bitbucket
Let me add my five cents:

We use bitbucket now since two months at PSI, and are very happy with it.

Pros:

- We like the GIT flow model (http://nvie.com/posts/a-successful-git-branching-model/). You can at the same time do hot fixes, have a "distribution 
version", and keep a development branch, where you can try new things without compromising the distribution.
- Nice and fast Web interface, especially the "blame" is lightning fast compared to SVN/CVS
- GIT is non-centralized, so your local clone of a repository contains everything. If bitbucket is down/asks for money, you can continue with your local 
repository and clone it to some other hosting service, or host it yourself
- SourceTree (http://www.sourcetreeapp.com/) is a nice GUI for Mac lovers. 
- Easy user management
- Free for academic use

Con:

- Wiki is limited as KO wrote, so it should not be used as a "full" wiki to replace Plone for example, just to annotate your project
- SVN revision number is gone. This is on purpose since it does not make sense any more if you keep several parallel branches (merging becomes a 
nightmare), so one has to use either the (random) commit-ID or start tagging again.

So I conclusion, I would say that it's time to switch MIDAS to GIT. We'll probably do that in July when I will be at TRIUMF.

/Stefan
  870   03 Apr 2013 Stefan RittInfoReview of github and bitbucket
> * "git bisect" for finding which commit introduced a (reproducible) bug.

I did not know this command, so I read about it. This IS WONDERFUL! I had once (actually with MSCB) the case that a bug was introduced i the last 100 
revisions, but I did not know in which. So I checked out -1, -2, -3 revisions, then thought a bit, then tried -99, -98, then had the bright idea to try -50, then 
slowly converged. Later I realised that I should have done a binary search, like -50, if ok try -25, if bad try -37, and so on to iteratively find the offending 
commit. Finding that there is a command it git which does this automatically is great news.

Stefan
  875   11 Apr 2013 Stefan RittForumPersistent ipcrm error

Thorsten Lux wrote:
In addition now I cannot start anymore the mlogger from the web interface but only manually. However, I can stop it from the web interface.


At least that one can be fixed easily. Each program has a certain command with which one can start it. This has to be put into the ODB under /Programs/<program>. In your case you probably need

/Programs/Logger/Start command = mlogger -D

to start the logger from the Web page. To debug your run stop problems, I would recommend to start all programs in a terminal window and look which one crashes on the run end.

/Stefan
  877   12 Apr 2013 Stefan RittForumPersistent ipcrm error
> Hi Stefan,
> 
> under /Programs/Logger/Start command I have
> /home/next/MIDAS/midas/linux/bin/mlogger -D . This command does not work if I
> press the "Start Logger" button on the mhttpd webpage but when I copy and paste
> this command to a terminal window, it does the job. 
> 
> Well, thanks to you both for the fast response. I wrote Konstantin an email with
> the results of the tests he suggested me to do.
> 
> Ciao

Let me guess: mhttpd is started under root (to be able to connect to port 80), and for root the mlogger program 
is not in the path. Try to put into the odb the full path:

/Programs/Logger Start command = /usr/local/bin/mlogger -D
  880   12 Apr 2013 Stefan RittForumPersistent ipcrm error
> [odb.c:6038:db_paste,ERROR] found string exceeding MAX_STRING_LENGTH

Ok, so here is what probably happened. Some user program wrote a long string into the ODB and somehow corrupted it. This corruption persists as long as you work with 
binary data. Indeed "rebuilding" the ODB helps in that case. What we do actually is at the beginning of every run, the ODB contents is dumped into the data file via

/Logger/Channels/0/Setting/ODB dump

in case we get ODB corruption, we clear all *.shm files as well as the shared memory segments, create a fresh ODB, extract the ODB from the last successful run via

odbhist -e runxxx.mid

and load it via odbedit. I put some additional code in most midas functions to prevent this corruption (and thus your saw the above error "found string exceeding 
MAX_STRING_LENGTH"), but since the ODB is physically in the address space of each midas program, they can theoretically bypass the midas functions and write accidentally 
into the ODB with an uninitialized pointer or so.

Best regards,
Stefan
  893   22 Jul 2013 Stefan RittInfoMIDAS source code converted from SVN to GIT
Konstantin forgot to tell people outside of TRIUMF how to get the newest version of MIDAS. Here it is:

$ git clone https://bitbucket.org/tmidas/midas.git

Not that you can also browse the repository at

https://bitbucket.org/tmidas/midas

On some (older) systems, you might have to install git (http://git-scm.com/downloads).

/Stefan
  903   13 Sep 2013 Stefan RittForumMIDAS CITATION
> Dear MIDAS programmers,
> 
> I have been using your software in my lab (APC, Paris)
> to run our data acqusition system. It is very robust and flexible.s
> 
> I would like to give you the large amount of credit which you are due.
> How should I cite both MIDAS and ROODY? I have not been able to find any
> information in the usual places.
> 
> Cheers, and thanks for the great program!
> -Carl

The standard citation for midas is a link to

http://midas.psi.ch

At the moment this points automatically to http://midas.triumf.ca, so both institutes are credited.

/Stefan
  908   23 Sep 2013 Stefan RittInfoCustom page header implemented
Due to popular request, I implemented a custom header for mhttpd. This allows to inject some HTML code 
to be shown on top of the menu bar on all mhttpd pages. One possible application is to bring back the old 
status line with the name of the current experiment, the actual time and the refresh interval. 

To use this feature, one can put a new entry into the ODB under

/Custom/Header

which can be either a string (to show some short HTML code directly) or the name of a file containing some 
HTML code. If /Custom/Path is present, that path is used to locate the header file. A simple header file to 
recreate the GOT look (good-old-times) is here:

<div id="footerDiv" class="footerDiv">
<div style="display:inline; float:left;">MIDAS experiment "Test"</div>
<div id="refr" style="display:inline; float:right;"></div>
</div>
<script type="text/javascript">
var r = document.getElementById('refr');
var now	= new Date();
var c =	document.cookie.split('midas_refr=');
r.innerHTML = now.toString() + '&nbsp;&nbsp;&nbsp;' + 'Refr:' + c.pop().split(';').shift();
</script>

The JavaScript code is used to retrieve the midas_refr cookie which stores the refresh interval and displays 
it together with the current time.

Another application of this feature might be to check certain values in the ODB (via the ODBGet function) 
and some some important status or error condition.

/Stefan
Attachment 1: Screen_Shot_2013-09-23_at_15.17.40_.png
Screen_Shot_2013-09-23_at_15.17.40_.png
  909   24 Sep 2013 Stefan RittInfomktime() and daylight savings time
I vaguely remember that I had a similar problem with ELOG. The solution was to call tzset() at the beginning of the program. The man page says that 
this function is called automatically by programs using time zones, but apparently it is not. Can you try that? There is also the TZ environment 
variable and /etc/localtime. I never understood the details, but playing with these things can influence mktime() and localtime().

/Stefan
  910   24 Sep 2013 Stefan RittBug Reportmhttpd truncates string variables to 32 characters
Actually this was no bug, but a missing feature. Strings were never meant to be extended via the web interface. 
Now I added that feature to the current version. Please check it.

/Stefan
  912   24 Sep 2013 Stefan RittInfomktime() and daylight savings time
> > I vaguely remember that I had a similar problem with ELOG. The solution was to call tzset() at the beginning of the program. The man page says that 
> > this function is called automatically by programs using time zones, but apparently it is not. Can you try that? There is also the TZ environment 
> > variable and /etc/localtime. I never understood the details, but playing with these things can influence mktime() and localtime().
> 
> I confirm that the timezone is set correctly - I do get the correct time eventually - so there is no missing call to tzet().
> 
> K.O.

tzset() not only sets the time zone, but also DST.
  913   24 Sep 2013 Stefan RittInfomktime() and daylight savings time
> > > I vaguely remember that I had a similar problem with ELOG. The solution was to call tzset() at the beginning of the program. The man page says that 
> > > this function is called automatically by programs using time zones, but apparently it is not. Can you try that? There is also the TZ environment 
> > > variable and /etc/localtime. I never understood the details, but playing with these things can influence mktime() and localtime().
> > 
> > I confirm that the timezone is set correctly - I do get the correct time eventually - so there is no missing call to tzet().
> > 
> > K.O.
> 
> tzset() not only sets the time zone, but also DST.

I found following code in elogd.c, maybe it helps:

/* workaround for wong timezone under MAX OSX */
long my_timezone()
{
#if defined(OS_MACOSX) || defined(__FreeBSD__) || defined(__OpenBSD__)
   time_t tp;
   time(&tp);
   return -localtime(&tp)->tm_gmtoff;
#else
   return timezone;
#endif
}



void get_rfc2822_date(char *date, int size, time_t ltime)
{
   time_t now;
   char buf[256];
   int offset;
   struct tm *ts;

   /* switch locale temporarily back to english to comply with RFC2822 date format */
   setlocale(LC_ALL, "C");

   if (ltime == 0)
      time(&now);
   else
      now = ltime;
   ts = localtime(&now);
   assert(ts);
   strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S", ts);
   offset = (-(int) my_timezone());
   if (ts->tm_isdst)
      offset += 3600;
   snprintf(date, size - 1, "%s %+03d%02d", buf, (int) (offset / 3600),
            (int) ((abs((int) offset) / 60) % 60));
}
  914   24 Sep 2013 Stefan RittBug Reportmhttpd truncates string variables to 32 characters
> This is the jset code. The best I can tell it truncates string variables to the existing size in ODB:
> 
> db_find_key(hDB, 0, str, &hkey)
> db_get_key(hDB, hkey, &key);
> memset(data, 0, sizeof(data));
> size = sizeof(data);
> db_sscanf(getparam("value"), data, &size, 0, key.type);
> db_set_data_index(hDB, hkey, data, key.item_size, index, key.type);


Correct. So I added some code which extends strings if necessary (NOT string arrays, they are more complicated to handle).
  923   25 Oct 2013 Stefan RittBug Fixfixed mlogger run auto restart bug
> A problem existed in midas for some time: when recording long data sets of time (or event) limited runs 
> with logger run auto restart set to "yes", the runs will automatically stop and restart as expected, but 
> sometimes the run will stop and never restart and beam will be lost until the experiment operator on shift 
> wakes up and restarts the run manually.
> 
> I have now traced this problem to a race condition inside the mlogger - when a run is being stopped from 
> the mlogger, the mlogger run transition handler (tr_stop) triggers an immediate attempt to start the next 
> run, without waiting for the run-stop transition to actually complete. If the run-stop transition does not 
> finish quickly enough, a safety check in start_the_run() will cause the run restart attempt to silently fail 
> without any error message.
> 
> This race condition is pretty rare but somehow I managed to replicate it while debugging the 
> multithreaded transitions. It is fixed by making mlogger wait until the run-stop transition completes.
> 
> https://bitbucket.org/tmidas/midas/commits/b2631fbed5f7b1ec80e8a6c8781ada0baed7702b
> 
> K.O.

More generally I kind of consider the mlogger auto restart facility as deprecated. It works in the background and the operator does not have a clue 
what is going on. We use now the sequencer to achieve exactly the same functionality. It just requires a few lines of sequencer code:

LOOP INFINITE 
  TRANSITION start 
  WAIT events, 5000 
  TRANSITION stop 
ENDLOOP 

So the run start is only executed after the runs has been successfully stopped. You can do things in the sequencer like "stop run and sequence 
immediately" or "stop after current run has finished" which are a bit hard to do with the old method. So people should move to the sequencer.

/Stefan
ELOG V3.1.4-2e1708b5