  Midas DAQ System, Page 23 of 146
ID   Date   Author   Topic   Subject
  787   19 Apr 2012   Stefan Ritt   Bug Report   Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’

Exaos Lee wrote:
I tried to build MIDAS under ArchLinux, but it failed with the following errors:
src/mlogger.cxx: In function ‘INT midas_flush_buffer(LOG_CHN*)’:
src/mlogger.cxx:1011:54: error: invalid conversion from ‘void*’ to ‘gzFile’ [-fpermissive]
In file included from src/mlogger.cxx:33:0:
/usr/include/zlib.h:1318:21: error:   initializing argument 1 of ‘int gzwrite(gzFile, voidpc, unsigned int)’ [-fpermissive]
src/mlogger.cxx: In function ‘INT midas_log_open(LOG_CHN*, INT)’:
src/mlogger.cxx:1200:79: error: invalid conversion from ‘void*’ to ‘gzFile’ [-fpermissive]
In file included from src/mlogger.cxx:33:0:
Please refer to attachment elog:786/1 for details. There are also many warnings listed.

This error can be suppressed by adding -fpermissive to CXXFLAGS, but the error message is correct: "gzFile" is not the same type as "void *". C allows implicit conversions between void* and any object pointer type; C++ does not. It would be better to fix the error itself. A quick fix would be to add explicit casts, but I'm not sure what the proper fix is.


Ah, dumb gcc gets pickier and pickier. I added a cast (gzFile)log_chn->gzfile, which fixes the error. I cannot put gzFile into the header file itself, since the zlib header is included after the midas header (changing that causes some other problems). The SVN version with the fix is 5275.
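
For readers who want to see the pattern in isolation, here is a minimal sketch of a structure that stores a library handle as void* and casts it back at the point of use. All names in it (my_file, my_handle, log_chn_t, my_write, flush_buffer) are stand-ins made up for illustration; they are not the zlib or MIDAS declarations.

```c
#include <assert.h>

/* Stand-in for a library handle type such as zlib's gzFile (hypothetical). */
struct my_file { int bytes_written; };
typedef struct my_file *my_handle;

/* Stand-in for a header that cannot name the handle type, so it stores void*. */
typedef struct { void *gzfile; } log_chn_t;

/* Stand-in for a library call that expects the typed handle. */
int my_write(my_handle h, int len)
{
   assert(h != 0 && len >= 0);
   h->bytes_written += len;
   return len;
}

int flush_buffer(log_chn_t *log_chn, int len)
{
   /* the explicit cast is required in C++ and harmless in C */
   return my_write((my_handle)log_chn->gzfile, len);
}
```

Without the cast the same call compiles as C but is rejected by a C++ compiler, which is the situation described in the error message above.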
  789   27 Apr 2012   Stefan Ritt   Bug Report   Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’

KO wrote:
BTW, I read the midas elog via email and if you post html or elcode messages, I receive complete
gibberish. For prompt service, please select message type "plain". (yes, you cannot use fancy colours and
blinking text, but better than me not reading your stuff at all).

BTW2, for easier reading, please include error messages as plain text in your message, as opposed to
compressed attachments.

K.O.


BTW3, if you use a real email program you don't get gibberish. I know some people prefer good-old-text-only pine, but I'm sure you do not use the ascii-only browser lynx to browse the internet, right? So if you browse the web in graphics, why not read your email in graphics as well? Better to change yourself than the whole rest of the world ;-)
  799   14 Jun 2012   Stefan Ritt   Bug Report   Cannot start/stop run through mhttpd
> I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> fine. So, what's the difference between invoking mhttpd directly and through a script?

When you start it with "-D", mhttpd becomes a daemon. According to Linux rules, it has to "cd /", so it lives in the 
root directory in order not to block any NFS mount/unmount. If something is wrong with the path, mhttpd 
then cannot find mtransition. I once fixed that problem by moving mtransition to /usr/bin.

Stefan
  807   21 Jun 2012   Stefan Ritt   Info   midas vme benchmarks
Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using 
2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a 
continuous link speed of 83 MB/sec.
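
The model stated here (a fixed overhead per DMA transfer plus a constant link speed) can be written down in a few lines. This is only a sketch: the function name is made up, and only the 125 us and 83 MB/sec figures come from the measurement.

```c
#include <assert.h>

/* Effective transfer rate in MB/sec (1 MB = 1e6 bytes) for one DMA
   transfer of block_bytes, assuming a fixed per-transfer overhead
   and a constant link speed. */
double vme_effective_rate(double block_bytes, double overhead_us, double link_mb_per_s)
{
   assert(block_bytes > 0 && link_mb_per_s > 0 && overhead_us >= 0);
   /* at 1 MB/sec one byte takes exactly 1 us, so bytes / (MB/sec) is in us */
   double transfer_us = block_bytes / link_mb_per_s;
   /* bytes per microsecond is numerically equal to MB/sec */
   return block_bytes / (overhead_us + transfer_us);
}
```

With these numbers a block of 10375 bytes spends as much time in overhead as in the transfer itself, so it reaches half the link speed; much larger blocks approach 83 MB/sec.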
Attachment 1: Screen_Shot_2012-06-21_at_10.14.09_.png
  808   21 Jun 2012   Stefan Ritt   Bug Report   Cannot start/stop run through mhttpd
> I agree. Somehow mhttpd cannot run mtransition. I am not super happy with this dependence on user $PATH settings and the inability to capture error messages 
> from attempts to start mtransition. I am now thinking in the direction of running mtransition code by forking. But remember that mlogger and the event builder also
> have to use mtransition to stop runs (otherwise they can dead-lock). So an mhttpd-only solution is not good enough...

The way to go is to make cm_transition multi-threaded, with one thread for each client to be contacted. This way the transition can proceed in parallel when there are many frontend computers, for example, which will speed up 
transitions significantly. In addition, cm_transition should execute a callback whenever a client succeeds or fails, to give immediate feedback to the user. I am thinking of something like implementing WebSockets in mhttpd for that (http://en.wikipedia.org/wiki/WebSocket).

I have had this in mind for many years, but have not had time to implement it yet. Maybe on my next visit to TRIUMF?

Stefan
  810   22 Jun 2012   Stefan Ritt   Info   midas vme benchmarks
> > Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using 
> > 2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a 
> > continuous link speed of 83 MB/sec.
> 
> What VME module is on the other end?
> 
> K.O.

The PSI-built DRS4 board, where we implemented the 2eVME protocol in the Virtex II FPGA. The same speed can be obtained with the commercial 
VME memory module CI-VME64 from Chrislin Industries (see http://www.controlled.com/vme/chinp1.html).

Stefan
  814   25 Jun 2012   Stefan Ritt   Info   midas vme benchmarks
> P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.

An important aspect of optimizing your system is to keep the network traffic under control. I use Gigabit Ethernet between FE and BE, and make sure the switch 
can accommodate all accumulated network traffic through its backplane. This way I do not get any TCP retransmits, which kill you. If a single low-level 
Ethernet packet is lost due to a collision, the TCP stack retransmits it. Depending on the local settings, this can happen after a timeout of one (!) second, which 
already punches a hole in your data rate. On the MSCB system I actually use UDP packets, where I schedule the retransmit myself. For a LAN, a timeout of 10-100 ms 
is enough. The one second is optimized for a WAN (like between two continents), where this is fine, but it is not what you want on a LAN system. Also 
make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic. I found that this also helps a lot.
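
To illustrate what scheduling the retransmit yourself can look like, here is a minimal sketch of a send-and-wait loop with a short, caller-chosen timeout. It is not the MSCB code: the callback types, send_with_retry() and the fake link used to exercise it are all made up for this example. In a real program the send callback would wrap sendto() on the UDP socket and the wait callback would wrap select() or poll() with the timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks: send one packet, and wait up to timeout_ms for
   its acknowledge (return 1 if acknowledged, 0 on timeout). */
typedef int (*send_fn)(const void *buf, size_t len, void *ctx);
typedef int (*wait_ack_fn)(int timeout_ms, void *ctx);

/* Send a packet and retransmit it until it is acknowledged or max_tries
   is used up. Returns the number of transmissions used, or -1 on failure. */
int send_with_retry(send_fn send_pkt, wait_ack_fn wait_ack, void *ctx,
                    const void *buf, size_t len, int timeout_ms, int max_tries)
{
   assert(send_pkt && wait_ack && timeout_ms > 0 && max_tries > 0);
   for (int attempt = 1; attempt <= max_tries; attempt++) {
      if (send_pkt(buf, len, ctx) < 0)
         return -1;                 /* hard send error, give up */
      if (wait_ack(timeout_ms, ctx))
         return attempt;            /* acknowledged */
      /* timeout: fall through and retransmit */
   }
   return -1;                       /* no acknowledge after max_tries */
}

/* Fake link for trying out the loop: it loses the first `drop` packets. */
struct fake_link { int drop; int sent; };

int fake_send(const void *buf, size_t len, void *ctx)
{
   (void)buf; (void)len;
   ((struct fake_link *)ctx)->sent++;
   return 0;
}

int fake_wait_ack(int timeout_ms, void *ctx)
{
   (void)timeout_ms;
   struct fake_link *l = (struct fake_link *)ctx;
   return l->sent > l->drop;        /* ack arrives once a packet gets through */
}
```

With a 50 ms timeout, two lost packets cost about 100 ms in this scheme instead of the seconds a default TCP retransmit timer can take.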

- Stefan
  821   13 Jul 2012   Stefan Ritt   Bug Report   Crash after recursive use of rpc_execute()
> Then I realized that I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls 
> ss_suspend() calls rpc_execute(). The second rpc_execute successfully completes, but leaves corrupted 
> data for the original rpc_execute(), which happily crashes. At the moment of the crash, recursive call to 
> rpc_execute() is no longer visible.

This is really strange. I did not protect rpc_execute against recursive calls since this should not happen. rpc_server_receive() is linked to rpc_call() on the client side. So there cannot be 
several rpc_call() since there I do the recursive checking (also multi-thread checking) via a mutex. See line 10142 in midas.c. So there CANNOT be recursive calls to rpc_execute() because 
there cannot be recursive calls to rpc_server_receive(). But apparently there are, according to your stack trace.

So even if your patch works fine, I would like to know where the recursive calls to rpc_server_receive() come from. Since we have one subprocess of mserver for each client, there should only 
be one client connected to each mserver process, and the client is protected via the mutex in rpc_call(). Can you please debug this? I would like to understand what is going on there. Maybe 
there is a deeper underlying problem, which we had better solve; otherwise it might fall back on us in the future.

For debugging, you have to see what commands rpc_call() sends and what rpc_server_receive() gets, maybe by writing this into a common file together with a time stamp.
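
A minimal sketch of such a trace helper, assuming both sides can append to the same file. The function name trace_rpc() and the line format are made up for illustration; this is not an existing MIDAS function.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Append one line "<seconds>.<microseconds> <who> <what>" to a trace file
   shared by client and server. Returns 0 on success, -1 on error. */
int trace_rpc(const char *path, const char *who, const char *what)
{
   assert(path && who && what);
   struct timeval tv;
   gettimeofday(&tv, NULL);

   FILE *f = fopen(path, "a");   /* append mode, so both sides can add lines */
   if (f == NULL)
      return -1;
   fprintf(f, "%ld.%06ld %s %s\n", (long)tv.tv_sec, (long)tv.tv_usec, who, what);
   return fclose(f) == 0 ? 0 : -1;
}
```

One call at the point where rpc_call() sends a command and one where rpc_server_receive() gets it would give a time-ordered list in which an unexpected second "receive" before the first one has finished should stand out.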

SR
  833   05 Sep 2012   Stefan Ritt   Info   New pipe compression implemented in mlogger
A new pipe compression has been implemented in mlogger thanks to Fedor Ignatov from BINP 
Novosibirsk. It works by having the logger write into a pipe instead of directly into a file. The pipe can 
then be connected to any compression program without the need to compile against any additional C 
library.

To use it, enter as the filename for example

|bzip2>run%05d.mid     (note the pipe '|' in front of the bzip2)

This way the data stream is run through the bzip2 program, which is known to have a better compression 
ratio than gzip. Furthermore, the parallel version of bzip2 can be used, which spreads over all available 
CPU cores and speeds up compression almost linearly with the number of cores. This parallel version, 
called pbzip2, can be found here:

http://compression.ca/pbzip2/

It can be easily compiled and installed. Using this method in the MEG experiment at PSI, we can compress 
our waveform data to 37% of its original size (49% with gzip), and on 8 cores we get a compression rate 
of about 40 MBytes/sec (23 MBytes/sec with gzip on a single core).

The disadvantage of that method is that one cannot see the compression ratio online, but this is not a big 
deal I guess. The new version has been committed as rev. 5324. 
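
For those curious how a filename with a leading '|' can be turned into a pipe, here is a minimal sketch using POSIX popen(). It only illustrates the idea; open_output() and close_output() are made-up names, and this is not the code that was committed to mlogger.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Open an output channel. A name starting with '|' is handed to the shell
   via popen(), so "|bzip2>run00001.mid" streams through bzip2;
   any other name is opened as a plain file. */
FILE *open_output(const char *name, int *is_pipe)
{
   assert(name && is_pipe);
   *is_pipe = (name[0] == '|');
   return *is_pipe ? popen(name + 1, "w") : fopen(name, "wb");
}

/* Close the channel with the call that matches how it was opened. */
int close_output(FILE *f, int is_pipe)
{
   return is_pipe ? pclose(f) : fclose(f);
}
```

Since the writer only sees the pipe, it has no direct way to know how many compressed bytes end up on disk, which is why the compression ratio cannot be shown online with this method.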

/Stefan
  836   11 Sep 2012   Stefan Ritt   Info   MIDAS button to display image
> Hi,
> 
> I've written a python script that reads some data from a file and generates a
> .png image. I want to have a button on my MIDAS status page that:
> 
> - executes the script and waits for it to finish,
> - then displays the image
> 
> How can I do that? I tried using the sequencer to just execute the script every
> 30 seconds, but I can't get it to work, and it would be better to only execute
> the script on demand anyway. 
> 
> I also am having trouble getting image display to work. I have the ODB keys set:
> 
> [local:oven1:S]/Custom>ls
> Temperature Map&                /home/deap/ovendaq/online/index.html
> Images
> 
> [local:oven1:S]/Custom>ls Images/temps.png/           
> Background                      /home/deap/ovendaq/online/temps.png
> 
> And the HTML file is just this:
> <img src="temps.png">
> 
> But the image won't display. It shows a "broken" picture, and when I try to view
> it directly it says: Invalid custom page: Page not found in ODB.
> 
> Any help would be appreciated...
> 
> Thanks
> Shaun


If you use the "custom" image system, you need to use GIF images. mhttpd can dynamically create GIF images 
with a background image and overlaid labels, bar graphs etc. mhttpd contains a GIF library to do that 
in memory, but no PNG library.

Actually I would recommend not using a script to create an image, but using the custom image system to 
display the temperatures. In the attachment you see a page from our experiment which contains a 
background image (the greyish boxes), labels (white temperature boxes), bar graphs (blue level boxes) 
and history pages (left side). This is all dynamically created inside mhttpd using the custom page system 
without any external script. All you have to do is to get the temperatures and levels into the ODB via the 
slow control system. If you want, I can send you the full code for that page.

Cheers,
Stefan
Attachment 1: Screen_Shot_2012-09-11_at_14.36.56_.png
  839   09 Oct 2012   Stefan Ritt   Bug Fix   [PATCH] mana.c compile fix, gz files
> Hi,
> 
> I had to apply the attached patch to convince SuSE Linux 12.2 to compile mana.c
> gcc version is "(SUSE Linux) 4.6.2"
> 
> Problem is that gz{write,close, etc.} expect a 1st argument of type gzFile (see
> zlib.h), whereas out_file is FILE*. In fact, out_file is a cast to FILE*, even
> in the case when we work on a gzfile (HAVE_ZLIB).
> 
> Could you please confirm that the patch is correct, and possibly apply it to trunk?
> 
> I haven't checked if mana works as advertised now.
> 
> Cheers,
> 
> 
> Randolf

I applied your patch to the trunk.

Best,
Stefan
  842   13 Dec 2012   Stefan Ritt   Bug Report   ss_thread_kill() kills entire program
The Linux thread functionality was introduced by Konstantin, so he might have a better idea about that.

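What I usually do is a graceful thread shutdown using just a flag, like this:

#include <stdio.h>
#include <unistd.h>
#include "midas.h"

/* set by main() to ask the thread to exit;
   volatile so that the loop re-reads it on every pass */
volatile int stop_thread = 0;

INT f(void *param)
{
  for (int x = 0; x < 100; x++) {
    sleep(1);
    if (stop_thread) {
      // clean up things here...
      return 0;
    }
  }
  return 0;
}

int main()
{
  printf("creating thread\n");
  midas_thread_t thr = ss_thread_create(f, NULL);
  sleep(2);
  printf("killing thread\n");
  stop_thread = 1;   /* the thread sees this within one second */
  sleep(2);          /* give it time to clean up and return */
  printf("success\n");
  return 0;
}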


This way I have a chance to clean up things in the thread, which otherwise I would not be able to.
  847   14 Dec 2012   Stefan Ritt   Suggestion   Midas + Elog with SSL
> I've been trying to set up midas to create an automatic elog entry at the end of
> each run and I've run into a problem. I've setup an elog on our server which
> uses SSL and it seems that the melog provided by midas to create logbook entries
> doesn't know any SSL.
> 
> My solution to this was to copy the crypt.c from the elog package to the
> computer running midas and changed melog.c and the makefile to use SSL if a flag
> -s is used. Does this seem like a sensible solution or did I overlook the obvious
> and/or right way to do this?

Indeed melog.c is an old version of the elog.c utility in the elog package, which has not been maintained for a 
long time. Can't you just use the recent elog.c utility from the elog package?
  850   20 Dec 2012   Stefan Ritt   Bug Report   MIDAS does not function correctly on F17
It is not so easy to get out of zlib how many bytes have actually been written. I used an undocumented function, 
which breaks down on 64-bit systems.

I have now rewritten the code in mlogger.cxx to use lseek() to "measure" the actual output file and set the values 
correctly. I tried it on a few systems but am not 100% sure it works everywhere. Can you please double check?

The fix is in SVN revision 5347.
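
The idea of asking the file descriptor itself, rather than the compression library, how much has been written can be sketched with plain POSIX calls. The helper names open_log() and bytes_on_disk() are made up for this example, and how mlogger.cxx gets hold of the descriptor is not shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) an output file and return its descriptor, -1 on error. */
int open_log(const char *path)
{
   assert(path);
   return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}

/* Return the current write position of fd, i.e. the number of bytes that
   have been written so far when writing sequentially from offset 0.
   Returns -1 on error (for example if fd is a pipe). */
long long bytes_on_disk(int fd)
{
   assert(fd >= 0);
   off_t pos = lseek(fd, 0, SEEK_CUR);   /* moves nothing, just reports */
   return pos == (off_t)-1 ? -1 : (long long)pos;
}
```

Because off_t is used for the position, the result does not depend on the size of int or long on the platform, provided the program is built with large file support where that is needed.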

/Stefan
  853   09 Jan 2013   Stefan Ritt   Bug Report   Outputting ADC and TDC data into ROOT tree with the MIDAS SVN Revision:5347.
Dear Bill,

the Midas analyzer "mana.c" is currently not maintained. At PSI we use the ROME framework (which might be too complicated for a 
small experiment) and at TRIUMF the ROOTANA framework is used:

http://ladd00.triumf.ca/~olchansk/rootana/

You might be better off switching to that one.

Best regards,
Stefan
  857   01 Feb 2013   Stefan Ritt   Forum   analyzer cannot connect to the statistics database
> The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
> one you specified in /etc/exptab).
> This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
> fresh start.
> 
> Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
> odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
> "rm .[A-Z]*.SHM" 

Thanks Randolf for helping out, I was not in the office this week.

In addition to deleting the *SHM files, it's sometimes necessary to delete the shared memory. You do this with the 
command line tools

ipcs -m
ipcrm -m <shmid>


/Stefan
  858   06 Feb 2013   Stefan Ritt   Info   Compression benchmarks
I redid the tests from Konstantin for our MEG experiment at PSI. The event structure is different, so it
is interesting how the two different experiments compare. We have an event size of 2.4 MB and a trigger
rate of ~10 Hz, so we produce a raw data rate of 24 MB/sec. A typical run contains 2000 events, so has a 
size of 5 GB. Here are the results:


cat                 : time   7.8s, size   4960156030   4960156030, comp   0%, rate 639M/s 639M/s

gzip -1             : time 147.2s, size   4960156030   2468073901, comp  50%, rate  33M/s  16M/s

pbzip2 -p1          : time 679.6s, size   4960156030   1738127829, comp  65%, rate   7M/s   2M/s (1 CPU)
pbzip2 -p8          : time  96.1s, size   4960156030   1738127829, comp  65%, rate  51M/s  18M/s (8 CPU)


As one can see, our compression ratio is poorer (due to the quasi-random noise in our waveforms), but the
difference between gzip -1 and pbzip2 is larger (15% instead of 10% for DEAP). The single-CPU version of
pbzip2 cannot sustain our DAQ rate of 24 MB/sec, but the parallel version can. Actually we have a somewhat old
dual-core dual-CPU 2.5 GHz Xeon box, and make 8 hyper-threading CPUs out of the total 4 cores.
Interestingly the compression rate scales by a factor of 7.3 for 8 virtual cores, so hyper-threading does its job.
So we take all our data with the pbzip2 compression. The additional 15% as compared with gzip does 
not sound like much, but we produce 250 TB/year of raw data. So gzip gives us 132 TB/year and pbzip2 gives 
us 98 TB/year, and we save quite a few disks.
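
For anyone who wants to redo the arithmetic behind the "comp" and "rate" columns with their own numbers, here is a small sketch. The function names are made up, and it assumes that "M" in the table means 1e6 bytes, which is what the listed values suggest.

```c
#include <assert.h>

/* Space saved by compression, in percent of the input size. */
double saved_percent(double in_bytes, double out_bytes)
{
   assert(in_bytes > 0 && out_bytes >= 0);
   return 100.0 * (1.0 - out_bytes / in_bytes);
}

/* Throughput in units of 1e6 bytes per second. */
double rate_mb_per_s(double bytes, double seconds)
{
   assert(seconds > 0);
   return bytes / seconds / 1e6;
}
```

Fed with the sizes from the table, saved_percent() gives about 50.2 for gzip -1 and about 65.0 for pbzip2, in line with the 50% and 65% listed above.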

Note that you can run bzip2 (as all the other methods) already now with the current logger, if you specify
an external compression program in the ODB using the pipe functionality:


[local:MEG:S]/>cd Logger/Channels/0/Settings/
[local:MEG:S]Settings>ls
Active                          y
Type                            Disk
Filename                        |pbzip2>/megdata/run%06d.mid.bz2
Format                          MIDAS
Compression                     0
ODB dump                        y
Log messages                    0
Buffer                          SYSTEM
Event ID                        -1
Trigger mask                    -1
Event limit                     0
Byte limit                      0
Subrun Byte limit               0
Tape capacity                   0
Subdir format                   
Current filename                /megdata/run197090.mid.bz2
  860   11 Feb 2013   Stefan Ritt   Forum   send_tcp error
> I am getting a series of errors from MIDAS that I do not understand, so I hope
> someone can help me figure this out.
> 
> I am attempting to run many frontends on one machine. I can run 8 with no
> problem, but if I try to add a 9th I get errors relating to send_tcp. 
> 
> I have tried adjusting the max event sizes and buffer sizes, but it has not
> resolved the problem. I also tried adjusting the data rates and the total data
> volume going through each frontend, but there was no change. And as far as I can
> tell I am not up against any hardware limits.
> 
> The errors are repeated continuously while a run is going. The three errors I
> get are:
> 
> 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
> 
> If you have any suggestions of how I can debug this, please let me know. Thanks!

Can you tell me

- why you need 9 frontends
- what kind of data your frontends produce
- what your event builder looks like and how you assemble the fragments
- what messages/errors you see when you run odbedit BEFORE the crash

/Stefan
  862   12 Feb 2013   Stefan Ritt   Forum   send_tcp error
Ok, now the picture is clearer. I have however no idea what the real problem is. The number of concurrent programs in midas is 64 as defined in midas.h (MAX_CLIENTS) so that should not be the problem. In our experiment we run 10 front-ends (but 
on 10 different machines) without problems. Other experiments used 27 front-ends.

The TCP error you see comes probably from the fact that the mserver side crashes or quits, then the socket gets broken. What you can try to debug this is to run mserver manually. Just remove mserver from inetd, and start it with "mserver -d" and 
watch what happens. Do you see any additional error messages? If the mserver segfaults, you should turn on core dumps and have a look there. Note that the mserver starts a child process on each incoming connection, so running mserver in gdb 
does not really help, since the child processes (which connect back to the front-ends) are not seen by gdb.

Have you tried to run the 9 front-ends on maybe two different PCs (5 and 4) to see if the problem is on the client side?


Best regards,
Stefan
  864   14 Feb 2013   Stefan Ritt   Info   Review of github and bitbucket
Let me add my five cents:

We have been using bitbucket at PSI for two months now, and are very happy with it.

Pros:

- We like the GIT flow model (http://nvie.com/posts/a-successful-git-branching-model/). You can at the same time do hot fixes, have a "distribution 
version", and keep a development branch, where you can try new things without compromising the distribution.
- Nice and fast Web interface, especially the "blame" is lightning fast compared to SVN/CVS
- GIT is non-centralized, so your local clone of a repository contains everything. If bitbucket is down/asks for money, you can continue with your local 
repository and clone it to some other hosting service, or host it yourself
- SourceTree (http://www.sourcetreeapp.com/) is a nice GUI for Mac lovers. 
- Easy user management
- Free for academic use

Cons:

- Wiki is limited as KO wrote, so it should not be used as a "full" wiki to replace Plone for example, just to annotate your project
- SVN revision number is gone. This is on purpose since it does not make sense any more if you keep several parallel branches (merging becomes a 
nightmare), so one has to use either the (random) commit-ID or start tagging again.

So in conclusion, I would say that it's time to switch MIDAS to GIT. We'll probably do that in July when I will be at TRIUMF.

/Stefan