ID |
Date |
Author |
Topic |
Subject |
787
|
19 Apr 2012 |
Stefan Ritt | Bug Report | Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’ |
Exaos Lee wrote: | I tried to build MIDAS under ArchLinux, failed on errors as following:src/mlogger.cxx: In function ‘INT midas_flush_buffer(LOG_CHN*)’:
src/mlogger.cxx:1011:54: error: invalid conversion from ‘void*’ to ‘gzFile’ [-fpermissive]
In file included from src/mlogger.cxx:33:0:
/usr/include/zlib.h:1318:21: error: initializing argument 1 of ‘int gzwrite(gzFile, voidpc, unsigned int)’ [-fpermissive]
src/mlogger.cxx: In function ‘INT midas_log_open(LOG_CHN*, INT)’:
src/mlogger.cxx:1200:79: error: invalid conversion from ‘void*’ to ‘gzFile’ [-fpermissive]
In file included from src/mlogger.cxx:33:0: Please refer to attachment elog:786/1 for detail. There are also many warnings listed.
This error can be supressed by adding -fpermissive to CXXFLAGS. But the error message is correct."gzFile" is not equal to "void *"! C allows implicit casts between void* and any pointer type, C++ doesn't allow that. It's better to fix this error. A quick fix would be adding explicit casts. But I'm not sure what is the proper way to fix this. |
Ah, dumb gcc gets pickier and pickier. I added a case (gzFile)log_chn->gzfile which fixes the error. I cannot put gzFile already into the header file since the zlib header is included after the midas header, otherwise we get some other problems. The SVN version with the fix is 5275. |
789
|
27 Apr 2012 |
Stefan Ritt | Bug Report | Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’ |
KO wrote: | BTW, I read the midas elog via email and if you post html or elcode messages, I receive complete
gibberish. For prompt service, please select message type "plain". (yes, you cannot use fancy colours and
blinking text, but better than me not reading your stuff at all).
BTW2, for easier reading, please include error messages as plain text in your message. As opposed to
compressed attachements.
K.O.
|
BTW3, if you use a real email program you don't get glibberish. I know some people prefer good-old-text-only pine, but I'm sure you do not use the ascii-only browser lynx to browse the internet, right? So if you browse the web in graphics, why not read your email in graphics as well. Better change yourself than the whole rest of the world |
799
|
14 Jun 2012 |
Stefan Ritt | Bug Report | Cannot start/stop run through mhttpd | > I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> fine. So, what's the difference between invoking mhttpd directly and through a script?
When you start it with "-D", then mhttpd become a daemon. According to linux rules, it has to "cd /", so it lives in the
root directory, in order not to block any NFS mount/unmount. If something with the path is not correct then, mhttpd
cannot find mtransition then. Once I fixed that problem my moving mtransition to /usr/bin.
Stefan |
807
|
21 Jun 2012 |
Stefan Ritt | Info | midas vme benchmarks | Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using
2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a
continuous link speed of 83 MB/sec. |
Attachment 1: Screen_Shot_2012-06-21_at_10.14.09_.png
|
|
808
|
21 Jun 2012 |
Stefan Ritt | Bug Report | Cannot start/stop run through mhttpd | > I agree. Somehow mhttpd cannot run mtransition. I am not super happy with this dependance on user $PATH settings and the inability to capture error messages
> from attempts to start mtransition. I am now thinking in the direction of running mtransition code by forking. But remember that mlogger and the event builder also
> have to use mtransition to stop runs (otherwise they can dead-lock). So an mhttpd-only solution is not good enough...
The way to go is to make cm_transition multi-threaded. Like on thread for each client to be contacted. This way the transition can go in parallel when there are many frontend computers for example, which will speed up
transitions significantly. In addition, cm_transition should execute a callback whenever a client succeeded or failed, so to give immediate feedback to the user. I think of something like implementing WebSockets in mhttpd for that (http://en.wikipedia.org/wiki/WebSocket).
I have this in mind since many years, but did not have time to implement it yet. Maybe on my next visit to TRIUMF?
Stefan |
810
|
22 Jun 2012 |
Stefan Ritt | Info | midas vme benchmarks | > > Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using
> > 2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a
> > continuous link speed of 83 MB/sec.
>
> What VME module is on the other end?
>
> K.O.
The PSI-built DRS4 board, where we implemented the 2eVME protocol in the Virtex II FPGA. The same speed can be obtained with the commercial
VME memory module CI-VME64 from Chrislin Industries (see http://www.controlled.com/vme/chinp1.html).
Stefan |
814
|
25 Jun 2012 |
Stefan Ritt | Info | midas vme benchmarks | > P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.
An important aspect of optimizing your system is to keep the network traffic under control. I use GBit Ethernet between FE and BE, and make sure the switch
can accomodate all accumulated network traffic through its backplane. This way I do not have any TCP retransmits which kill you. Like if a single low-level
ethernet packet is lost due to collision, the TCP stack retransmits it. Depending on the local settings, this can be after a timeout of one (!) second, which
punches already a hole in your data rate. On the MSCB system actually I use UDP packets, where I schedule the retransmit myself. For a LAN, 10-100ms timeout
is there enough. The one second is optimized for a WAN (like between two continents) where this is fine, but it is not what you want on a LAN system. Also
make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic. I found that this also helps a lot.
- Stefan |
821
|
13 Jul 2012 |
Stefan Ritt | Bug Report | Crash after recursive use of rpc_execute() | > Then I realized that I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls
> ss_suspend() calls rpc_execute(). The second rpc_execute successfully completes, but leave corrupted
> data for the original rpc_execute(), which happily crashes. At the moment of the crash, recursive call to
> rpc_execute() is no longer visible.
This is really strange. I did not protect rpc_execute against recursive calls since this should not happen. rpc_server_receive() is linked to rpc_call() on the client side. So there cannot be
several rpc_call() since there I do the recursive checking (also multi-thread checking) via a mutex. See line 10142 in midas.c. So there CANNOT be recursive calls to rpc_execute() because
there cannot be recursive calls to rpc_server_receive(). But apparently there are, according to your stack trace.
So even if your patch works fine, I would like to know where the recursive calls to rpc_server_receive() come from. Since we have one subproces of mserver for each client, there should only
be one client connected to each mserver process, and the client is protected via the mutex in rpc_call(). Can you please debug this? I would like to understand what is going on there. Maybe
there is a deeper underlying problem, which we better solve, otherwise it might fall back on use in the future.
For debugging, you have to see what commands rpc_call() send and what rpc_server_receive() gets, maybe by writing this into a common file together with a time stamp.
SR |
833
|
05 Sep 2012 |
Stefan Ritt | Info | New pipe compression implemented in mlogger | A new pipe compression has been implemented in mlogger thanks to Fedor Ignatov from BINP
Novosibirsk. The way it works that the logger write into a pipe instead directly into a file. The pipe can
then be connected to any compression program without the need to copile against any additional C
library.
To use is, enter as the filename for example
|bzip2>run%05d.mid (note the pipe '|' in front of the bzip2)
This way the data stream is run through the bzip2 program, which is known to have better compression
ratio than gzip. Furthermore, the parallel version of bzip2 can be used, which spreads over all available
CPU cures and speeds up compression almost linearly with the number of cores. This parallel version
called pbzip2 can be found here:
http://compression.ca/pbzip2/
It can be easily compiled and installed. Using this method in the MEG experiment at PSI, we can compress
our waveform data to 37% or it's original size (49% with gzip), and on 8 cores we get a compression rate
of about 40 MBytes/sec (23 MBytes with gzip on a single core).
The disadvantage of that method is that one cannot see the compression ratio online, but this is not a big
deal I guess. The new version has been committed as rev. 5324.
/Stefan |
836
|
11 Sep 2012 |
Stefan Ritt | Info | MIDAS button to display image | > Hi,
>
> I've written a python script that reads some data from a file and generates a
> .png image. I want to have a button on my MIDAS status page that:
>
> - executes the script and waits for it to finish,
> - then displays the image
>
> How can I do that? I tried using the sequencer to just execute the script every
> 30 seconds, but I can't get it to work, and it would be better to only execute
> the script on demand anyway.
>
> I also am having trouble getting image display to work. I have the ODB keys set:
>
> [local:oven1:S]/Custom>ls
> Temperature Map& /home/deap/ovendaq/online/index.html
> Images
>
> [local:oven1:S]/Custom>ls Images/temps.png/
> Background /home/deap/ovendaq/online/temps.png
>
> And the HTML file is just this:
> <img src="temps.png">
>
> But the image won't display. It shows a "broken" picture, and when I try to view
> it directly it says: Invalid custom page: Page not found in ODB.
>
> Any help would be appreciated...
>
> Thanks
> Shaun
If you use the "custom" image system, you need to use GIF images. mhttpd can dynamically create GIF
images,
with a background image and overlaid labels, bar graphs etc. But mhttpd just contains a GIF library to do
that
in memory, but no PNG library.
Actually I would recommend you not to use a script to create an image, but use the custom image system
to
display temperatures. In the attachment you see an page from our experiment which contains a
background image (the greyish boxes), labels (white temperature boxes), bar graphs (blue level boxes)
and history pages (left side). This is all dynamically created inside mhttpd using the custom page system
without any external script. All you have to do is to get the temperatures and levels inside the ODB via the
slow control system. If you want, I can send you the full code for that page.
Cheers,
Stefan |
Attachment 1: Screen_Shot_2012-09-11_at_14.36.56_.png
|
|
839
|
09 Oct 2012 |
Stefan Ritt | Bug Fix | [PATCH] mana.c compile fix, gz files | > Hi,
>
> I had to apply the attached patch to convince SuSE Linux 12.2 to compile mana.c
> gcc version is "(SUSE Linux) 4.6.2"
>
> Problem is that gz{write,close, etc.} expect a 1st argument of type gzFile (see
> zlib.h), whereas out_file is FILE*. In fact, out_file is a cast to FILE*, even
> in the case when we work on a gzfile (HAVE_ZLIB).
>
> Could you please confirm that the patch is correct, and possibly apply it to trunk?
>
> I haven't checked if mana works as advertised now.
>
> Cheers,
>
>
> Randolf
I applied your patch to the trunk.
Best,
Stefan |
842
|
13 Dec 2012 |
Stefan Ritt | Bug Report | ss_thread_kill() kills entire program | The Linux thread functionality was introduced by Konstantin, so he might have a better idea about that.
What I usually do is a graceful thread shutdown just by a flag. Like
int stop_thread = 0;
INT f(void *param)
{
for (int x = 0; x < 100; x++) {
sleep(1);
if (stop_thread) {
// clean up things here...
return 0;
}
}
return 0;
}
int main()
{
printf("creating thread\n");
midas_thread_t thr = ss_thread_create(f, NULL);
sleep(2);
printf("killing thread\n");
stop_thread = 1;
sleep(2);
printf("success\n");
return 0;
}
This way I have a chance to clean up things in the thread, which otherwise I would not be able to. |
847
|
14 Dec 2012 |
Stefan Ritt | Suggestion | Midas + Elog with SSL | > I've been trying to set up midas to create an automatic elog entry at the end of
> each run and I've run into a problem. I've setup an elog on our server which
> uses SSL and it seems that the melog provided by midas to create logbook entries
> doesn't know any SSL.
>
> My solution to this was to copy the crypt.c from the elog package to the
> computer running midas and changed melog.c and the makefile to use SSL if a flag
> -s is used. Does this seem like a sensible solution or did I oversee the obvious
> and/or right way to do this?
Indeed melog.c is an old version of the elog.c utility in the elog package, which has not been maintained since a
long time. Can't you just use the recent elog.c utility from the elog package? |
850
|
20 Dec 2012 |
Stefan Ritt | Bug Report | MIDAS does not function correctly on F17 | If is not so easy to get out of zlib how many bytes have been written actually. I used an undocumented function,
which breaks down on 64-bit systems.
I now rewrote the code in mlogger.cxx to use lseek() to "measure" actually the output file and set the values
correctly. I tried on a few systems but am not 100% sure if it works everywhere. Can you please double check?
The fix is in SVN revision 5347.
/Stefan |
853
|
09 Jan 2013 |
Stefan Ritt | Bug Report | Outputting ADC and TDC data into ROOT tree with the MIDAS SVN Revision:5347. | Dear Bill,
the Midas analyzer "mana.c" is currently not maintained. At PSI we use the ROME framework (which might be too complicated for a
small experiment) and at TRIUMF the ROOTANA framework is used:
http://ladd00.triumf.ca/~olchansk/rootana/
You might be better off switching to that one.
Best regards,
Stefan |
857
|
01 Feb 2013 |
Stefan Ritt | Forum | analyzer cannot connect to the statistics database | > The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
> one you specified in /etc/exptab).
> This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
> fresh start.
>
> Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
> odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal
> "rm .[A-Z]*.SHM"
Thanks Randolf for helping out, I was not in the office this week.
In addition of deleting the *SHM files, it's sometimes necessary to delete the shared memory. You do this with the
command line tools
ipcs -m
ipcrm -m <shmid>
/Stefan |
858
|
06 Feb 2013 |
Stefan Ritt | Info | Compression benchmarks | I redid the tests from Konstantin for our MEG experiment at PSI. The event structure is different, so it
is interesting how the two different experiments compare. We have an event size of 2.4 MB and a trigger
rate of ~10 Hz, so we produce a raw data rate of 24 MB/sec. A typical run contains 2000 events, so has a
size of 5 GB. Here are the results:
cat : time 7.8s, size 4960156030 4960156030, comp 0%, rate 639M/s 639M/s
gzip -1 : time 147.2s, size 4960156030 2468073901, comp 50%, rate 33M/s 16M/s
pbzip2 -p1 : time 679.6s, size 4960156030 1738127829, comp 65%, rate 7M/s 2M/s (1 CPU)
pbzip2 -p8 : time 96.1s, size 4960156030 1738127829, comp 65%, rate 51M/s 18M/s (8 CPU)
As one can see, our compression ratio is poorer (due to the quasi random noise in our waveforms), but the
difference between gzip -1 and pbzip2 is larger (15% instead 10% for DEAP). The single CPU version of
pbzip cannot sustain our DAQ rate of 24 MB, but the parallel version can. Actually we have a somehow old
dual-core dual-CPU board 2.5 GHz Xenon box, and make 8 hyper-threading CPUs out of the total 4 cores.
Interestingly the compression rate scales with 7.3 for 8 virtual cores, so hyper-threading does its job.
So we take all our data with the pbzip2 compression. The additional 15% as compared with gzip does
not sound much, but we produce raw 250 TB/year. So gzip gives us 132 TB/year and pbzip2 gives
us 98 TB/year, and we save quite some disks.
Note that you can run bzip2 (as all the other methods) already now with the current logger, if you specify
an external compression program in the ODB using the pipe functionality:
local:MEG:S]/>cd Logger/Channels/0/Settings/
[local:MEG:S]Settings>ls
Active y
Type Disk
Filename |pbzip2>/megdata/run%06d.mid.bz2
Format MIDAS
Compression 0
ODB dump y
Log messages 0
Buffer SYSTEM
Event ID -1
Trigger mask -1
Event limit 0
Byte limit 0
Subrun Byte limit 0
Tape capacity 0
Subdir format
Current filename /megdata/run197090.mid.bz2
</pre> |
860
|
11 Feb 2013 |
Stefan Ritt | Forum | send_tcp error | > I am getting a series of errors from MIDAS that I do not understand, so I hope
> someone can help me figure this out.
>
> I am attempting to run many frontends on one machine. I can run 8 with no
> problem, but if I try to add a 9th I get errors relating to send_tcp.
>
> I have tried adjusting the max event sizes and buffer sizes, but it has not
> resolved the problem. I also tried adjusting the data rates and the total data
> volume going through each frontend, but there was no change. And as far as I can
> tell I am not up against any hardware limits.
>
> The errors are repeated continuously while a run is going. The three errors I
> get are:
>
> 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
>
> If you have any suggestions of how I can debug this, please let me know. Thanks!
Can you tell me
- why you need 9 frontends
- what kind of data your frontends produce
- how your event builder looks like and how you assemble the fragments
- what messages/errors you see when you run odbedit BEFORE the crash
/Stefan |
862
|
12 Feb 2013 |
Stefan Ritt | Forum | send_tcp error | Ok, now the picture is clearer. I have however no idea what the real problem is. The number of concurrent programs in midas is 64 as defined in midas.h (MAX_CLIENTS) so that should not be the problem. In our experiment we run 10 front-ends (but
on 10 different machines) without problems. Other experiments used 27 front-ends.
The TCP error you see comes probably from the fact that the mserver side crashes or quits, then the socket gets broken. What you can try to debug this is to run mserver manually. Just remove mserver from inetd, and start it with "mserver -d" and
watch what happens. Do you see any additional error messages. If the mserver segfaults, you should turn on core dumps and have a look there. Note that the mserver starts a child process on each incoming connection, so running mserver in gdb
does not really help, since the child processes (which connect back to the front-ends) are not seen by gdb.
Have you tried to run the 9 front-ends on maybe two different PCs (5 and 4) to see if the problem is on the client side?
Best regards,
Stefan |
864
|
14 Feb 2013 |
Stefan Ritt | Info | Review of github and bitbucket | Let me add my five cents:
We use bitbucket now since two months at PSI, and are very happy with it.
Pros:
- We like the GIT flow model (http://nvie.com/posts/a-successful-git-branching-model/). You can at the same time do hot fixes, have a "distribution
version", and keep a development branch, where you can try new things without compromising the distribution.
- Nice and fast Web interface, especially the "blame" is lightning fast compared to SVN/CVS
- GIT is non-centralized, so your local clone of a repository contains everything. If bitbucket is down/asks for money, you can continue with your local
repository and clone it to some other hosting service, or host it yourself
- SourceTree (http://www.sourcetreeapp.com/) is a nice GUI for Mac lovers.
- Easy user management
- Free for academic use
Con:
- Wiki is limited as KO wrote, so it should not be used as a "full" wiki to replace Plone for example, just to annotate your project
- SVN revision number is gone. This is on purpose since it does not make sense any more if you keep several parallel branches (merging becomes a
nightmare), so one has to use either the (random) commit-ID or start tagging again.
So I conclusion, I would say that it's time to switch MIDAS to GIT. We'll probably do that in July when I will be at TRIUMF.
/Stefan |
|