ID |
Date |
Author |
Topic |
Subject |
806
|
20 Jun 2012 |
Konstantin Olchanski | Info | midas vme benchmarks | > I am recording here the results from a test VME system using two VF48 waveform digitizers
Note 1: data compression is about 89% (hence "data to disk" rate is much smaller than the "data from VME" rate)
Note 2: switch from VME MBLT64 block transfer to 2eVME block transfer:
- raises the VME data rate from 40 to 48 M/s
- event rate from 220/sec to 260/sec
- mlogger CPU use from 64% to about 80%
This is consistent with the measured VME block transfer rates for the VF48 module: MBLT64 is about 40 M/s, 2eVME is about 50 M/s (could be
80 M/s if no clock cycles were lost to sync VME signals with the VF48 clocks), 2eSST is implemented but impossible - VF48 cannot drive the
VME BERR and RETRY signals. Evil standards, grumble, grumble, grumble).
K.O. |
805
|
20 Jun 2012 |
Konstantin Olchanski | Info | midas vme benchmarks | I am recording here the results from a test VME system using two VF48 waveform digitizers and a 64-bit
dual-core VME processor (V7865). VF48 data suppression is off, VF48 modules set to read 48 channels,
1000 ADC samples each. mlogger data compression is enabled (gzip -1).
Event rate is about 200/sec
VME Data rate is about 40 Mbytes/sec
System is 100% busy (estimate)
System utilization of host computer (dual-core 2.2GHz, dual-channel DDR333 RAM):
(note high CPU use by mlogger for gzip compression of midas files)
top - 12:23:45 up 68 days, 20:28, 3 users, load average: 1.39, 1.22, 1.04
Tasks: 193 total, 3 running, 190 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.1%us, 6.2%sy, 0.0%ni, 54.4%id, 2.7%wa, 0.1%hi, 4.5%si, 0.0%st
Mem: 3925556k total, 3797440k used, 128116k free, 1780k buffers
Swap: 32766900k total, 8k used, 32766892k free, 2970224k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5169 trinat 20 0 246m 108m 97m R 64.3 2.8 29:36.86 mlogger
5771 trinat 20 0 119m 98m 97m R 14.9 2.6 139:34.03 mserver
6083 root 20 0 0 0 0 S 2.0 0.0 0:35.85 flush-9:3
1097 root 20 0 0 0 0 S 0.9 0.0 86:06.38 md3_raid1
System utilization of VME processor (dual-core 2.16 GHz, single-channel DDR2 RAM):
(note the more than 100% CPU use of multithreaded fevme)
top - 12:24:49 up 70 days, 19:14, 2 users, load average: 1.19, 1.05, 1.01
Tasks: 103 total, 1 running, 101 sleeping, 1 stopped, 0 zombie
Cpu(s): 6.3%us, 45.1%sy, 0.0%ni, 47.7%id, 0.0%wa, 0.2%hi, 0.6%si, 0.0%st
Mem: 1019436k total, 866672k used, 152764k free, 3576k buffers
Swap: 0k total, 0k used, 0k free, 20976k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19740 trinat 20 0 177m 108m 984 S 104.5 10.9 1229:00 fevme_gef.exe
1172 ganglia 20 0 416m 99m 1652 S 0.7 10.0 1101:59 gmond
32353 olchansk 20 0 19240 1416 1096 R 0.2 0.1 0:00.05 top
146 root 15 -5 0 0 0 S 0.1 0.0 42:52.98 kslowd001
Attached are the CPU and network ganglia plots from lxdaq09 (VME) and ladd02 (host).
The regular bursts of "network out" on ladd02 is lazylogger writing mid.gz files to HADOOP HDFS.
K.O. |
Attachment 1: lxdaq09cpu.gif
|
|
Attachment 2: lxdaq09net.gif
|
|
Attachment 3: ladd02cpu.gif
|
|
Attachment 4: ladd02net.gif
|
|
804
|
20 Jun 2012 |
Konstantin Olchanski | Info | lazylogger write to HADOOP HDFS | I tried using the lazylogger "Disk" method to write into a HADOOP HDFS clustered filesystem and found a
number of problems. I ended up replacing the lazylogger lazy_copy() function that still uses former YBOS
code with a new lazy_disk_copy() function that uses generic fread/fwrite. Also fixed the situation where
lazylogger cannot cleanly stop from the mhttpd "programs/stop" button while it is busy writing (the fix
works only for the "Disk" method).
(Note that one can also use the "Script" method for writing into HDFS)
Anyhow, the new lazylogger writes into HDFS just fine and I expect that it would also work for writing into
DCACHE using PNFS (if ever we get the SL6 PNFS working with our DCACHE servers).
Writing into our test HDFS cluster runs at about 20 MiBytes/sec for 1GB files with replication set to 3.
svn rev 5295
K.O. |
803
|
15 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > 2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set its
> value to NULL.
Set pointer to NULL after free() in these files:
M odb.c
M sequencer.cxx
M mlogger.cxx
M mhttpd.cxx
M midas.c
svn rev 5294
K.O. |
802
|
15 Jun 2012 |
Konstantin Olchanski | Bug Report | bk_delete uses memcpy instead of memmove | > In midas.c, the bk_delete function removes a bank by decrementing the total
> event size and then copying the remaining banks into the location of the first
> using memcpy from string.h.
Replaced some memcpy() with memmove(), including bk_delete().
svn rev 5293
K.O. |
801
|
14 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > > > Revision: r5286
> > > Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
>
> I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> fine.
Right. I see Debian 6.0.5 just came out hot off the presses. Would be good to fix this problem.
As a work around, can you run mhttpd without "-D", but in the background, i.e. "mhttpd -p xxx >& mhttpd.log &"?
Also what are your $PATH settings?
> So, what's the difference between invoking mhttpd directly and through a script?
As Stefan mentioned, "-D" invokes some nasty unix magic to disconnect the process from the user login session. It is
possible that this magic breaks in the latest Debian.
MIDAS "-D" does roughly the same thing as "nohup".
K.O. |
800
|
14 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > > I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> > And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> > are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> > fine. So, what's the difference between invoking mhttpd directly and through a script?
>
> When you start it with "-D", then mhttpd become a daemon. According to linux rules, it has to "cd /", so it lives in the
> root directory, in order not to block any NFS mount/unmount. If something with the path is not correct then, mhttpd
> cannot find mtransition then. Once I fixed that problem my moving mtransition to /usr/bin.
>
I agree. Somehow mhttpd cannot run mtransition. I am not super happy with this dependance on user $PATH settings and the inability to capture error messages
from attempts to start mtransition. I am now thinking in the direction of running mtransition code by forking. But remember that mlogger and the event builder also
have to use mtransition to stop runs (otherwise they can dead-lock). So an mhttpd-only solution is not good enough...
K.O. |
799
|
14 Jun 2012 |
Stefan Ritt | Bug Report | Cannot start/stop run through mhttpd | > I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> fine. So, what's the difference between invoking mhttpd directly and through a script?
When you start it with "-D", then mhttpd become a daemon. According to linux rules, it has to "cd /", so it lives in the
root directory, in order not to block any NFS mount/unmount. If something with the path is not correct then, mhttpd
cannot find mtransition then. Once I fixed that problem my moving mtransition to /usr/bin.
Stefan |
798
|
14 Jun 2012 |
Exaos Lee | Bug Report | Cannot start/stop run through mhttpd | > > Revision: r5286
> > Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
> > Problem:
> > After building and installation, using the script 'start_daq.sh' to start
> > 'sampleexpt'. Everything seems fine. But I cannot start a run through web. Using
> > 'odbedit' and 'mtransition' to start/stop a run works fine. So, what may cause
> > such a problem?
>
> Well, it's mhttpd who cannot start the run, not you. So what happens when you press
> the "start run" button? Any errors in midas.log or in midas messages? Is mtransition
> in your PATH?
>
> K.O.
I found the problem only appears when I run mhttpd in scripts, whether bash or python.
And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
fine. So, what's the difference between invoking mhttpd directly and through a script? |
797
|
13 Jun 2012 |
Exaos Lee | Bug Report | Cannot start/stop run through mhttpd | > Well, it's mhttpd who cannot start the run, not you. So what happens when you press
> the "start run" button? Any errors in midas.log or in midas messages? Is mtransition
> in your PATH?
After pressing "start run", there is a message displayed: "Run start requested". There
is no error in midas.log. And mtransition is actually in my PATH. I even looked into
"mhttpd.cxx" and found where "cm_transition" is called for starting a run. I have no
clue to grasp the reason. |
796
|
13 Jun 2012 |
Konstantin Olchanski | Forum | ladd00.triumf.ca https ssl certificate update | The HTTPS SSL certificate on ladd00.triumf.ca has been updated. Same as the old
certificate, the new one is self-signed and your web browser may complain about
that and ask you to "save a security exception".
When you save the new certificate, you can verify that you are connected to the
real ladd00.triumf.ca by comparing the "SHA1 fingerprint" reported by your web
browser to the one given below (as reported by "svn update"):
Certificate information:
- Hostname: ladd00.triumf.ca
- Valid: from Wed, 13 Jun 2012 22:31:51 GMT until Thu, 13 Jun 2013 22:31:51 GMT
- Issuer: DAQ, TRIUMF, Vancouver, BC, CA
- Fingerprint: 82:95:78:cb:78:d3:93:1d:d4:c8:e8:1a:64:0f:62:04:2d:0e:c3:4a
K.O. |
795
|
13 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > Revision: r5286
> Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
> Problem:
> After building and installation, using the script 'start_daq.sh' to start
> 'sampleexpt'. Everything seems fine. But I cannot start a run through web. Using
> 'odbedit' and 'mtransition' to start/stop a run works fine. So, what may cause
> such a problem?
Well, it's mhttpd who cannot start the run, not you. So what happens when you press
the "start run" button? Any errors in midas.log or in midas messages? Is mtransition
in your PATH?
K.O. |
794
|
13 Jun 2012 |
Exaos Lee | Bug Report | Cannot start/stop run through mhttpd | Revision: r5286
Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
Problem:
After building and installation, using the script 'start_daq.sh' to start
'sampleexpt'. Everything seems fine. But I cannot start a run through web. Using
'odbedit' and 'mtransition' to start/stop a run works fine. So, what may cause
such a problem? |
793
|
11 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > > > In midas.c, ...
> > >
> > > 1) _net_send_buffer is not set to NULL when declared.
>
> Ah,okay. I was not aware of this feature of global variables.
>
RTFM K&R "The C programming language".
http://en.wikipedia.org/wiki/The_C_Programming_Language
>
> > > 2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set
> its value to NULL.
>
Confirmed. Sorry for confusion in my previous message. Set the pointer to NULL after free() is good practice.
But note that calling cm_connect and cm_disconnect multiple times is unusual use of MIDAS and you will most
likely find more breakage.
K.O. |
792
|
10 Jun 2012 |
Greg Christian | Bug Report | _net_send_buffer realloc | > > In midas.c, ...
> >
> > 1) _net_send_buffer is not set to NULL when declared.
>
> _net_send_buffer is a global variable. All global variables are automatically
initialized to zero before the program
> starts.
>
> static char*x; // = NULL; is redundant
> char*y=realloc(x, 100); // x is NULL, usage is correct
>
Ah,okay. I was not aware of this feature of global variables.
> > 2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set
its
> > value to NULL.
>
> My copy of midas.c (svn rev 5256) sets _net_send_buffer to NULL:
>
> if (_net_send_buffer_size > 0) {
> M_FREE(_net_send_buffer);
> _net_send_buffer_size = 0;
> }
>
> What version of midas do you have? (svn info .)
>
> K.O.
I have version 5256 also (matches what you posted), but I only see
_net_send_buffer_size being set to 0, not _net_send_buffer itself. In midas.h,
M_FREE(x) only expands to free(x) if _MEM_DBG is not defined. |
791
|
10 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > In midas.c, ...
>
> 1) _net_send_buffer is not set to NULL when declared.
_net_send_buffer is a global variable. All global variables are automatically initialized to zero before the program
starts.
static char*x; // = NULL; is redundant
char*y=realloc(x, 100); // x is NULL, usage is correct
> 2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set its
> value to NULL.
My copy of midas.c (svn rev 5256) sets _net_send_buffer to NULL:
if (_net_send_buffer_size > 0) {
M_FREE(_net_send_buffer);
_net_send_buffer_size = 0;
}
What version of midas do you have? (svn info .)
K.O. |
790
|
09 Jun 2012 |
Greg Christian | Bug Report | _net_send_buffer realloc | In midas.c, I noticed that memory is only allocated to the global buffer
_net_send_buffer by calling realloc() from within the function
resize_net_send_buffer() (at least this was the only place I could find
allocation to _net_send_buffer happening). This can cause problems for a couple
of reasons:
1) _net_send_buffer is not set to NULL when declared. To my understanding, this
makes the first call to realloc(_net_send_buffer, /*size*/) undefined. When
passed a pointer that has not previously been allocated, realloc() acts like
malloc() only if the pointer equal to NULL. Otherwise, the behavior is undefined
and usually causes a crash.
2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set its
value to NULL. Thus if a client tries to include more than one
connect...disconnect cycle within an application, there is undefined behavior
the next time realloc(_net_send_buffer, ...) gets called.
I think that any potential allocation issues involving _net_send_buffer could be
solved by:
1) Initializing _net_send_buffer to NULL.
2) In cm_disconnect_experiment(), changing
> M_FREE(_net_send_buffer);
to
> M_FREE(_net_send_buffer);
> _net_send_buffer = NULL; |
789
|
27 Apr 2012 |
Stefan Ritt | Bug Report | Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’ |
KO wrote: | BTW, I read the midas elog via email and if you post html or elcode messages, I receive complete
gibberish. For prompt service, please select message type "plain". (yes, you cannot use fancy colours and
blinking text, but better than me not reading your stuff at all).
BTW2, for easier reading, please include error messages as plain text in your message. As opposed to
compressed attachements.
K.O.
|
BTW3, if you use a real email program you don't get glibberish. I know some people prefer good-old-text-only pine, but I'm sure you do not use the ascii-only browser lynx to browse the internet, right? So if you browse the web in graphics, why not read your email in graphics as well. Better change yourself than the whole rest of the world |
788
|
25 Apr 2012 |
Konstantin Olchanski | Bug Report | Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’ | Stefan's fix is incomplete - the "gzFile" cast is needed for all calls to zlib, not just those that some version
of GCC happens to complain about. Fixed.
svn rev 5286.
BTW, I read the midas elog via email and if you post html or elcode messages, I receive complete
gibberish. For prompt service, please select message type "plain". (yes, you cannot use fancy colours and
blinking text, but better than me not reading your stuff at all).
BTW2, for easier reading, please include error messages as plain text in your message. As opposed to
compressed attachements.
K.O. |
787
|
19 Apr 2012 |
Stefan Ritt | Bug Report | Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’ |
Exaos Lee wrote: | I tried to build MIDAS under ArchLinux, failed on errors as following:src/mlogger.cxx: In function ‘INT midas_flush_buffer(LOG_CHN*)’:
src/mlogger.cxx:1011:54: error: invalid conversion from ‘void*’ to ‘gzFile’ [-fpermissive]
In file included from src/mlogger.cxx:33:0:
/usr/include/zlib.h:1318:21: error: initializing argument 1 of ‘int gzwrite(gzFile, voidpc, unsigned int)’ [-fpermissive]
src/mlogger.cxx: In function ‘INT midas_log_open(LOG_CHN*, INT)’:
src/mlogger.cxx:1200:79: error: invalid conversion from ‘void*’ to ‘gzFile’ [-fpermissive]
In file included from src/mlogger.cxx:33:0: Please refer to attachment elog:786/1 for detail. There are also many warnings listed.
This error can be supressed by adding -fpermissive to CXXFLAGS. But the error message is correct."gzFile" is not equal to "void *"! C allows implicit casts between void* and any pointer type, C++ doesn't allow that. It's better to fix this error. A quick fix would be adding explicit casts. But I'm not sure what is the proper way to fix this. |
Ah, dumb gcc gets pickier and pickier. I added a case (gzFile)log_chn->gzfile which fixes the error. I cannot put gzFile already into the header file since the zlib header is included after the midas header, otherwise we get some other problems. The SVN version with the fix is 5275. |
|