Midas DAQ System
    Reply  24 Sep 2006, Stefan Ritt, Bug Report, mhttpd elog corruption via double-edit 

K.O. wrote:
Apparently the mhttpd elog will corrupt the elog files if two (or more?) elog entries are being edited at the same time. K.O.


That's strange. Since mhttpd is single threaded, there should not be any multi-thread/process conflict: the elog files cannot be written simultaneously from two different browser sessions. If entries are edited at the same time, they are submitted one after the other. Of course it is possible for two sessions to edit the same entry, in which case the second submission "wins", overwriting the first one without notification. Within the standalone elog server there is an option to lock entries ("use lock = 1") to prevent this, but this feature is not present in the mhttpd elog.
    Reply  27 Sep 2006, Konstantin Olchanski, Bug Report, mhttpd elog corruption via double-edit 
K.O. wrote:
Apparently the mhttpd elog will corrupt the elog files if two (or more?) elog entries are being edited at the same time. K.O.

The corruption is very simple. The mhttpd elog indexes elog entries by elog file
and offset inside the file: in "http://ladd00:8088/EL/060927.318",
"060927" corresponds to log file "060927.log" and "318" is the offset inside the
file where the message is located.

During "edit", the code "remembers" the offset of the original message and in
el_submit() blindly writes the edited message into the file at the remembered
offset.

If another message was edited before the edit of the first message is submitted,
the remembered offset becomes invalid (messages have shifted inside the file)
and el_submit() writes the edited text into the wrong place in the file,
corrupting it.

I have now added a check for this and we crash instead of corrupting the elog
file (midas.c rev 3340).

I do not know how to "properly" fix this bug without changing the indexing
scheme to something similar to what elogd uses: message numbers instead of
file offsets. In the existing scheme, message editing also breaks the URLs shown
in the email notifications (they contain file offsets that point to the wrong
places after messages are moved around by editing) and the "reply threading" links.
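
To make the hazard concrete, here is a minimal sketch (not the actual midas.c code) of the kind of guard such a check can perform before writing at a remembered offset. MSG_TAG is a hypothetical stand-in for whatever marker begins an elog record in the file:

#include <stdio.h>
#include <string.h>
#include <assert.h>

#define MSG_TAG "$Start$:"   /* hypothetical record marker, for illustration */

void el_write_at_offset(FILE *f, long remembered_offset,
                        const char *text, size_t len)
{
   char head[sizeof(MSG_TAG)] = "";

   /* re-read the file at the remembered offset ... */
   fseek(f, remembered_offset, SEEK_SET);
   if (fread(head, 1, sizeof(MSG_TAG) - 1, f) != sizeof(MSG_TAG) - 1)
      head[0] = 0;

   /* ... and crash rather than corrupt if no record starts there any more
      (another edit has shifted the messages inside the file) */
   assert(strncmp(head, MSG_TAG, sizeof(MSG_TAG) - 1) == 0);

   fseek(f, remembered_offset, SEEK_SET);
   fwrite(text, 1, len, f);
}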

Here is how I reproduce this bug:

1) start with an empty elog
2) create two messages
3) "edit" the second message, but do not submit it yet.
4) "edit" the first message, change the text to make sure the message size
becomes different; submit this change.
5) submit the pending "edit" of the second message. !!BOOM!!

K.O.
    Reply  28 Sep 2006, Stefan Ritt, Bug Report, mhttpd elog corruption via double-edit 
> I do not know how to "properly" fix this bug without changing the indexing
> scheme to something similar to what is used by elogd- message numbers instead of
> file indices. In the existing scheme, message editing also breaks URLs shown in
> the email notifications (they contain file indices that point to the wrong
> places after messages are moved around by editing) and "reply threading" links.

Well, the development of elogd with its message numbers was actually stimulated by
the problem you mention; after that, all those problems went away. Another
incarnation of the problem is editing an mhttpd log file manually: afterwards
the file offsets are different and the system gets corrupted. To fix this properly,
one would have to backport the el_xxx functions from elogd to mhttpd or, even
simpler, remove the elog functionality from mhttpd and "force" everybody to use
elogd (after running elconv to convert the files into the new format).
Entry  21 Jan 2007, Denis Bilenko, Bug Report, buffer bugs 
Hello,

We've been using midas and have stumbled upon some inconsistent behaviour:
1. Blocking calls to the midas API aren't usable when the client is connected
through mserver. This is true at least for bm_receive_event, but it seems to be
a more general problem: a midas application has to call cm_yield within 10
seconds (or whatever timeout is set) to remain alive.
That is not the case when RPC is not used.

2. On Windows, two processes on the same machine can send/receive events to
each other only if they both use midas locally (through shared memory) or they
both use midas via RPC (through mserver), but not if they connect in different ways.

3. Receiving/sending the same events from the same process - this was possible in
1.9.5-1, but not in the current version (revision 3501, mxml revision 45). Is this an intended behavior change?
To explain how to reproduce the bugs, I will use two helper programs, evprint.py and
evsend.py, for receiving and sending events respectively. You don't need
them, just something to send and receive events. (They are part of pymidas, which
will be released to the public soon, but is quite usable already.)

They both accept
* a --path option in "host/experiment" format (for the cm_connect_experiment call)
* a --log option which commands them to trace all midas calls to the terminal

evprint.py has two ways of receiving events (the callback mode is sketched in C below):
1) looping over bm_receive_event
2) providing a callback to bm_request_event and looping over cm_yield(400)
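
For reference, the callback mode looks roughly like this in the plain C API; this is a sketch modeled on the pattern of midas/examples/lowlevel/consume.c, not on pymidas internals:

#include <stdio.h>
#include "midas.h"

/* callback invoked by cm_yield() for each matching event */
void process_event(HNDLE hBuf, HNDLE request_id,
                   EVENT_HEADER *pheader, void *pevent)
{
   printf("id=%d serial=%u len=%u\n", pheader->event_id,
          (unsigned) pheader->serial_number, (unsigned) pheader->data_size);
}

int main(void)
{
   INT hBufEvent, request_id, status;

   cm_connect_experiment("", "online", "evprint", NULL);
   bm_open_buffer("SYSTEM", 1048576, &hBufEvent);
   bm_request_event(hBufEvent, EVENT_ID_ALL, TRIGGER_ALL,
                    GET_ALL, &request_id, process_event);

   /* cm_yield() dispatches buffered events to the callback */
   do {
      status = cm_yield(400);
   } while (status != RPC_SHUTDOWN && status != SS_ABORT);

   cm_disconnect_experiment();
   return 0;
}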

Example of use:
first-console$ python evprint.py receive
second-console$ python evsend.py 123
[first console]
id=2007 mask=2007 serial=2007 time=1169372833 len=3 '123'
So,

1. Blocking calls to midas api aren't usable when client is connected through
mserver.
$ python evprint.py --log --path 127.0.0.1/online receive
cm_connect_experiment('127.0.0.1', 'online', 'evprint.py', None)
bm_open_buffer('SYSTEM', 1048576, &c_long(2)) -> BM_CREATED
bm_request_event(2, -1, -1, 2, &c_long(0), None)
... wait for a couple of seconds ...
[midas.c:9348:rpc_call] rpc timeout, routine = "bm_receive_event"
[system.c:3570:send_tcp] send(socket=0,size=8) returned -1, errno: 88 (Socket operation on non-socket)
[midas.c:9326:rpc_call] send_tcp() failed

bm_receive_event(2, ...) -> RPC_TIMEOUT

bm_remove_event_request(2, 0) -> BM_INVALID_HANDLE
bm_close_buffer(2) -> BM_INVALID_HANDLE
cm_disconnect_experiment()

2. Missing events on Windows
a) Both use midas locally - works
   1: python evprint.py receive
   2: python evsend.py 123
   1: id=2007 mask=2007 serial=2007 time=1169372833 len=3 '123'
b) Both use midas via RPC - works
   1: python evprint.py --path 127.0.0.1/ dispatch
   2: python evsend.py --path 127.0.0.1/ 123
   1: id=2007 mask=2007 serial=2007 time=1169373366 len=3 '123'
c) Receiver uses midas locally, sender uses mserver - doesn't work on Windows
   1: python evprint.py dispatch
   2: python evsend.py --path 127.0.0.1/ 123
   1: (nothing printed)
d) The other way around - doesn't work on Windows
   1: python evprint.py --path 127.0.0.1/ dispatch
   2: python evsend.py 123
   1: (nothing printed)
No such problem on Linux.

3. Receiving/sending the same events from the same process.
To reproduce this, just request events, send one, and then try to receive it
via cm_yield. I care about this because I have a test in pymidas which
relies on this behavior.

Hope this helps.
    Reply  22 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 

Denis Bilenko wrote:
1. Blocking calls to the midas API aren't usable when the client is connected through mserver. This is true at least for bm_receive_event, but it seems to be a more general problem: a midas application has to call cm_yield within 10 seconds (or whatever timeout is set) to remain alive.
That is not the case when RPC is not used.


The 10 seconds timeout you see comes from the RPC layer. If you call bm_receive_event and it blocks, the client will consider it an RPC timeout after 10 seconds. It has nothing to do with cm_yield(). Calling a blocking function via a server connection is not a good idea anyhow, since the process then cannot respond to anything else, like run transitions. That's why I never used it, and that's why I had not noticed that behaviour. I did change it, however, such that bm_receive_event, if called without the ASYNC flag, disables the RPC timeout for this call and restores it afterwards. This is now in midas.c revision 3502. You can try it easily with midas/examples/lowlevel/produce and consume.
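
In code, that pattern could be sketched roughly as follows. This is illustrative only: the actual revision 3502 change may differ, and the rpc_get_option()/rpc_set_option() calls with RPC_OTIMEOUT are an assumption about the midas RPC layer:

#include "midas.h"

/* Sketch: disable the RPC timeout around a blocking remote call,
   then restore it. Not the literal revision 3502 code. */
INT receive_event_blocking(INT buffer_handle, void *dest, INT *size)
{
   INT old_timeout, status;

   old_timeout = rpc_get_option(-1, RPC_OTIMEOUT);
   rpc_set_option(-1, RPC_OTIMEOUT, 0);          /* 0 = no timeout */

   status = bm_receive_event(buffer_handle, dest, size, SYNC);

   rpc_set_option(-1, RPC_OTIMEOUT, old_timeout);
   return status;
}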


Denis Bilenko wrote:
2. On Windows, two processes on the same machine can send/receive events to each other only if they both use midas locally (through shared memory) or they both use midas via RPC (through mserver), but not if they connect in different ways.


I just tried again and it did work. I used produce/consume. If you enter just <return> for the host name, these programs connect locally. So I tried both producer local / consumer remote and vice versa, and both worked. I did, however, use consume with the callback functionality. I did not try your Python programs. If you find that produce/consume works and your Python programs don't, adapt your Python programs to resemble produce/consume.


Denis Bilenko wrote:
3. Receiving/sending the same events from the same process - this was possible in 1.9.5-1, but not in the current version (revision 3501, mxml revision 45). Is this an intended behavior change?


Yes. It was introduced in revision 3186 on July 28th, 2006. It fixed a problem where the buffer level was always shown as 100% full, even if no other clients were registered. By ignoring the own process, the buffer level now correctly shows the "contents" of a buffer from 0..100%. It also gave a small speed improvement. If you want to send events to your own process, you have to do it at the calling level: when you call bm_send_event(), manually call process_event() (or whatever your event-receiving routine is called) as well. This is also much faster than going through the buffer.
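
As a sketch, that calling-level pattern might look as follows; process_event() stands in for whatever the event-receiving routine is called, and the fixed 100-byte event is illustrative:

#include <string.h>
#include "midas.h"

void process_event(HNDLE hBuf, HNDLE request_id,
                   EVENT_HEADER *pheader, void *pevent);   /* your routine */

void send_and_self_deliver(INT hBufEvent, HNDLE request_id, DWORD serial)
{
   char          buf[sizeof(EVENT_HEADER) + 100];
   EVENT_HEADER *pevent = (EVENT_HEADER *) buf;

   bm_compose_event(pevent, 1, 0, 100, serial);
   memset(pevent + 1, 0, 100);                  /* fill in the event data */

   /* goes to all *other* clients via the buffer ... */
   bm_send_event(hBufEvent, pevent, sizeof(EVENT_HEADER) + 100, SYNC);

   /* ... and to the own process directly, at the calling level */
   process_event(hBufEvent, request_id, pevent, pevent + 1);
}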
    Reply  23 Jan 2007, Denis Bilenko, Bug Report, buffer bugs 
1 & 3 - thanks for the fix and the explanation. As for 2 - I've tried consume and produce
and still have a problem:

Config: GET_ALL, event id = 1, event size = 10, Receive via callback,
OS = Windows XP SP2
I restart mserver manually from the command line every time (not using the system service).
I start produce first, then I start consume.
In two cases of four, starting 'consume' causes 'produce' to exit immediately.
Guess which two :-)

both local or both remote - works (i.e. non-zero rates in both consoles)
produce local, consume via rpc and vice versa - 'produce' exits with error

1. produce via rpc, consume locally

first console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>produce.exe
ID of event to produce: 1
Host to connect: 127.0.0.1
Event size: 10
Level:   0.0 %, Rate: 0.64 MB/sec
flush
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.61 MB/sec
Level:   0.0 %, Rate: 0.62 MB/sec
Level:   0.0 %, Rate: 0.62 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
flush
Level:   0.0 %, Rate: 0.62 MB/sec

## Now I've started consume in the other console ##

[system.c:3570:send_tcp] send(socket=1900,size=8136) returned -1, errno: 0 (No error)
send_tcp() returned -1
[midas.c:9669:rpc_send_event] send_tcp() failed
rpc_send_event returned error 503, event_size 10
second console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>consume.exe
ID of event to request: 1
Host to connect:
Get all events (0/1): 1
Receive via callback ([y]/n):
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Received break. Aborting...
mserver's output:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin\mserver.exe started interactively
[midas.c:2315:bm_validate_client_index] Invalid client index 0 in buffer 'SYSTEM'.
Client name 'Power Consumer', pid 1964 should be 3216
2. produce locally, consume via rpc
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>produce.exe
ID of event to produce: 1
Host to connect:
Event size: 10
Client 'Producer' (PID 2584) on 'ODB' removed by cm_watchdog (idle 144.1s,TO 10s)
Level:   0.0 %, Rate: 3.20 MB/sec
flush
Level:   0.0 %, Rate: 3.20 MB/sec
Level:   0.0 %, Rate: 3.11 MB/sec
Level:   0.0 %, Rate: 3.13 MB/sec
Level:   0.0 %, Rate: 3.06 MB/sec
Level:   0.0 %, Rate: 3.20 MB/sec
Level:   0.0 %, Rate: 2.96 MB/sec
Level:   0.0 %, Rate: 3.11 MB/sec
Level:   0.0 %, Rate: 3.18 MB/sec
Level:   0.0 %, Rate: 3.13 MB/sec
Level:   0.0 %, Rate: 3.17 MB/sec
flush
Level:   0.0 %, Rate: 3.19 MB/sec
Level:   0.0 %, Rate: 3.08 MB/sec
Level:   0.0 %, Rate: 3.06 MB/sec

## Now I've started consume ##

[midas.c:2315:bm_validate_client_index] Invalid client index 0 in buffer 'SYSTEM'. Client name '', pid 0 should be 760
Second console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>consume.exe
ID of event to request: 1
Host to connect: 127.0.0.1
Get all events (0/1): 1
Receive via callback ([y]/n):
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Received break. Aborting...
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
mserver didn't print anything.

3. Both remote (just for comparison)
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>produce.exe
ID of event to produce: 1
Host to connect: 127.0.0.1
Event size: 10
Level:   0.0 %, Rate: 0.65 MB/sec
flush
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.60 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.61 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.67 MB/sec
flush
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:  66.8 %, Rate: 0.66 MB/sec
flush
Level:   0.0 %, Rate: 0.00 MB/sec
Level:  66.8 %, Rate: 0.31 MB/sec
Level:  57.2 %, Rate: 0.15 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Level:  57.3 %, Rate: 0.15 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Received break. Aborting...
Received 2nd break. Hard abort.
[midas.c:1581:] cm_disconnect_experiment not called at end of program
Second console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>consume.exe
ID of event to request: 1
Host to connect: 127.0.0.1
Get all events (0/1): 1
Receive via callback ([y]/n):
[consume.c:73:process_event] Serial number mismatch: Ser: 1397076, OldSer: 0, ID: 1, size: 10
Level:  37.1 %, Rate: 0.00 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.15 MB/sec, ser mismatches: 1
Level:  95.4 %, Rate: 0.08 MB/sec, ser mismatches: 1
Level:  66.8 %, Rate: 0.14 MB/sec, ser mismatches: 1
Level:  66.8 %, Rate: 0.12 MB/sec, ser mismatches: 1
Level:  76.3 %, Rate: 0.12 MB/sec, ser mismatches: 1
Level:  95.4 %, Rate: 0.11 MB/sec, ser mismatches: 1
Level:  57.3 %, Rate: 0.15 MB/sec, ser mismatches: 1
Level:  66.8 %, Rate: 0.11 MB/sec, ser mismatches: 1
Level:  85.9 %, Rate: 0.11 MB/sec, ser mismatches: 1
Level:  95.5 %, Rate: 0.12 MB/sec, ser mismatches: 1
Level:  57.4 %, Rate: 0.15 MB/sec, ser mismatches: 1
Level:   9.7 %, Rate: 0.15 MB/sec, ser mismatches: 1
[Producer] [midas.c:1581:] cm_disconnect_experiment not called at end of program
Level:   0.0 %, Rate: 0.03 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 1
Received break. Aborting...
    Reply  23 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 

Denis Bilenko wrote:
1 & 3 - thanks for the fix and the explanation, as for 2 - I've tried consume and produce
and still has a problem


Acknowledged. I could reproduce it with the information you supplied, thank you very much. Also, the data rate is slower than I would expect. I will investigate and fix this, but it could take some time.
    Reply  24 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 
I tried again and could not reproduce the problem. Last time I was probably confused by some old mserver.exe executable I had lying around. I updated to the most recent version (3516) and did a C:\midas> nmake -f makefile.nt. Last time I was also confused about the low rate, but that was caused by an mserver.exe executable which was not compiled with optimization. For small event sizes (such as 10 bytes) there is a big difference between optimized and non-optimized code. So I got:


First Console wrote:
ID of event to produce: 1
Host to connect: localhost
Event size: 10
Level:   0.0 %, Rate: 0.46 MB/sec
flush
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.40 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush


and


Second Console wrote:
C:\midas\NT\bin>.\consume
ID of event to request: 1
Host to connect:
Get all events (0/1): 1
Receive via callback ([y]/n):
[consume.c:73:process_event] Serial number mismatch: Ser: 1169666, OldSer: 0, ID: 1, size: 10
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   2.4 %, Rate: 0.35 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.50 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.40 MB/sec, ser mismatches: 1
Received break. Aborting...


Actually, sending remotely and receiving locally is a very common setup. Most experiments use it: they have a remote frontend, while the logger and analyzer run locally. If that did not work, all these experiments would have a problem. So I can only encourage you to try again; make sure to update and recompile the executables. Maybe delete any old *.SHM files. Maybe try on another PC or under Linux.
Entry  30 Jan 2007, Stefan Ritt, Bug Report, Large files under Windows XP 
Hello,

We have problems analyzing large files under Windows XP. For small file sizes,
everything is ok. We have events of 2.8 MB each, and we can read ~30 events per
second. But once the file gets larger than typically 600-800 MB, access
becomes very slow, about 1 event per second. This is not the case under Linux,
where it stays at 30 Hz (~90 MB/sec).

Looking at the low-level file access, it is obvious that this has nothing to do
with midas; the problem can be reproduced with a simple program reading chunks
of 3 MB from a 1 GB file. The Windows XP file system is NTFS with default
formatting. Has anyone else observed a similar problem, or does anyone maybe
even have some suggestions? Unfortunately many people here want to analyze
midas data under Windows...
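
Such a reproducer is essentially a timed chunked-read loop, for example (a sketch; file name and sizes are illustrative, and clock() serves as a crude timer):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define CHUNK (3 * 1024 * 1024)     /* 3 MB, roughly one event */

int main(void)
{
   FILE   *f   = fopen("test.dat", "rb");   /* ~1 GB test file */
   char   *buf = malloc(CHUNK);
   size_t  n;
   double  total = 0, sec;
   clock_t t0;

   if (f == NULL || buf == NULL)
      return 1;

   t0 = clock();
   while ((n = fread(buf, 1, CHUNK, f)) > 0)
      total += (double) n;
   sec = (double) (clock() - t0) / CLOCKS_PER_SEC;

   printf("read %.0f MB in %.2f s -> %.1f MB/sec\n",
          total / 1048576, sec, total / 1048576 / (sec > 0 ? sec : 1));

   fclose(f);
   free(buf);
   return 0;
}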

Stefan Ritt
Entry  02 Feb 2007, Exaos Lee, Bug Report, Compiling failed with SVN3562 under Ubuntu 6.10 
The error log is as the following:
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -DHAVE_ROOT -pthread -I/opt/root/current/include -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/system.o src/system.c
src/system.c:958: error: expected declaration specifiers or ‘...’ before ‘gettid’
src/system.c:958: warning: data definition has no type or storage class
src/system.c:958: warning: type defaults to ‘int’ in declaration of ‘_syscall0’
src/system.c: In function ‘ss_gettid’:
src/system.c:1005: warning: implicit declaration of function ‘gettid’
src/system.c: In function ‘ss_suspend_init_ipc’:
src/system.c:2948: warning: pointer targets in passing argument 3 of ‘getsockname’ differ in signedness
src/system.c: In function ‘ss_suspend’:
src/system.c:3414: warning: pointer targets in passing argument 6 of ‘recvfrom’ differ in signedness
src/system.c:3441: warning: pointer targets in passing argument 6 of ‘recvfrom’ differ in signedness
make: *** [linux/lib/system.o] Error 1

The error might be here:
void ss_force_single_thread()
{
   _single_thread = TRUE;
}

#if defined(OS_DARWIN)
// blank
#elif defined(OS_LINUX)
_syscall0(pid_t,gettid);
#endif

INT ss_gettid(void)

I have no idea about the usage of _syscall0(...).
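
For what it's worth, a common way around the removed _syscall0 macro is to invoke the system call directly; this is a sketch, not necessarily the fix that went into the midas repository:

/* Sketch: obtain the thread id without the _syscall0 macro, which
   recent glibc/kernel headers no longer provide. */
#include <unistd.h>
#include <sys/syscall.h>

int my_gettid(void)
{
#if defined(OS_LINUX)
   return (int) syscall(SYS_gettid);    /* raw gettid system call */
#else
   return (int) getpid();               /* fallback where gettid is unavailable */
#endif
}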
Entry  02 Feb 2007, Exaos Lee, Bug Report, Compiling failed with SVN3562 under Ubuntu 6.10 err.log
I tried to solve the problem by adding a ";", but that was wrong. In fact, the macro "_syscall0(...)" doesn't need the ";".
I searched and found that somebody said "the overall _syscall$magicnumber will disappear". I don't mind whether "_syscall" disappears or not; I just want to compile the code and do my job. I deleted the additional ";" and recompiled. The error output is in the attachment [elog:335/1].
Entry  05 Feb 2007, Fedor Ignatov, Bug Report, segmentation violation of analyzer on a x86_64 
Hello,

When I connect to the analyzer on an x86_64 processor (with Roody),
the analyzer breaks with a segmentation violation in the root_server_thread function.
The same code works fine on a 32-bit processor.
As I found, the problem is in the exchange of pointers between analyzer and client.
Before being sent, the pointer is saved in an int (size 4 instead of 8) at
this place:
Index: src/mana.c
===================================================================
--- src/mana.c  (revision 3498)
+++ src/mana.c  (working copy)
@@ -5386,7 +5386,7 @@

             //write pointer
             message->Reset(kMESS_ANY);
-            int p = (POINTER_T) obj;
+            POINTER_T p = (POINTER_T) obj;
             *message << p;
             sock->Send(*message);


Sincerely Yours,
Fedor Ignatov 
Entry  05 Feb 2007, Konstantin Olchanski, Bug Report, wrong version in include/midas.h? 
The present .../include/midas.h contains
[alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
/home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"

All MIDAS utilities ("odbedit ver") presently report version 1.9.5, even for the
svn trunk, which may confuse people as to which version of midas they are using
and may complicate bug reporting.

Perhaps the trunk version should say something like "svn-22233344" (the svn
revision number)? The present "1.9.5" is wrong...

K.O.
    Reply  06 Feb 2007, Stefan Ritt, Bug Report, wrong version in include/midas.h? 
> The present .../include/midas.h contains
> [alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
> /home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"
> 
> All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
> trunk, and this may confuse people as to what version of midas they are using,
> and may complicate reporting of bugs.
> 
> Perhaps the trunk version should say something like "svn-22233344" (the svn
> revision number)? The present "1.9.5" is wrong...

Fully agree. I added an svn_revision string to midas.h, which now gets reported
by "odbedit ver". Unfortunately this reflects only changes in midas.c: if one
changes odb.c, for example, the svn revision in midas.c does not get modified by
the SVN system. In addition, I changed the present version 1.9.5 to 2.0.0. I made
the tar and zip files. After some internal testing, it will be announced
officially in a few days.
    Reply  06 Feb 2007, Stefan Ritt, Bug Report, segmentation violation of analyzer on a x86_64 
> Hello,
> 
> When I  connect to analyzer on a x86_64 processor(with Roody),  
> a analyzer break with segmentation violation in the root_server_thread  function.
> Same code are working fine on a 32bit processor.
> As I found the problem are in exchanging of pointers between analyzer and client.
> Before to send a pointer, it is saved a pointer in int (size=4, instead of 8) at
> this place:
> Index: src/mana.c
> ===================================================================
> --- src/mana.c  (revision 3498)
> +++ src/mana.c  (working copy)
> @@ -5386,7 +5386,7 @@
> 
>              //write pointer
>              message->Reset(kMESS_ANY);
> -            int p = (POINTER_T) obj;
> +            POINTER_T p = (POINTER_T) obj;
>              *message << p;
>              sock->Send(*message);
> 
> 
> Sincerely Yours,
> Fedor Ignatov 

Do I understand you right? With your patch it works even on 64 bit, right? Or do you
mean there is still a segmentation violation? Anyhow I committed your patch since the
"int" is clearly incorrect.

- Stefan
    Reply  06 Feb 2007, Fedor Ignatov, Bug Report, segmentation violation of analyzer on a x86_64 
Yes, right. The segmentation violation problem is solved with this patch. Now it works
fine on x86_64.

Fedor 

> Do I understand you right? With your patch it works even on 64 bit, right? Or do you
> mean there is still a segmentation violation? Anyhow I committed your patch since the
> "int" is clearly incorrect.
> 
> - Stefan
    Reply  17 Feb 2007, Konstantin Olchanski, Bug Report, segmentation violation of analyzer on a x86_64 
> Yes right, Problem of a segmentation violation is solved with this patch. Now it works
> fine on x86_64.

Right. I confirm this. I have this exact same fix in my stand-alone copy of the midas
histogram server, and should commit it to MIDAS CVS as well.

K.O.
Entry  22 May 2007, Randolf Pohl, Bug Report, analyzer_init called by odb_load 
Hi,

I wonder why mana.c:odb_load() calls analyzer_init(). This way analyzer_init
is called TWICE or more times:
first from mana.c:mana_init(), once per invocation of the analyzer, and
second from mana.c:odb_load(), once per run to be analyzed.

Isn't this a bug? It can mess up several things (like mallocs) if you don't 
take the necessary precautions. Other module_init functions are correctly 
called only once, before all runs are analyzed.

I have the feeling, that odb_load should NOT call analyzer_init. Or am I wrong 
(probably, but please explain to me)? Do I have to live with it and make sure 
that my beautiful global initialization in analyzer_init is only done once?
:-)

Cheers,

Randolf

And here is the annotated log using the ROOT example experiment 
(several modules changed/added to print their respective names)

:~/midas/examples/root> ./analyzer -e exa_root -i run%05d.mid -r 1 3
 
analyzer_init        <-- ok

Root server listening on port 9090...
adc_calib_init       <-- ok
adc_summing_init     <-- ok
scaler_init          <-- ok
Running analyzer offline. Stop with "!"
Set run number 1 in ODB
Load ODB from run 1...
analyzer_init        <-- not ok, or is it?

OK
run00001.mid:777  events, 0.00s
Set run number 2 in ODB
Load ODB from run 2...
analyzer_init        <-- not ok, or is it?

OK
run00002.mid:7227  events, 0.03s
Set run number 3 in ODB
Load ODB from run 3...
analyzer_init        <-- not ok, or is it?

OK
run00003.mid:13866  events, 0.06s
adc_calib_exit
adc_summing_exit
scaler_exit

analyzer_exit
    Reply  22 May 2007, Stefan Ritt, Bug Report, analyzer_init called by odb_load 
The reason to call analyzer_init in odb_load is the following:

Assume you run the analyzer offline, analyzing many files in series. Then assume
that you have /Experiment/Run Parameters, which is actively used by the analyzer
(like beam settings etc.). In this case you do a db_open_record() to map
/Experiment/Run Parameters to the exp_param C structure. For this mapping to work,
the ODB structure and the C structure have to be exactly the same. Now assume that
you changed your run parameters over time, like you added some comment later. Now
you want to analyze several runs, some before and some after the modification.
Both sets have a different structure in /Experiment/Run Parameters, which is a
problem, since the compiled analyzer can only have a single C structure. My "poor"
solution was to call analyzer_init after each loading of the ODB from the *.mid
file. The db_create_record() call matches the C structure to the ODB structure by
modifying the ODB structure if necessary. So if you added one parameter later, this
(modified) structure gets loaded by odb_load, but then it gets adjusted in
analyzer_init().
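
For reference, the mapping described above follows the standard analyzer template, roughly like this sketch (EXP_PARAM, EXP_PARAM_STR and strcomb() are assumed from the experiment's generated experim.h and the midas examples; details vary per experiment):

#include "midas.h"
#include "experim.h"              /* generated header defining EXP_PARAM
                                     and EXP_PARAM_STR */

EXP_PARAM exp_param;
EXP_PARAM_STR(exp_param_str);

void map_run_parameters(HNDLE hDB)
{
   HNDLE hKey;

   /* adjust the ODB structure to match the compiled C structure ... */
   db_create_record(hDB, 0, "/Experiment/Run Parameters",
                    strcomb(exp_param_str));

   /* ... then map it onto exp_param; this requires an exact match */
   db_find_key(hDB, 0, "/Experiment/Run Parameters", &hKey);
   db_open_record(hDB, hKey, &exp_param, sizeof(exp_param),
                  MODE_READ, NULL, NULL);
}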

I understand now that this case might not happen so often, and that you are more
bothered by the fact that analyzer_init gets called several times. There must
however be a hook for offline analysis so that the user code can correct the ODB
structure. So I propose to add a flag to analyzer_init, such as

INT analyzer_init(BOOL bFirst)
{
}

If bFirst equals TRUE, the function got called from mana_init(), if FALSE, it got
called from odb_load. Then you can put code like

INT analyzer_init(BOOL bFirst)
{
   if (bFirst) {
      p = malloc()
      ...
   }
}

If you agree, I will modify the code and commit the change.

- Stefan
    Reply  22 May 2007, Randolf Pohl, Bug Report, analyzer_init called by odb_load 
Thanks for the quick reply, Stefan.

Please don't change anything in the code unless you find it really important. I guess 
changing the analyzer_init prototype will break a lot of code out there?

In fact, I think I do understand this behavior now.
And even without your suggested fix there is a simple workaround: I add a static
variable to my analyzer_init.cxx file and do something similar to your bFirst fix
(see the sketch below).
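
Such a guard can be as simple as the following sketch (using the standard mana analyzer_init signature):

#include "midas.h"

/* Sketch: run the expensive global setup only on the first of the
   repeated analyzer_init calls (offline, odb_load invokes it per run). */
INT analyzer_init(void)
{
   static BOOL first = TRUE;

   if (first) {
      first = FALSE;
      /* one-time initialization: mallocs, bookings, ... */
   }

   /* per-run (re-)initialization can still go here */
   return SUCCESS;
}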

In conclusion, commit your fix if it does not harm others, or postpone it to a
future new version of midas which breaks a lot of things anyway...

A last question, for me to understand: Why not call db_open_record in 
ana_begin_of_run then?

Cheers,

Randolf