ID | Date | Author | Topic | Subject |
791
|
10 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > In midas.c, ...
>
> 1) _net_send_buffer is not set to NULL when declared.
_net_send_buffer is a global variable. All global variables are automatically initialized to zero before the program
starts.
static char*x; // = NULL; is redundant
char*y=realloc(x, 100); // x is NULL, usage is correct
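A minimal sketch of the same point, with hypothetical names (send_buffer, ensure_send_buffer) standing in for the MIDAS globals. It only illustrates that a file-scope pointer starts out as NULL and that realloc() on a NULL pointer behaves like malloc(); it is not the code in midas.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for _net_send_buffer and _net_send_buffer_size.
 * Objects with static storage duration are zero-initialized by the language,
 * so no explicit "= NULL" or "= 0" is needed. */
static char *send_buffer;
static size_t send_buffer_size;

/* Grow the buffer on demand. Returns 0 on success, -1 if allocation fails. */
int ensure_send_buffer(size_t needed)
{
    assert(needed > 0);          /* realloc(p, 0) is implementation-defined */
    if (needed <= send_buffer_size)
        return 0;                /* already large enough */

    /* On the first call send_buffer is NULL, so realloc() acts like malloc(). */
    char *p = realloc(send_buffer, needed);
    if (p == NULL)
        return -1;               /* the old buffer, if any, is still valid */

    send_buffer = p;
    send_buffer_size = needed;
    return 0;
}

/* Current capacity in bytes, 0 before the first allocation. */
size_t send_buffer_capacity(void)
{
    return send_buffer_size;
}
```

Assigning the realloc() result to a temporary first keeps the old buffer reachable if the allocation fails.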
> 2) cm_disconnect_experiment() calls free(_net_send_buffer) but does not set its
> value to NULL.
My copy of midas.c (svn rev 5256) sets _net_send_buffer to NULL:
if (_net_send_buffer_size > 0) {
M_FREE(_net_send_buffer);
_net_send_buffer_size = 0;
}
What version of midas do you have? (svn info .)
K.O. |
792
|
10 Jun 2012 |
Greg Christian | Bug Report | _net_send_buffer realloc | > > In midas.c, ...
> >
> > 1) _net_send_buffer is not set to NULL when declared.
>
> _net_send_buffer is a global variable. All global variables are automatically initialized to zero before the program
> starts.
>
> static char*x; // = NULL; is redundant
> char*y=realloc(x, 100); // x is NULL, usage is correct
>
Ah, okay. I was not aware of this feature of global variables.
> > 2) cm_disconnect_experiment() calls free(_net_send_buffer) but does not set its
> > value to NULL.
>
> My copy of midas.c (svn rev 5256) sets _net_send_buffer to NULL:
>
> if (_net_send_buffer_size > 0) {
> M_FREE(_net_send_buffer);
> _net_send_buffer_size = 0;
> }
>
> What version of midas do you have? (svn info .)
>
> K.O.
I have version 5256 also (matches what you posted), but I only see
_net_send_buffer_size being set to 0, not _net_send_buffer itself. In midas.h,
M_FREE(x) only expands to free(x) if _MEM_DBG is not defined. |
793
|
11 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > > > In midas.c, ...
> > >
> > > 1) _net_send_buffer is not set to NULL when declared.
>
> Ah, okay. I was not aware of this feature of global variables.
>
RTFM K&R "The C programming language".
http://en.wikipedia.org/wiki/The_C_Programming_Language
>
> > > 2) cm_disconnect_experiment() calls free(_net_send_buffer) but does not set
> > > its value to NULL.
>
Confirmed. Sorry for the confusion in my previous message. Setting the pointer to NULL after free() is good practice.
But note that calling cm_connect and cm_disconnect multiple times is an unusual use of MIDAS and you will most
likely find more breakage.
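To illustrate why the NULL assignment matters when connect and disconnect are called more than once, here is a small sketch. The names (buffer_connect, buffer_disconnect) are hypothetical and only mimic the pattern under discussion; they are not MIDAS functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for _net_send_buffer and _net_send_buffer_size. */
static char *send_buffer;
static size_t send_buffer_size;

/* Allocate (or resize) the buffer. realloc() is only safe here if
 * send_buffer is either NULL or a live allocation. */
int buffer_connect(size_t n)
{
    assert(n > 0);
    char *p = realloc(send_buffer, n);
    if (p == NULL)
        return -1;
    send_buffer = p;
    send_buffer_size = n;
    return 0;
}

/* Release the buffer and reset both the pointer and the size. */
void buffer_disconnect(void)
{
    free(send_buffer);       /* free(NULL) is a harmless no-op */
    send_buffer = NULL;      /* without this line the next realloc() would
                                receive a dangling pointer */
    send_buffer_size = 0;
}

/* True when no buffer is held. */
int buffer_is_released(void)
{
    return send_buffer == NULL && send_buffer_size == 0;
}
```

With the pointer reset, a second buffer_connect() after buffer_disconnect() starts from NULL again instead of passing freed memory to realloc().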
K.O. |
794
|
13 Jun 2012 |
Exaos Lee | Bug Report | Cannot start/stop run through mhttpd | Revision: r5286
Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
Problem:
After building and installing, I used the script 'start_daq.sh' to start
'sampleexpt'. Everything seems fine, but I cannot start a run through the web
interface. Using 'odbedit' and 'mtransition' to start/stop a run works fine.
So, what may cause such a problem? |
795
|
13 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > Revision: r5286
> Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
> Problem:
> After building and installation, using the script 'start_daq.sh' to start
> 'sampleexpt'. Everything seems fine. But I cannot start a run through web. Using
> 'odbedit' and 'mtransition' to start/stop a run works fine. So, what may cause
> such a problem?
Well, it's mhttpd who cannot start the run, not you. So what happens when you press
the "start run" button? Any errors in midas.log or in midas messages? Is mtransition
in your PATH?
K.O. |
797
|
13 Jun 2012 |
Exaos Lee | Bug Report | Cannot start/stop run through mhttpd | > Well, it's mhttpd who cannot start the run, not you. So what happens when you press
> the "start run" button? Any errors in midas.log or in midas messages? Is mtransition
> in your PATH?
After pressing "start run", a message is displayed: "Run start requested". There
is no error in midas.log, and mtransition is in my PATH. I even looked into
"mhttpd.cxx" and found where "cm_transition" is called for starting a run. I have no
clue what the reason is. |
798
|
14 Jun 2012 |
Exaos Lee | Bug Report | Cannot start/stop run through mhttpd | > > Revision: r5286
> > Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
> > Problem:
> > After building and installation, using the script 'start_daq.sh' to start
> > 'sampleexpt'. Everything seems fine. But I cannot start a run through web. Using
> > 'odbedit' and 'mtransition' to start/stop a run works fine. So, what may cause
> > such a problem?
>
> Well, it's mhttpd who cannot start the run, not you. So what happens when you press
> the "start run" button? Any errors in midas.log or in midas messages? Is mtransition
> in your PATH?
>
> K.O.
I found the problem only appears when I run mhttpd in scripts, whether bash or python.
And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
fine. So, what's the difference between invoking mhttpd directly and through a script? |
799
|
14 Jun 2012 |
Stefan Ritt | Bug Report | Cannot start/stop run through mhttpd | > I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> fine. So, what's the difference between invoking mhttpd directly and through a script?
When you start it with "-D", mhttpd becomes a daemon. According to Linux rules, it has to "cd /", so it lives in the
root directory, in order not to block any NFS mount/unmount. If something with the path is not correct, mhttpd
then cannot find mtransition. I once fixed that problem by moving mtransition to /usr/bin.
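One way this can happen (an assumption, not something confirmed in this thread) is a relative or empty entry in PATH: such entries are resolved against the current working directory, which is "/" after the daemon has changed directory. The sketch below uses a hypothetical helper, find_in_path(), to show how a PATH-style lookup works; it is not part of MIDAS.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search a colon-separated PATH-style list for an executable called prog.
 * Returns 0 and writes the candidate file name to out on success, -1 otherwise.
 * Relative entries (".", "bin", or an empty entry) are resolved against the
 * current working directory, which is "/" once a daemon has done chdir("/"). */
int find_in_path(const char *prog, const char *path, char *out, size_t outlen)
{
    const char *p = path;

    while (p != NULL) {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);
        int n;

        if (len == 0)
            n = snprintf(out, outlen, "./%s", prog);   /* empty entry = cwd */
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)len, p, prog);

        /* Accept the first candidate that fits in out and is executable. */
        if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
            return 0;

        p = colon ? colon + 1 : NULL;
    }
    return -1;
}
```

Calling find_in_path("mtransition", getenv("PATH"), ...) from inside the daemon, after the directory change, would show whether the lookup still succeeds there.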
Stefan |
800
|
14 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > > I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> > And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> > are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> > fine. So, what's the difference between invoking mhttpd directly and through a script?
>
> When you start it with "-D", mhttpd becomes a daemon. According to Linux rules, it has to "cd /", so it lives in the
> root directory, in order not to block any NFS mount/unmount. If something with the path is not correct, mhttpd
> then cannot find mtransition. I once fixed that problem by moving mtransition to /usr/bin.
>
I agree. Somehow mhttpd cannot run mtransition. I am not super happy with this dependence on user $PATH settings and the inability to capture error messages
from attempts to start mtransition. I am now thinking in the direction of running mtransition code by forking. But remember that mlogger and the event builder also
have to use mtransition to stop runs (otherwise they can dead-lock). So an mhttpd-only solution is not good enough...
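As a rough sketch of what capturing a failed start could look like, the helper below forks, runs a program via execvp(), and reports the exit status to the parent. The name run_and_wait() and the use of exit code 127 for "could not exec" (the shell convention) are my assumptions; this is not how mhttpd currently launches mtransition.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up via PATH) and wait for it to finish.
 * Returns the child's exit status, 127 if the program could not be
 * executed, or -1 on fork/wait errors or if the child died from a signal. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* only reached if execvp() failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)      /* retry only when interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller can then distinguish "program not found" (127) from a transition that ran and failed with its own exit code, and write a message to the log accordingly.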
K.O. |
801
|
14 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > > > Revision: r5286
> > > Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
>
> I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> fine.
Right. I see Debian 6.0.5 just came out hot off the presses. Would be good to fix this problem.
As a workaround, can you run mhttpd without "-D", but in the background, i.e. "mhttpd -p xxx >& mhttpd.log &"?
Also what are your $PATH settings?
> So, what's the difference between invoking mhttpd directly and through a script?
As Stefan mentioned, "-D" invokes some nasty unix magic to disconnect the process from the user login session. It is
possible that this magic breaks in the latest Debian.
MIDAS "-D" does roughly the same thing as "nohup".
K.O. |
802
|
15 Jun 2012 |
Konstantin Olchanski | Bug Report | bk_delete uses memcpy instead of memmove | > In midas.c, the bk_delete function removes a bank by decrementing the total
> event size and then copying the remaining banks into the location of the first
> using memcpy from string.h.
Replaced some memcpy() with memmove(), including bk_delete().
svn rev 5293
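For reference, a minimal sketch of the pattern in question: removing a range from the middle of a buffer by sliding the tail down. The function delete_range() is a hypothetical illustration, not the bk_delete() code.

```c
#include <assert.h>
#include <string.h>

/* Remove len bytes starting at offset from a buffer holding total bytes and
 * return the new total. Source and destination overlap whenever the tail is
 * longer than the removed range, so memmove() is required; memcpy() has
 * undefined behaviour for overlapping regions. */
size_t delete_range(char *buf, size_t total, size_t offset, size_t len)
{
    assert(offset <= total && len <= total - offset);
    memmove(buf + offset, buf + offset + len, total - offset - len);
    return total - len;
}
```

memcpy() often appears to work in this situation because many implementations copy forward, which is why such bugs can stay hidden for a long time.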
K.O. |
803
|
15 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > 2) cm_disconnect_experiment() calls free(_net_send_buffer) but does not set its
> value to NULL.
Set pointer to NULL after free() in these files:
M odb.c
M sequencer.cxx
M mlogger.cxx
M mhttpd.cxx
M midas.c
svn rev 5294
K.O. |
808
|
21 Jun 2012 |
Stefan Ritt | Bug Report | Cannot start/stop run through mhttpd | > I agree. Somehow mhttpd cannot run mtransition. I am not super happy with this dependence on user $PATH settings and the inability to capture error messages
> from attempts to start mtransition. I am now thinking in the direction of running mtransition code by forking. But remember that mlogger and the event builder also
> have to use mtransition to stop runs (otherwise they can dead-lock). So an mhttpd-only solution is not good enough...
The way to go is to make cm_transition multi-threaded, with one thread for each client to be contacted. This way the transition can proceed in parallel when there are many frontend computers, for example, which will speed up
transitions significantly. In addition, cm_transition should execute a callback whenever a client succeeds or fails, to give immediate feedback to the user. I am thinking of something like implementing WebSockets in mhttpd for that (http://en.wikipedia.org/wiki/WebSocket).
I have had this in mind for many years, but have not had time to implement it yet. Maybe on my next visit to TRIUMF?
Stefan |
819
|
04 Jul 2012 |
Konstantin Olchanski | Bug Report | Crash after recursive use of rpc_execute() | I am looking at a MIDAS kaboom when running out of space on the data disk - everything was freezing
up, even the VME frontend crashed sometimes.
The freeze was traced to ROOT use in mlogger - it turns out that ROOT intercepts many signal handlers,
including SIGSEGV - but instead of crashing the program as God intended, ROOT SEGV handler just hangs,
and the rest of MIDAS hangs with it. One solution is to always build mlogger without ROOT support -
does anybody use this feature anymore? Or reset the signal handlers back to the default setting somehow.
Freeze fixed, now I see a crash (seg fault) inside mlogger, in the newly introduced memmove() function
inside the MIDAS RPC code rpc_execute(). memmove() replaced memcpy() in the same place and I am
surprised we did not see this crash with memcpy().
The crash is caused by crazy arguments passed to memmove() - looks like corrupted RPC arguments
data.
Then I realized that I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls
ss_suspend() calls rpc_execute(). The second rpc_execute() completes successfully, but leaves corrupted
data for the original rpc_execute(), which happily crashes. At the moment of the crash, the recursive call to
rpc_execute() is no longer visible.
Note that rpc_execute() cannot be called recursively - it is not re-entrant as it uses a global buffer for RPC
argument processing. (global tls_buffer structure).
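The effect can be reproduced in miniature. In this sketch (all names hypothetical, nothing here is MIDAS code) a handler unpacks its argument into a shared global buffer, optionally triggers a nested call the way tr_stop() ends up doing via cm_yield(), and then reads the buffer back:

```c
#include <stdio.h>
#include <string.h>

/* Stands in for the global RPC argument buffer. */
static char shared_args[32];

/* Hypothetical handler: stores arg in the shared buffer, optionally makes a
 * nested call with nested_arg, then copies whatever the shared buffer now
 * holds into result. */
void handle_request(const char *arg, const char *nested_arg,
                    char *result, size_t n)
{
    char nested_result[32];

    snprintf(shared_args, sizeof shared_args, "%s", arg);

    if (nested_arg != NULL)
        handle_request(nested_arg, NULL, nested_result, sizeof nested_result);

    /* After a nested call the buffer holds the inner request's data,
     * not the data this invocation stored. */
    snprintf(result, n, "%s", shared_args);
}
```

The inner call finishes cleanly; it is the outer call that continues with the wrong data, which matches the picture of the crash appearing only after the recursive frame has gone.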
Here is the mlogger stack trace:
#0 0x00000032a8032885 in raise () from /lib64/libc.so.6
#1 0x00000032a8034065 in abort () from /lib64/libc.so.6
#2 0x00000032a802b9fe in __assert_fail_base () from /lib64/libc.so.6
#3 0x00000032a802bac0 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000041d3e6 in rpc_execute (sock=14, buffer=0x7ffff73fc010 "\340.", convert_flags=0) at src/midas.c:11478
#5 0x0000000000429e41 in rpc_server_receive (idx=1, sock=<value optimized out>, check=<value optimized out>) at src/midas.c:12955
#6 0x0000000000433fcd in ss_suspend (millisec=0, msg=0) at src/system.c:3927
#7 0x0000000000429b12 in cm_yield (millisec=100) at src/midas.c:4268
#8 0x00000000004137c0 in close_channels (run_number=118, p_tape_flag=0x7fffffffcd34) at src/mlogger.cxx:3705
#9 0x000000000041390e in tr_stop (run_number=118, error=<value optimized out>) at src/mlogger.cxx:4148
#10 0x000000000041cd42 in rpc_execute (sock=12, buffer=0x7ffff73fc010 "\340.", convert_flags=0) at src/midas.c:11626
#11 0x0000000000429e41 in rpc_server_receive (idx=0, sock=<value optimized out>, check=<value optimized out>) at src/midas.c:12955
#12 0x0000000000433fcd in ss_suspend (millisec=0, msg=0) at src/system.c:3927
#13 0x0000000000429b12 in cm_yield (millisec=1000) at src/midas.c:4268
#14 0x0000000000416c50 in main (argc=<value optimized out>, argv=<value optimized out>) at src/mlogger.cxx:4431
K.O. |
820
|
04 Jul 2012 |
Konstantin Olchanski | Bug Report | Crash after recursive use of rpc_execute() | > ... I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls
> ss_suspend() calls rpc_execute()
> ... rpc_execute() cannot be called recursively - it is not re-entrant as it uses a global buffer
It turns out that rpc_server_receive() also needs protection against recursive calls - it also uses
a global buffer to receive network data.
My solution is to protect rpc_server_receive() against recursive calls by detecting recursion and returning SS_SUCCESS (to ss_suspend()).
I was worried that this would cause a tight loop inside ss_suspend() but in practice, it looks like ss_suspend() tries to call
us about once per second. I am happy with this solution. Here is the diff:
@@ -12813,7 +12815,7 @@
/********************************************************************/
-INT rpc_server_receive(INT idx, int sock, BOOL check)
+INT rpc_server_receive1(INT idx, int sock, BOOL check)
/********************************************************************\
Routine: rpc_server_receive
@@ -13047,7 +13049,28 @@
return status;
}
+/********************************************************************/
+INT rpc_server_receive(INT idx, int sock, BOOL check)
+{
+ static int level = 0;
+ int status;
+ // Provide protection against recursive calls to rpc_server_receive() and rpc_execute()
+ // via rpc_execute() calls tr_stop() calls cm_yield() calls ss_suspend() calls rpc_execute()
+
+ if (level != 0) {
+ //printf("*** enter rpc_server_receive level %d, idx %d sock %d %d -- protection against recursive use!\n", level, idx, sock, check);
+ return SS_SUCCESS;
+ }
+
+ level++;
+ //printf(">>> enter rpc_server_receive level %d, idx %d sock %d %d\n", level, idx, sock, check);
+ status = rpc_server_receive1(idx, sock, check);
+ //printf("<<< exit rpc_server_receive level %d, idx %d sock %d %d, status %d\n", level, idx, sock, check, status);
+ level--;
+ return status;
+}
+
/********************************************************************/
INT rpc_server_shutdown(void)
/********************************************************************\
ladd02:trinat~/packages/midas>svn info src/midas.c
Path: src/midas.c
Name: midas.c
URL: svn+ssh://svn@savannah.psi.ch/repos/meg/midas/trunk/src/midas.c
Repository Root: svn+ssh://svn@savannah.psi.ch/repos/meg/midas
Repository UUID: 050218f5-8902-0410-8d0e-8a15d521e4f2
Revision: 5297
Node Kind: file
Schedule: normal
Last Changed Author: olchanski
Last Changed Rev: 5294
Last Changed Date: 2012-06-15 10:45:35 -0700 (Fri, 15 Jun 2012)
Text Last Updated: 2012-06-29 17:05:14 -0700 (Fri, 29 Jun 2012)
Checksum: 8d7907bd60723e401a3fceba7cd2ba29
K.O. |
821
|
13 Jul 2012 |
Stefan Ritt | Bug Report | Crash after recursive use of rpc_execute() | > Then I realized that I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls
> ss_suspend() calls rpc_execute(). The second rpc_execute successfully completes, but leaves corrupted
> data for the original rpc_execute(), which happily crashes. At the moment of the crash, recursive call to
> rpc_execute() is no longer visible.
This is really strange. I did not protect rpc_execute() against recursive calls since this should not happen. rpc_server_receive() is linked to rpc_call() on the client side, and there cannot be
several concurrent rpc_call()s, since that is where I do the recursion checking (also multi-thread checking) via a mutex. See line 10142 in midas.c. So there CANNOT be recursive calls to rpc_execute() because
there cannot be recursive calls to rpc_server_receive(). But apparently there are, according to your stack trace.
So even if your patch works fine, I would like to know where the recursive calls to rpc_server_receive() come from. Since we have one subprocess of mserver for each client, there should only
be one client connected to each mserver process, and the client is protected via the mutex in rpc_call(). Can you please debug this? I would like to understand what is going on there. Maybe
there is a deeper underlying problem, which we had better solve, otherwise it might fall back on us in the future.
For debugging, you have to see what commands rpc_call() sends and what rpc_server_receive() gets, maybe by writing this into a common file together with a time stamp.
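Something along these lines might do for the trace file. The helper name trace_rpc() and the line format are only a suggestion; the caller would pass whatever identifies the RPC (routine id, socket, and so on).

```c
#include <stdio.h>
#include <time.h>

/* Append one line "<unix time> <side> <text>" to a shared trace file.
 * side would be e.g. "rpc_call" or "rpc_server_receive".
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int trace_rpc(const char *filename, const char *side, const char *text)
{
    FILE *fp = fopen(filename, "a");   /* append mode: writers add to the end */
    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "%ld %s %s\n", (long)time(NULL), side, text) > 0;

    /* fclose() flushes the line; report failure if either step failed. */
    return (fclose(fp) == 0 && ok) ? 0 : -1;
}
```

Opening and closing the file on every call is slow, but it keeps each line complete on disk even if the process crashes right afterwards, which is what matters for this kind of debugging.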
SR |
827
|
16 Aug 2012 |
Cheng-Ju Lin | Bug Report | launching roody kills the analyzer | Hi All,
I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody.
All the packages compiled OK. The example code in $MIDASSYS/examples/experiment also runs OK
provided that I don't launch roody. If I try to launch roody, then it immediately crashes the analyzer with
the following trace:
#6 root_server_thread (arg=ox7f54fc001150) at src/mana.c:5154
#7 0x0000003219a1e13a in TThread::Function(void*) () from /usr/lib64/root/libThread.so.5.28
#8 0x0000003dd1207851 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003dd0ee76dd in clone () from /lib64/libc.so.6
The line src/mana.c:5154 points to the following:
TObject *obj;
if (strncmp(request + 10, "Any", 3) == 0)
obj = folder->FindObjectAny(request + 14);
else
obj = folder->FindObject(request + 11); // LINE 5154
Any suggestions on what may be going on here? Thanks.
Cheng-Ju |
829
|
17 Aug 2012 |
Konstantin Olchanski | Bug Report | launching roody kills the analyzer | > I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody.
>
> #6 root_server_thread (arg=ox7f54fc001150) at src/mana.c:5154
You are connecting to mana, the old midas analyzer. The code for connecting to it is still present in roody,
but I cannot support the matching server code in mana.c - it is 2 revolutions behind the current state of
the ROOT object server (look in ROOTANA - the NetDirectory stuff and the latest is the XmlServer stuff).
I can offer 2 solutions - switch from mana.c to a ROOTANA based analyzer or graft the XmlServer code
into your analyzer (it is very simple - you need to create an XmlServer object and tell it which ROOT
containers you want to make visible to ROODY).
I guess you can also debug the old midas server code inside mana.c...
K.O. |
830
|
17 Aug 2012 |
Cheng-Ju Lin | Bug Report | launching roody kills the analyzer | Hi Konstantin,
Many thanks for your feedback. I was able to keep the analyzer from exiting when launching roody by making some changes in the roody code.
This at least allows me to keep moving forward. I will look into your suggestion of converting to a ROOTANA-based analyzer as well.
Regards,
Cheng-Ju
> > I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody.
> >
> > #6 root_server_thread (arg=ox7f54fc001150) at src/mana.c:5154
>
> You are connecting to mana, the old midas analyzer. The code for connecting to it is still present in roody,
> but I cannot support the matching server code in mana.c - it is 2 revolutions behind the current state of
> the ROOT object server (look in ROOTANA - the NetDirectory stuff and the latest is the XmlServer stuff).
>
> I can offer 2 solutions - switch from mana.c to a ROOTANA based analyzer or graft the XmlServer code
> into your analyzer (it is very simple - you need to create an XmlServer object and tell it which ROOT
> containers you want to make visible to ROODY).
>
> I guess you can also debug the old midas server code inside mana.c...
>
> K.O. |
834
|
06 Sep 2012 |
shaun | Bug Report | "cannot find recent history file" | Hi, when attempting to access a history window the following message is repeated
over and over in the MIDAS message log:
Thu Sep 6 11:37:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:38:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:38:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:39:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:39:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
It appears to be related to attempting to display a history graph that includes
some time periods that have no recorded history data. When I zoom in so that the
whole graph has data the error message goes away.
The graph displays fine either way, so this error message seems useless. Is
there a way to suppress it?
Thanks
Shaun |