  Midas DAQ System, Page 133 of 146
ID   Date   Author   Topic   Subject
  572   07 May 2009   Konstantin Olchanski   Bug Fix   mhttpd "Names" length
mhttpd did not like it when the equipment "Names" arrays had a different length than the 
corresponding "Variables" arrays. These limitations are now removed.
svn rev 4469
K.O.
  573   07 May 2009   Konstantin Olchanski   Bug Fix   Fixed mlogger run start and stop
Fixed problems with mlogger starting and stopping runs.

Basic difficulty was with the mlogger using ASYNC transitions, which did not implement proper 
transition sequencing according to transition sequence numbers. Basically all clients were called at the 
same time, regardless of how long they took to process the transitions.
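
A minimal sketch of what sequencing by transition sequence number means. The `Client` struct and `call_in_sequence()` helper are hypothetical and are not MIDAS code: clients are sorted by sequence number and called one at a time, so a client with a higher number is only contacted after all lower-numbered clients have returned.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a client registered for a run transition.
struct Client {
   std::string name;
   int sequence_number;             // lower numbers are called first
   std::function<bool()> handler;   // returns true on success
};

// Call every client strictly in order of sequence number and stop at the
// first failure. Returns the names of the clients called, in call order.
std::vector<std::string> call_in_sequence(std::vector<Client> clients)
{
   std::stable_sort(clients.begin(), clients.end(),
                    [](const Client& a, const Client& b) {
                       return a.sequence_number < b.sequence_number;
                    });

   std::vector<std::string> called;
   for (const Client& c : clients) {
      called.push_back(c.name);
      if (!c.handler())
         break;                     // later clients are not called
   }
   return called;
}
```

Calling all handlers at once, as the ASYNC mode described above did, loses exactly this ordering guarantee.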

Switching from ASYNC to SYNC transitions introduces a deadlock between mlogger (not reading data 
from SYSTEM buffer while inside cm_transition) and any program trying to write into the SYSTEM buffer 
(buffer is full, does not listen for transition requests while waiting for mlogger which tries to call its 
transition handler).

Then we invented the mtransition helper program. In the original implementation for t2k it was spawned 
directly from the mlogger to stop the run (avoiding the deadlock). Then cm_transition(DETACH) was 
introduced, but the mlogger start/stop/restart run logic became broken. One problem was that when the 
auto restart delay is zero, mtransition tries to restart the run before the previous run is stopped (instead, 
mlogger should restart the run from its tr_stop() handler). Another problem was that the auto restart 
delay was counted from the time when we start stopping the run - because stopping the run can take an 
unpredictable time, depending on what various frontends have to do - it is impossible to have a 
predictable delay between runs (again this is fixed by restarting the run from mlogger.c::tr_stop()).

All this has been straightened out by svn revision 4484. Basically the old run stop/restart logic was 
restored in mlogger.c, using cm_transition(DETACH) to avoid the deadlocks.

To remind all, these are the present controls for transitions initiated by mlogger:

/experiment/transition debug flag - set to "2" to capture transition sequences into midas.log
/experiment/transition timeout and transition connect timeout - one can change default timeouts as 
needed to accommodate non cooperative frontends.
/logger/async transitions - do not use mtransition - do ASYNC transitions, as before.
/logger/auto restart delay - delay between stopping the run (mlogger.c::tr_stop) and starting the next 
run.

svn rev 4484
K.O.
  587   03 Jun 2009   Konstantin Olchanski   Bug Fix   Fix db_open_record() error return
The odb hot-link function db_open_record() did not return an error when the system limit for hot links was 
exceeded and no more hot links could be added (silent failure). This is now fixed.
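
A minimal sketch of the behaviour described above, using a hypothetical fixed-size table rather than the real ODB code. The names `HotlinkTable`, `MAX_HOTLINKS` and the status values are assumptions for illustration: when the table is full, the caller gets an error status instead of a silent failure.

```cpp
#include <array>
#include <cstddef>

// Hypothetical status codes and limit, for illustration only.
enum Status { STATUS_OK = 1, STATUS_NO_MEMORY = 2 };
constexpr std::size_t MAX_HOTLINKS = 4;

struct HotlinkTable {
   std::array<int, MAX_HOTLINKS> keys{};
   std::size_t used = 0;

   // Report failure to the caller when the table is full instead of
   // silently dropping the request.
   Status open_record(int key)
   {
      if (used >= MAX_HOTLINKS)
         return STATUS_NO_MEMORY;
      keys[used++] = key;
      return STATUS_OK;
   }
};
```

With an error return in place, callers can check the status of every hot-link request and report the problem.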
odb.c svn rev 4500
K.O.
  637   06 Sep 2009   Exaos Lee   Bug Fix   Maybe a fix
Changing "SQLINTEGER" to "SQLLEN" may let the compilation pass. See the attached diff.

But then I hit another error, which turned out to be a problem in CMakeLists.txt. (FIXED)
Attachment 1: history_odbc.cxx.diff
diff --git a/src/history_odbc.cxx b/src/history_odbc.cxx
index 5f00016..392062f 100644
--- a/src/history_odbc.cxx
+++ b/src/history_odbc.cxx
@@ -584,7 +584,7 @@ int SqlODBC::Exec(const char* sql)
 
 int SqlODBC::GetNumRows()
 {
-   SQLINTEGER nrows = 0;
+   SQLLEN nrows = 0;
    /* How many rows are there */
    int status = SQLRowCount(fStmt, &nrows);
    if (!SQL_SUCCEEDED(status)) {
@@ -634,7 +634,7 @@ int SqlODBC::Done()
 const char* SqlODBC::GetColumn(int icol)
 {
   static char buf[1024];
-  SQLINTEGER indicator;
+  SQLLEN indicator;
   int status = SQLGetData(fStmt, icol, SQL_C_CHAR, buf, sizeof(buf), &indicator);
 
   if (!SQL_SUCCEEDED(status)) {
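
A minimal sketch of why the type matters, assuming a 64-bit build where the length type is 64 bits wide and the integer type is 32 bits. `FakeSqlInteger`, `FakeSqlLen` and `fake_row_count()` are hypothetical stand-ins, not the ODBC headers.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the ODBC typedefs on a 64-bit build:
// a 32-bit integer type and a 64-bit length type.
typedef std::int32_t FakeSqlInteger;
typedef std::int64_t FakeSqlLen;

// Stand-in for an API that reports a count through a length pointer,
// the way SQLRowCount() and SQLGetData() are used in the diff.
int fake_row_count(FakeSqlLen* count)
{
   *count = 42;
   return 0;
}

// The two types are distinct, so a FakeSqlInteger* cannot be passed
// where a FakeSqlLen* is expected; C++ rejects it at compile time.
static_assert(!std::is_same<FakeSqlInteger*, FakeSqlLen*>::value,
              "pointer types differ");

long long rows_via_len()
{
   FakeSqlLen nrows = 0;   // matches the parameter type
   fake_row_count(&nrows);
   return nrows;
}
```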
  667   02 Nov 2009   Exaos Lee   Bug Fix   Build error due to missing header
I encountered a build error: "sort undefined...". It is caused by the missing C++ header <algorithm>, in which "sort" is defined. It can be fixed as shown in the attachment.

Environment:
G++: 4.3.4
Platform: Debian Linux testing

> I committed an updated lazylogger with updated documentation. The new version supports subruns and
> can save to external storage arbitrary files (i.e. odb dump files). It also moves most book keeping out of
> odb to permit handling more files on bigger storage disks.
>
> Example lazylogger scripts for castor (CERN) and dcache (TRIUMF) are in the directory "utils".
>
> The lazylogger documentation was updated to remove obsolete information and to describe the new
> functions. As usual "make dox; cd doxfiles/html; firefox index.html" or see my copy at:
>
> http://ladd00.triumf.ca/~olchansk/midas/Utilities.html#lazylogger_task
>
> svn rev 4615, 4616.
> K.O.
Attachment 1: lazylogger.diff
diff --git a/src/lazylogger.c b/src/lazylogger.c
index 34be17a..a7ba4a3 100644
--- a/src/lazylogger.c
+++ b/src/lazylogger.c
@@ -17,6 +17,7 @@ $Id$
 
 #include <vector>
 #include <string>
+#include <algorithm>
 
 #define NOTHING_TODO  0
 #define FORCE_EXIT    1
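
A minimal sketch of the rule behind this fix: code that calls std::sort should include <algorithm> itself rather than rely on another header pulling it in. The `sorted_names()` helper is hypothetical.

```cpp
#include <algorithm>   // declares std::sort
#include <string>
#include <vector>

// Return a sorted copy of the given names.
std::vector<std::string> sorted_names(std::vector<std::string> names)
{
   std::sort(names.begin(), names.end());
   return names;
}
```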
  671   20 Nov 2009   Konstantin Olchanski   Bug Fix   fix odb corruption from too long client names
odb.c rev 4622 fixes ODB corruption by db_connect_database() if client_name is
too long. Also fixed is potential ODB corruption by too long key names in
db_create_key(). Problem kindly reported by Tim Nichols of T2K/ND280 experiment.
K.O.
  672   20 Nov 2009   Konstantin Olchanski   Bug Fix   disallow client names with slash '/' characters
> odb.c rev 4622 fixes ODB corruption by db_connect_database() if client_name is
> too long. Also fixed is potential ODB corruption by too long key names in
> db_create_key(). Problem kindly reported by Tim Nichols of T2K/ND280 experiment.


Related bug fix - db_connect_database() should not permit client names that contain
the slash (/) character. Names like "aaa/bbb" create entries /Programs/aaa/bbb (aaa
is a subdirectory) and names like "../aaa" create entries in the ODB root directory.
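
A minimal sketch of the two checks discussed in this thread, a length limit and a ban on the slash character. The function `is_valid_client_name()` and the limit `kMaxNameLength` are assumptions for illustration, not the actual odb.c code.

```cpp
#include <cstddef>
#include <string>

// Assumed limit for illustration; the name plus its terminating NUL
// must fit into a buffer of this size.
constexpr std::size_t kMaxNameLength = 32;

bool is_valid_client_name(const std::string& name)
{
   if (name.empty())
      return false;
   if (name.size() >= kMaxNameLength)
      return false;   // too long, would overflow a fixed-size field
   if (name.find('/') != std::string::npos)
      return false;   // would be interpreted as an ODB path separator
   return true;
}
```

Rejecting any slash covers both examples above, "aaa/bbb" and "../aaa".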

svn rev 4623.
K.O.
  675   25 Nov 2009   Konstantin Olchanski   Bug Fix   subrun file size
Please be aware of mlogger.c update rev 4566 on Sept 23rd 2009, when Stefan
fixed a buglet in the subrun file size computations. Before this fix, the first
subrun could be of a short length. If you use subruns, please update your
mlogger to at least rev 4566 (or newer, Stefan added the run and subrun time
limits just recently).
K.O.
  677   26 Nov 2009   Konstantin Olchanski   Bug Fix   mdump max number of banks and dump of 32-bit banks
By request from Renee, I increased the MIDAS BANKLIST_MAX from 64 to 1024 and
after fixing a few buglets where YB_BANKLIST_MAX is used instead of (now bigger)
BANKLIST_MAX, I can do a full dump of ND280 FGD events (96 banks).

I also noticed that "mdump -b BANK" did not work; it turns out that it could not
handle 32-bit banks at all. This is now fixed, too.

svn rev 4624
K.O.
  679   26 Nov 2009   Konstantin Olchanski   Bug Fix   mserver network routing fix
mserver update svn rev 4625 fixes an anomaly in the MIDAS RPC network code where
in some network configurations MIDAS mserver connections work, but some RPC
transactions, such as starting and stopping runs, do not (use the wrong network
names or are routed over the wrong network).

The problem is a possible discrepancy between network addresses used to
establish the mserver connection and the value of "/System/Clients/xxx/Host"
which is ultimately set to the value of "hostname" of the remote client. This
ODB setting is then used to establish additional network connections, for
example to start or stop runs.

Using the client "hostname" setting works well for standard configurations, when
there is only one network interface in the machine, with only one IP address,
and with "hostname" set to the value that this IP address resolves to using DNS.

However, if there are private networks, multiple network interfaces, or multiple
network routes between machines, "/System/Clients/xxx/Host" may become set to an
undesirable value resulting in asymmetrical network routing or complete failure
to establish RPC connections.

Svn rev 4625 updates mserver.c to automatically set "/System/Clients/xxx/Host"
to the same network name as was used to establish the original mserver connection.

As always with networking, any fix always breaks something somewhere for
somebody, in which case the old behavior can be restored by "setenv
MIDAS_MSERVER_DO_NOT_USE_CALLBACK_ADDR 1" before starting mserver.

The specific problem fixed by this change is when the MIDAS client and server
are on machines connected by 2 separate networks ("client.triumf.ca" and
"client.daq"; "server.triumf.ca" and "server.daq"). The ".triumf.ca" network
carries the normal SSH, NFS, etc traffic, and the ".daq" network carries MIDAS
data traffic.

The client would use the "server.daq" name to connect to the server and this
traffic would go over the data network (good).

However, previously, the client "/System/Clients/xxx/Host" would be set to
"client.triumf.ca" and any reverse connections (i.e. RPC to start/stop runs)
would go over the normal ".triumf.ca" network (bad).

With this modification, mserver will set "/System/Clients/xxx/Host" to
"client.daq" (the IP address of the interface on the ".daq" network) and all
reverse connections would also go over the ".daq" network (good).
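
A minimal sketch of the selection logic described above. The helper names are hypothetical, and treating the environment variable as "set to any value" is an assumption, not a statement about how mserver.c parses it.

```cpp
#include <cstdlib>
#include <string>

// Choose the host name to record for a client: either the name that was
// used to establish the connection (new behaviour) or the client's own
// "hostname" (old behaviour).
std::string choose_client_host(const std::string& connection_name,
                               const std::string& client_hostname,
                               bool use_connection_name)
{
   return use_connection_name ? connection_name : client_hostname;
}

// Assumption: the old behaviour is requested whenever the variable is
// present in the environment, whatever its value.
bool connection_name_enabled()
{
   return std::getenv("MIDAS_MSERVER_DO_NOT_USE_CALLBACK_ADDR") == nullptr;
}
```

For the two-network example, `choose_client_host("client.daq", "client.triumf.ca", true)` yields "client.daq", so reverse connections stay on the data network.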

P.S. This modification definitely works only for the default "mserver -m" mode,
but I do not think this is a problem as using "-s" and "-t" modes is not
recommended, and the "-s" mode is definitely broken (see my previous message).

svn rev 4625
K.O.
  744   15 Feb 2011   Konstantin Olchanski   Bug Fix   mlogger stop run on disk full!
The mlogger has a function for detecting when the output disk becomes full - when this condition is 
detected, the run should be stopped. But this did not work if the disk was already full when the user tried to start 
a run - the "disk full?" check happened too early and the attempt to stop the run did not succeed 
because the original start-run transition was still running. Now if the "disk full" condition is detected, mlogger 
tries to stop the run every 10 seconds until the run is finally stopped (or dies because the disk is full).
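
A minimal sketch of the retry idea, assuming hypothetical callbacks `try_stop` and `wait_seconds`. Unlike the mlogger behaviour described above, this sketch gives up after `max_attempts` so that it always terminates.

```cpp
#include <functional>

// Try to stop the run, waiting retry_interval_s seconds between
// attempts. Returns the number of attempts needed, or -1 if the run
// could not be stopped within max_attempts.
int stop_run_with_retry(const std::function<bool()>& try_stop,
                        const std::function<void(int)>& wait_seconds,
                        int retry_interval_s, int max_attempts)
{
   for (int attempt = 1; attempt <= max_attempts; ++attempt) {
      if (try_stop())
         return attempt;
      if (attempt < max_attempts)
         wait_seconds(retry_interval_s);   // e.g. 10 seconds
   }
   return -1;
}
```

The first attempt may fail simply because the start-run transition is still in progress, which is why a single early check was not enough.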

mlogger.c svn rev 4976
K.O.
  775   10 Jul 2011   Konstantin Olchanski   Bug Fix   midas shared memory changes
> > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > recorded in the file .SHM_HOST.TXT.

Due to a typo in src/system.c svn rev 5125, ss_shm_delete() did not work at all. This broke "odbedit -R", "odbedit -s 5000000" (to change ODB size), etc.
Fixed in src/system.c svn rev 5134. (It is safe to update just this one file to fix this problem).

Sorry for the inconvenience,
K.O.
  776   11 Jul 2011   Konstantin Olchanski   Bug Fix   midas shared memory changes
> > > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > > recorded in the file .SHM_HOST.TXT.


Because the mserver did not set up the correct experiment name and path, POSIX shared memory did not work at all when used with the mserver. Fixed in mserver.c rev 5135.


Sorry for the inconvenience,
K.O.
  828   16 Aug 2012   Cheng-Ju Lin   Bug Fix   launching roody kills the analyzer
OK, I've found the solution in the roody forum.  The solution for 64-bit machines is to replace
   uint32_t p =0;
   with
   uintptr_t p =0;

in the roody header file roody/include/DataSourceTNetFolder.h
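
A minimal sketch of why the type change matters, assuming the variable is used to carry a pointer value (the helpers below are hypothetical and are not roody code). uintptr_t is wide enough to hold a pointer, while uint32_t drops the upper half of a 64-bit address.

```cpp
#include <cstdint>

// A pointer survives a round trip through uintptr_t.
bool round_trips_through_uintptr(const void* ptr)
{
   std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
   return reinterpret_cast<const void*>(p) == ptr;
}

// True only if the address fits into 32 bits, i.e. if storing it in a
// uint32_t would not lose information.
bool fits_in_32_bits(const void* ptr)
{
   std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
   return static_cast<std::uintptr_t>(static_cast<std::uint32_t>(p)) == p;
}
```

On a 64-bit machine, addresses such as the 0x7f... values in the trace below do not fit into 32 bits, so a truncated value points somewhere else when it is converted back.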

Cheng-Ju



> Hi All,
> 
> I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody. 
> All the packages compiled OK. The example code in $MIDASSYS/examples/experiment also runs OK 
> provided that I don't launch roody. If I try to launch roody, then it immediately crashes the analyzer with 
> the following trace:
> 
> #6 root_server_thread (arg=0x7f54fc001150) at src/mana.c:5154
> #7 0x0000003219a1e13a in TThread::Function(void*) () from /usr/lib64/root/libThread.so.5.28
> #8 0x0000003dd1207851 in start_thread () from /lib64/libpthread.so.0
> #9 0x0000003dd0ee76dd in clone () from /lib64/libc.so.6
> 
> The line src/mana.c:5154 points to the following:
> 
> TObject *obj;
>             if (strncmp(request + 10, "Any", 3) == 0)
>                obj = folder->FindObjectAny(request + 14);
>             else
>                obj = folder->FindObject(request + 11);    // LINE 5154
> 
> 
> Any suggestions on what may be going on here?  Thanks.
> 
> 
> Cheng-Ju
  838   27 Sep 2012   Randolf Pohl   Bug Fix   [PATCH] mana.c compile fix, gz files
Hi,

I had to apply the attached patch to convince SuSE Linux 12.2 to compile mana.c
gcc version is "(SUSE Linux) 4.6.2"

Problem is that gz{write,close, etc.} expect a 1st argument of type gzFile (see
zlib.h), whereas out_file is FILE*. In fact, out_file is a cast to FILE*, even
in the case when we work on a gzfile (HAVE_ZLIB).

Could you please confirm that the patch is correct, and possibly apply it to trunk?

I haven't checked if mana works as advertised now.

Cheers,


Randolf
Attachment 1: diff.mana
Index: src/mana.c
===================================================================
--- src/mana.c	(revision 5334)
+++ src/mana.c	(working copy)
@@ -1987,7 +1987,7 @@
       } else {
 #ifdef HAVE_ZLIB
          if (out_gzip)
-            gzclose(out_file);
+            gzclose((gzFile)out_file);
          else
 #endif
             fclose(out_file);
@@ -2311,7 +2311,7 @@
    /* write record to device */
 #ifdef HAVE_ZLIB
    if (out_gzip)
-      status = gzwrite(file, buffer, size) == size ? SS_SUCCESS : SS_FILE_ERROR;
+      status = gzwrite((gzFile)file, buffer, size) == size ? SS_SUCCESS : SS_FILE_ERROR;
    else
 #endif
       status =
@@ -2430,7 +2430,7 @@
    /* write record to device */
 #ifdef HAVE_ZLIB
    if (out_gzip)
-      status = gzwrite(file, pevent_copy, size) == size ? SUCCESS : SS_FILE_ERROR;
+      status = gzwrite((gzFile)file, pevent_copy, size) == size ? SUCCESS : SS_FILE_ERROR;
    else
 #endif
       status =
@@ -4119,7 +4119,7 @@
             size = pevent->data_size + sizeof(EVENT_HEADER);
 #ifdef HAVE_ZLIB
             if (out_gzip)
-               status = gzwrite(out_file, pevent, size) == size ? SUCCESS : SS_FILE_ERROR;
+               status = gzwrite((gzFile)out_file, pevent, size) == size ? SUCCESS : SS_FILE_ERROR;
             else
 #endif
                status =
@@ -4390,7 +4390,7 @@
       } else {
 #ifdef HAVE_ZLIB
          if (out_gzip)
-            gzclose(out_file);
+            gzclose((gzFile)out_file);
          else
 #endif
             fclose(out_file);
  839   09 Oct 2012   Stefan Ritt   Bug Fix   [PATCH] mana.c compile fix, gz files
> Hi,
> 
> I had to apply the attached patch to convince SuSE Linux 12.2 to compile mana.c
> gcc version is "(SUSE Linux) 4.6.2"
> 
> Problem is that gz{write,close, etc.} expect a 1st argument of type gzFile (see
> zlib.h), whereas out_file is FILE*. In fact, out_file is a cast to FILE*, even
> in the case when we work on a gzfile (HAVE_ZLIB).
> 
> Could you please confirm that the patch is correct, and possibly apply it to trunk?
> 
> I haven't checked if mana works as advertised now.
> 
> Cheers,
> 
> 
> Randolf

I applied your patch to the trunk.

Best,
Stefan
  885   10 May 2013   Konstantin Olchanski   Bug Fix   Fixed: crash if alarm "write elog message" is enabled
If the MIDAS Alarm property "write elog message" is enabled, an uninitialized variable "tag" is passed to 
el_submit() and, depending on your luck, can cause a crash. "tag" is supposed to be and is now a NUL-terminated 
string. The only other uses of el_submit() are in mhttpd.cxx and mserver.c, where it is called 
correctly.
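
A minimal sketch of the kind of fix described, assuming a fixed-size character buffer. `tag_length()` is a hypothetical stand-in for a function that reads its argument as a C string, and the buffer size is arbitrary.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a callee that treats its argument as a
// NUL-terminated string.
std::size_t tag_length(const char* tag)
{
   return std::strlen(tag);
}

std::size_t submit_with_empty_tag()
{
   // Initializing with "" zero-fills the array, so the callee always
   // finds a terminating NUL. Without the initializer the contents
   // would be indeterminate.
   char tag[32] = "";
   return tag_length(tag);
}
```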

alarm.c svn rev 5361
K.O.
  897   02 Aug 2013   Konstantin Olchanski   Bug Fix   multithreaded run transitions work!
As of commit
https://bitbucket.org/tmidas/midas/commits/dfa5fb1a93cae11a2960d441044c7fd277e1f0ec
(we are now liberated from the tyranny of SVN IDs),
multithreaded run transitions seem to work reliably and are now the default in mhttpd.

In odbedit and mtransition, the default is the old sequential transitions. "-m" and "-a" flags activate 
the new multithreaded run transitions. mhttpd now uses the equivalent of "mtransition -a" 
(multithreaded asynchronous).

This is one of the new features implemented by Stefan while at TRIUMF.

K.O.

(We hope to write up all the recent changes soon).
  900   26 Aug 2013   Konstantin Olchanski   Bug Fix   Enable cross-site requests in mhttpd
Javascript "AJAX" functions (and their MIDAS wrappers - ODBGet/ODBSet) are subject to something called 
"same origin policy" intended to prevent something called "cross-site scripting attacks", i.e. see
http://en.wikipedia.org/wiki/Same-origin_policy

In practice it means that if you load the MIDAS custom web page from test.foo.com and try to access 
mhttpd at midas.foo.com, ODBSet/ODBGet will not work.

I always thought that this meant that the requests are blocked by the browser and are a form of 
protection of the web server - only scripts loaded from mhttpd can do AJAX (ODBGet/ODBSet) to mhttpd.

It turns out that I was wrong. This is what actually happens: the "cross-site" requests are still sent to the 
server (mhttpd), the response is received, parsed and discarded if "same origin" conditions are not met.

This means that the "same origin" policy does not protect mhttpd at all - any script from any page 
anywhere can issue AJAX requests into any mhttpd, these requests will be successfully sent, received
and processed by mhttpd, including requests for writing into ODB ("jset" command using the HTTP GET 
method).

So for the case of MIDAS, "same origin" does not prevent malicious (or buggy) scripts from writing into the 
wrong mhttpd of the wrong experiment.

All it does is prevent desired and intentional access to mhttpd (ODBGet) from scripts that happen to have 
been loaded outside of mhttpd (i.e. from a developer's own test page).

Then it turns out that there is an "official" way to disable this unwanted protection policy, called CORS, see
http://www.w3.org/TR/cors/

I have now implemented this in mhttpd and added an mhttpd.js function ODBSetURL() to explicitly set the 
URL of mhttpd that we want to talk to.

This work is on the feature/ajax branch, to be merged soon. For the impatient, here is what you need to 
do in mhttpd:

diff --git a/src/mhttpd.cxx b/src/mhttpd.cxx
index 1d9d1cc..0460cec 100755
--- a/src/mhttpd.cxx
+++ b/src/mhttpd.cxx
@@ -1070,6 +1070,7 @@ void show_text_header()
 {
    rsprintf("HTTP/1.0 200 Document follows\r\n");
    rsprintf("Server: MIDAS HTTP %d\r\n", mhttpd_revision());
+   rsprintf("Access-Control-Allow-Origin: *\r\n");
    rsprintf("Pragma: no-cache\r\n");
    rsprintf("Expires: Fri, 01 Jan 1983 00:00:00 GMT\r\n");
    rsprintf("Content-Type: text/plain; charset=iso-8859-1\r\n\r\n");

K.O.
  921   25 Oct 2013   Konstantin Olchanski   Bug Fix   fixed mlogger run auto restart bug
A problem existed in midas for some time: when recording long data sets of time (or event) limited runs 
with logger run auto restart set to "yes", the runs will automatically stop and restart as expected, but 
sometimes the run will stop and never restart and beam will be lost until the experiment operator on shift 
wakes up and restarts the run manually.

I have now traced this problem to a race condition inside the mlogger - when a run is being stopped from 
the mlogger, the mlogger run transition handler (tr_stop) triggers an immediate attempt to start the next 
run, without waiting for the run-stop transition to actually complete. If the run-stop transition does not 
finish quickly enough, a safety check in start_the_run() will cause the run restart attempt to silently fail 
without any error message.

This race condition is pretty rare but somehow I managed to replicate it while debugging the 
multithreaded transitions. It is fixed by making mlogger wait until the run-stop transition completes.
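
A minimal sketch of the race and of the fix, using a hypothetical model of the run state. `RunState`, `start_the_run_model()` and `restart_after_stop()` are illustrative names, not the mlogger.c code: the start attempt is refused while a transition is still active, so the restart logic waits for the stop to complete first.

```cpp
#include <functional>

// Hypothetical model of the logger's view of the run state.
struct RunState {
   bool transition_in_progress = false;
   bool running = true;
};

// Models the safety check: refuse to start while a transition is active.
bool start_the_run_model(RunState& s)
{
   if (s.transition_in_progress)
      return false;   // the failure the post describes as silent
   s.running = true;
   return true;
}

// Wait (by polling, up to max_polls times) until the stop transition
// has completed, then start the next run.
bool restart_after_stop(RunState& s,
                        const std::function<void(RunState&)>& poll,
                        int max_polls)
{
   for (int i = 0; i < max_polls && s.transition_in_progress; ++i)
      poll(s);
   return start_the_run_model(s);
}
```

Calling `start_the_run_model()` directly from the stop handler corresponds to the old behaviour, which only works if the stop transition happens to finish quickly enough.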

https://bitbucket.org/tmidas/midas/commits/b2631fbed5f7b1ec80e8a6c8781ada0baed7702b

K.O.