Back Midas Rome Roody Rootana
  Midas DAQ System, Page 103 of 146  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
Entry  07 Jul 2014, Ryu Sawada, Bug Report, mhist does not show history when -s option is used 
When I use -s option of mhist, it does not show history, for example.
mhist -s 140705 -p 140707 -e "HV".

And if I remove a line like,
diff --git a/utils/mhist.cxx b/utils/mhist.cxx
index 930de3b..10cc6ad 100755
--- a/utils/mhist.cxx
+++ b/utils/mhist.cxx
@@ -652,7 +652,6 @@ int main(int argc, char *argv[])
             else if (strncmp(argv[i], "-s", 2) == 0) {
                strcpy(start_name, argv[++i]);
                start_time = convert_time(argv[i]);
-               do_hst_file = true;
             } else if (strncmp(argv[i], "-p", 2) == 0)
                end_time = convert_time(argv[++i]);
             else if (strncmp(argv[i], "-t", 2) == 0)

It works.

Ryu Sawada
Entry  14 Oct 2014, Konstantin Olchanski, Bug Report, Problem in mfe multithread equipments 
In the ALPHA experiment at CERN I found a problem in mfe.c handling of multithreaded equipments. This problem was in 
some forms introduced around May 2013 and around Aug 2013 (commit 
https://bitbucket.org/tmidas/midas/src/45984c35b4f7/src/mfe.c) (I hope I got it right).

The effect was very odd - if event rate of multithreaded equipment was more than 100 Hz, the event counters on the midas 
status page would not increment and the frontend will crash on end of run. Other than that, all the events from the 
multithreaded equipment seem to appear in the SYSTEM buffer and in the data file normally.

This happened: in mfe.c::receive_trigger_event() a loop was introduced (previously,
there was no loop there - there was and still is a loop outside of receive_trigger_event()):

while (1)
   wait 10 ms for an event
   process event, loop back
   if there is no event, exit
}

Obviously, if the event rate is more than 100 Hz (repetition rate less than 10 ms),
the 10 ms wait will always return an event and we will never exit this loop.

So the mfe.c main loop is now stuck here and will not process any periodic activity
such as updating the equipment statistics (event counters on the midas status page)
or running periodic equipments in the same front end program.

The crash at the end of run will be caused by a timeout in responding to the "end of run" RPC call.

I have a patch in testing that solves this problem by restoring receive_trigger_event() to the original configuration, i.e. 
https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop

K.O.
Entry  14 Oct 2014, Konstantin Olchanski, Bug Report, Hostile network scans against MIDAS RPC ports 
At CERN I see a large number of hostile network scans that seem to be injecting HTTP requests into the 
MIDAS RPC ports. So far, all these requests seem to be successfully rejected without crashing anything, but 
they do clog up midas.log.

The main problem here is that all MIDAS programs have at least one TCP socket open where they listen for 
RPC commands, such as "start of run", "please shutdown", etc. The port numbers of these sockets are 
randomized and that makes them difficult to protect them with firewall rules (firewall rules like fixed port 
numbers).

Note that this is different from the hostile network scans that I have first seen maybe 5 years ago that 
affected the mserver main listener socket. Then, as a solution, I hardened the RPC receiver code against 
bad data (and happy to see that this hardening is still holding up) and implemented the mserver "-A" 
command switch to specify a list of permitted peers. Also mserver uses a fixed port number ("-p" switch) 
and is easy to protect with firewall rules.

Since these ports cannot be protected by OS means (firewall, etc), we have to protect them in MIDAS.

One solution is to reject all connections from unauthorized peers.

One way to use this is to implement the "-A" switch to explicitely list all permitted peers, these switch will 
ave to be added to all long running midas programs (mhttpd, mlogger, mfe.c, etc). Not very practical, IMO.

Another way is to read the list of permitted peers from ODB, at startup time, or each time a new connection 
is made.

In the latter case, care needs to be taken to avoid deadlocks. For example remote programs that read ODB 
through the mserver may deadlock if the same mserver is the one trying to establish the RPC connection. 
Or if ODB is somehow locked.

NB - we already keep a list of permitted peers in ODB /Experiment/Security.

K.O.
    Reply  14 Oct 2014, Konstantin Olchanski, Bug Report, Problem in mfe multithread equipments 
For my reference:
good version: https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
first breakage: https://bitbucket.org/tmidas/midas/src/c60259d9a244bdcd296a8c5c6ab0b91de27f9905/src/mfe.c?at=develop
second breakage: https://bitbucket.org/tmidas/midas/src/45984c35b4f7257f90515f29116dec6fb46f2ebc/src/mfe.c?at=develop

The "first breakage" may actually be okey, because there the badnik loop loops over ring buffers, not infinite. But I cannot test it anymore.
K.O.
Entry  14 Oct 2014, Konstantin Olchanski, Bug Report, Problem with EQ_USER 
If you use EQ_USER in mfe.c and have multiple threads writing into the ring buffer, you will have a big 
problem - the thread locking in the ring buffer code only works for a single writer thread and a single 
reader thread.

Presently, it is not clear how to have multiple multithreaded equipments inside one frontend.

During the Summer of 2013 code briefly existed in mfe.c to have an array of ring buffers and each 
multithreaded equipment could write into it's own buffer.

But this code is now removed and mfe.c can only read from a single ring buffer and as I noted above, ring 
buffer locking requires that only a single thread writes into it.
K.O.
    Reply  14 Oct 2014, Stefan Ritt, Bug Report, Hostile network scans against MIDAS RPC ports 
Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.

At PSI we fortunately do not have these network scans because PSI uses a institute-wide firewall. So you can connect from outside PSI to inside PSI only 
on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where 
the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be 
accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.

/Stefan
    Reply  15 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments 
You are absolutely correct, the code is certainly wrong. It looks to me like the 

while (rbh)

was put in there for some testing, and I forgot to remove it. The only thing I could imagine is that we want to have a while loop there for performance reason. Like

readout_start = ss_millitime();
while (ss_millitime - readout_start < (DWORD) eq_info->period) {
  read event
  return 0 if no event found
}

You find this code also in the check_polled_events() routine. It ensures that the routine does not return after every single event, but after the period defined in the 
equipment (which is usually 100 ms for polled events). This way the code is more efficiently, since we do not check for RPC calls between every event, but just 10 times 
per second. This way you can shovel more events through the system, while still being responsive to run stops.

I don't have any hardware right now to test this, so please put my code above into the routine and commit it if it works.

I notice also a difference in both codes concerning the read buffer handles. The old code uses rbh2, while the new (wrong) code uses rbh. In your case probably both 
handles are the same, so it works, but in other experiments, which might use several ring buffers, it will fail. So please use rbh instead rbh2.

Let me know if it works for you, and if you see any difference in speed between the versions with and without the while loop (actually you will see this only if your trigger 
rate maxes out the DAQ).

Cheers,
Stefan
    Reply  15 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments 
Please disregard my previous posting, you don't need the while loop, since it's already in the scheduler (around lines 2160 under /*---- send interrupt events ----*/). 

But now I remember the rationale behind it. The loop over the rb[i] is because in MEG I have n calibration threads, each one running on a separate CPU core. So the receive_trigger_event() routine has to collect events from all the 
threads, each of them having one ring buffer. In the process of implementing EQ_USER, I changed this somehow, and apparently broke the code by making the while() loop looping forever if the event rate is over 100 Hz.

So for the moment please remove the while loop completely, and I will worry later of putting it back correctly when MEG will start again next year.

/Stefan
    Reply  15 Oct 2014, Stefan Ritt, Bug Report, Problem with EQ_USER 
Sure, each thread needs its own ring buffer for writing.

So I see that we need back the multiple-ring-buffer-readout-scheme even before MEG will start. So what you need is something like

for (i=0 ; rb[i] != 0 ; i++) {
  read event from rb[i];
}

as it was before. What I do not like is that rb is a global variable, we should better use the encapsulation functions and extend get_event_rb() to 
get_event_rb(i) so you can have n ring buffers.

Give me one day, I will extend the current code to make it work again and to implement N threads.

Cheers,
Stefan
    Reply  16 Oct 2014, Stefan Ritt, Bug Report, Problem with EQ_USER 
I restructured the front-end code to enable multiple readout threads for EQ_USER equipment. Last summer I was definitively interrupted during 
that work and left it in an half finished state, sorry for that.

The way it works now is illustrated in mtfe.c. You create N ring buffers and N threads via

   for (int i=0 ; i<N ; i++) {
      create_event_rb(i);
      ss_thread_create(trigger_thread, (void*)(PTYPE)i);
   }

then each readout thread accesses its own readout buffer

thread(...)
{
   index = (int)(PTYPE)param;
   signal_readout_thread_active(index, TRUE);
   rbh = get_event_rbh(index);
   
   while (is_readout_thread_enabled()) {
      ... read event and put it into ring buffer ...
   }

   signal_readout_thread_active(index, FALSE);
}

The is_readout_thread_enabled() and signal_readout_thread_active() are used by the framework to shut down gracefully threads correct at the end 
of the program. This way each thread can close any hardware correctly.

Note that no other thread management is done by the framework. In the old days with interrupt equipment, the framework disabled interrupts 
when reading out periodic events, since that was necessary when using a single CAMAC crate for ADCs and scalers. This is obsolete now and not 
needed any longer. It is now the responsibility of the user code to resolve hardware access conflicts between different threads (like using a local 
mutex to access the same hardware). There is also no "readout when running" handling. If events should not be read out when the run is stopped, 
the readout thread has to check to run status, or better the EOR routine should disable the hardware trigger and the BOR routine should re-enable 
it. The readout threads will then poll for new events and just go to sleep if nothing is there.

I testes the mtfe.c program with 100 Hz and 1 MHz event rate on a dummy experiment (no hardware access) and it worked without problem.

Let me know if there is any issue left over.

/Stefan 
    Reply  16 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments 
> while (1)
>    wait 10 ms for an event
>    process event, loop back
>    if there is no event, exit
> }


This code has been rewritten now and should work for event rates >100 Hz.

/Stefan
    Reply  16 Oct 2014, Konstantin Olchanski, Bug Report, Hostile network scans against MIDAS RPC ports 
> Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.
> 
> At PSI we fortunately do not have these network scans because PSI uses a institute-wide firewall.
>

Same here at TRIUMF, no problems with hostile network activity. Only see this trouble at CERN. Nominally CERN also have
everything behind the CERN firewall, that is why I tend to think that I am seeing network scans done by CERN security people,
or some badniks on the CERN local network (PC malware, etc).

> So you can connect from outside PSI to inside PSI only 
> on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where 
> the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be 
> accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.

Yes, this is how we did it for DEAP at SNOLAB. No network trouble there.

But generically for MIDAS, I think we should have built-in capability for MIDAS to protect itself without reliance on OS-level means (local firewall)
or network-level means ("site firewalls").

Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
too much work to setup an external firewall box just for one machine and OS-level firewall rules sometimes conflict
with some OS services (i.e. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).

K.O.
    Reply  16 Oct 2014, Stefan Ritt, Bug Report, Hostile network scans against MIDAS RPC ports 
> Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
> too much work to setup an external firewall box just for one machine and OS-level firewall rules sometimes conflict
> with some OS services (i.e. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).

I fully agree with you. So if you find time to implement this, I will be more than happy.

/Stefan
Entry  27 Jan 2015, Konstantin Olchanski, Bug Report, getaddrinfo() 
To support IPV6, we need to migrate MIDAS from gethostbyname() to getaddrinfo(). (Thanks to 
http://www.openwall.com/lists/oss-security/2015/01/27/9). K.O.
Entry  16 Jul 2015, Thomas Lindner, Bug Report, jset/ODBSet using true/false for booleans 
MIDAS does not seem to be consistent (or at least convenient) with how it
handles booleans in AJAX functions.

When you request an ODB value that is a boolean with AJAX call like

http://neut14.triumf.ca:8081/?cmd=jcopy&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys

then you get

{ "Hidden/last_written" : 1437065425, "Hidden" : false }

This seems correct, since the JSON convention has booleans encoded as true/false.

But this convention does not work when trying to set the boolean value. For instance

http://neut14.triumf.ca:8081/?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=true

does not set the variable to true. To make this work you need to use the
characters y/n

http://neut14.triumf.ca:8081/?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=y

I tested this with ajax/jset, but the same problem seems to occur when using the
javascript function ODBSet. The documentation doesn't say what sort of encoding
to use when using these functions, so I guess the idea is that these functions
use MIDAS encoding for booleans. But it seems to me that it would be more
convenient if jset/ODBSet allowed the option to use json/javascript encoding for
boolean values; or at least had that as a format option for jset/ODBSet.  That
way my javascript could look like

var mybool = true;
URI_command =
"?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=" + mybool;

instead of

var mybool = true;
URI_command = ""
if(mybool){
  URI_command =
"?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=y";
else
  URI_command =
"?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=n";


__________________________________________________________
Cross-posting from bitbucket issue tracker:

https://bitbucket.org/tmidas/midas/issues/29/jset-odbset-using-true-false-for-booleans
    Reply  29 Jul 2015, Stefan Ritt, Bug Report, jset/ODBSet using true/false for booleans 
See bitbucket for the solution.

https://bitbucket.org/tmidas/midas/issues/29/jset-odbset-using-true-false-for-booleans#comment-20550474
Entry  19 Aug 2015, Pierre Gorel, Bug Report, Sequencer limits 
While I know some of those limits/problems have been already been reported from
DEAP (and maybe corrected in the last version), I am recording them here:

Bugs (not working as it should): 
- "SCRIPT" does not seem to take the parameters into account
- The operators for WAIT are incorrectly set:
the default ">=" and ">" are correct, but "<=", "<", "==" and "!=" are all using
">=" for the test. 

Possible improvements:
- in LOOP, how can I get the index of the LOOP? I used an extra variable that I
increment, but it there a better way?
- PARAM is giving "string" (or a bool) whose size is set by the user input. The
side effect is that if I am making a loop starting at "1", the incrementation
will loop at "9" -> "1". If I start at "01", the incrementation will give "2.",
"3.",... "9.", "10"... The later is probably what most people would use.
- ODBGet (and ODBSet?) does seem to be able to take a variable as a path... I
was trying to use an array whose index would be incremented.
    Reply  19 Aug 2015, Pierre-Andre Amaudruz, Bug Report, Sequencer limits 
These issues have been addressed by Stefan during his visit at Triumf last month.
The latest git has those fixes.

> While I know some of those limits/problems have been already been reported from
> DEAP (and maybe corrected in the last version), I am recording them here:
> 
> Bugs (not working as it should): 
> - "SCRIPT" does not seem to take the parameters into account

Fixed

> - The operators for WAIT are incorrectly set:
> the default ">=" and ">" are correct, but "<=", "<", "==" and "!=" are all using
> ">=" for the test. 

Fixed

> 
> Possible improvements:
> - in LOOP, how can I get the index of the LOOP? I used an extra variable that I
> increment, but it there a better way?

See LOOP doc
 LOOP cnt, 10
   ODBGET /foo/bflag, bb 
   IF $bb==1 THEN
     SET cnt, 10
   ELSE 
     ... 

> - PARAM is giving "string" (or a bool) whose size is set by the user input. The
> side effect is that if I am making a loop starting at "1", the incrementation
> will loop at "9" -> "1". If I start at "01", the incrementation will give "2.",
> "3.",... "9.", "10"... The later is probably what most people would use.

Fixed

> - ODBGet (and ODBSet?) does seem to be able to take a variable as a path... I
> was trying to use an array whose index would be incremented.

To be checked.
    Reply  19 Aug 2015, Konstantin Olchanski, Bug Report, Sequencer limits 
> 
> See LOOP doc
>  LOOP cnt, 10
>    ODBGET /foo/bflag, bb 
>    IF $bb==1 THEN
>      SET cnt, 10
>    ELSE 
>      ... 
> 

Looks like we have PE
    Reply  20 Aug 2015, Stefan Ritt, Bug Report, Sequencer limits 
> > - ODBGet (and ODBSet?) does seem to be able to take a variable as a path... I
> > was trying to use an array whose index would be incremented.
> 
> To be checked.

It does not take a variable as a path, but as an index. So you can do

LOOP i, 5

  WAIT seconds, 3
  ODBSET /System/Tmp/Test[$i], $i

ENDLOOP


And you will get

/System/Tmp/Test
[1] 1
[2] 2
[3] 3
[4] 4
[5] 5


/Stefan 
ELOG V3.1.4-2e1708b5