07 Jul 2014, Ryu Sawada, Bug Report, mhist does not show history when -s option is used
|
When I use the -s option of mhist, it does not show any history, for example:
mhist -s 140705 -p 140707 -e "HV".
If I remove the line marked with "-" in the diff below:
diff --git a/utils/mhist.cxx b/utils/mhist.cxx
index 930de3b..10cc6ad 100755
--- a/utils/mhist.cxx
+++ b/utils/mhist.cxx
@@ -652,7 +652,6 @@ int main(int argc, char *argv[])
else if (strncmp(argv[i], "-s", 2) == 0) {
strcpy(start_name, argv[++i]);
start_time = convert_time(argv[i]);
- do_hst_file = true;
} else if (strncmp(argv[i], "-p", 2) == 0)
end_time = convert_time(argv[++i]);
else if (strncmp(argv[i], "-t", 2) == 0)
It works.
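A minimal sketch of option handling with the fix applied, assuming a reduced option set (-s, -p, -e only): `-s` records the start time and leaves file mode alone. The `mhist_opts` struct and `parse_args()` are hypothetical, not mhist's actual code. Time strings are kept as text instead of being passed to `convert_time()`. Unlike the excerpt above, it matches options exactly with `strcmp()` and checks that a value follows before reading `argv[++i]`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified option set; not the real mhist variables. */
typedef struct {
    const char *start_name;  /* -s value, e.g. "140705" */
    const char *end_name;    /* -p value, e.g. "140707" */
    const char *event_name;  /* -e value, e.g. "HV" */
    bool do_hst_file;        /* file mode; -s must NOT switch this on */
} mhist_opts;

/* Returns 0 on success, -1 if an option is missing its value. */
static int parse_args(int argc, char *argv[], mhist_opts *o)
{
    memset(o, 0, sizeof(*o));
    for (int i = 1; i < argc; i++) {
        const char **dst = NULL;
        if (strcmp(argv[i], "-s") == 0)
            dst = &o->start_name;   /* start time only, no do_hst_file */
        else if (strcmp(argv[i], "-p") == 0)
            dst = &o->end_name;
        else if (strcmp(argv[i], "-e") == 0)
            dst = &o->event_name;
        else
            continue;               /* other options ignored in this sketch */
        if (i + 1 >= argc) {        /* guard before argv[++i] */
            fprintf(stderr, "option %s needs a value\n", argv[i]);
            return -1;
        }
        *dst = argv[++i];
    }
    return 0;
}
```

With the command line from the report, `start_name` becomes "140705" and `do_hst_file` stays false.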
Ryu Sawada |
14 Oct 2014, Konstantin Olchanski, Bug Report, Problem in mfe multithread equipments
|
In the ALPHA experiment at CERN I found a problem in the mfe.c handling of multithreaded equipments. This problem was
introduced, in some form, around May 2013 and around Aug 2013 (commit
https://bitbucket.org/tmidas/midas/src/45984c35b4f7/src/mfe.c) (I hope I got it right).
The effect was very odd: if the event rate of a multithreaded equipment was more than 100 Hz, the event counters on the MIDAS
status page would not increment and the frontend would crash at the end of the run. Other than that, all the events from the
multithreaded equipment seemed to appear in the SYSTEM buffer and in the data file normally.
This happened: in mfe.c::receive_trigger_event() a loop was introduced (previously,
there was no loop there - there was and still is a loop outside of receive_trigger_event()):
while (1) {
wait 10 ms for an event
process event, loop back
if there is no event, exit
}
Obviously, if the event rate is more than 100 Hz (repetition rate less than 10 ms),
the 10 ms wait will always return an event and we will never exit this loop.
So the mfe.c main loop is now stuck here and will not process any periodic activity
such as updating the equipment statistics (event counters on the midas status page)
or running periodic equipments in the same front end program.
The crash at the end of run will be caused by a timeout in responding to the "end of run" RPC call.
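The failure mode can be reproduced without hardware. The sketch below replaces real waits with a simulated millisecond clock. `sim_t`, `wait_for_event()` and both receive functions are hypothetical, not mfe.c code. It shows that once events arrive faster than the 10 ms timeout, the loop's only exit condition is never met. The restored shape handles at most one event per call and then returns to the main loop.

```c
#include <stdbool.h>

/* Simulated front end: a clock and a periodic event source (times in ms). */
typedef struct {
    int now_ms;       /* simulated current time */
    int next_ms;      /* arrival time of the next event */
    int interval_ms;  /* time between events (1000 / rate in Hz) */
} sim_t;

static sim_t sim_make(int interval_ms)
{
    sim_t s = {0, interval_ms, interval_ms};
    return s;
}

/* Wait up to timeout_ms for an event, advancing the simulated clock. */
static bool wait_for_event(sim_t *s, int timeout_ms)
{
    if (s->next_ms <= s->now_ms + timeout_ms) {
        if (s->next_ms > s->now_ms)
            s->now_ms = s->next_ms;  /* sleep until the event arrives */
        s->next_ms += s->interval_ms;
        return true;
    }
    s->now_ms += timeout_ms;         /* timed out, no event */
    return false;
}

/* Broken pattern: loop until a 10 ms wait comes back empty.
   max_events caps the simulation; the real loop has no cap. */
static int receive_loop_broken(sim_t *s, int max_events)
{
    int n = 0;
    while (n < max_events) {
        if (!wait_for_event(s, 10))
            break;                   /* the only way out */
        n++;                         /* process event, loop back */
    }
    return n;
}

/* Restored pattern: at most one event per call, so the caller's main
   loop regains control for statistics, periodic equipment and RPCs. */
static int receive_once(sim_t *s)
{
    return wait_for_event(s, 10) ? 1 : 0;
}
```

With 5 ms between events (200 Hz), `receive_loop_broken()` only stops at the artificial cap, while `receive_once()` returns after a single event.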
I have a patch in testing that solves this problem by restoring receive_trigger_event() to the original configuration, i.e.
https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
K.O. |
14 Oct 2014, Konstantin Olchanski, Bug Report, Hostile network scans against MIDAS RPC ports
|
At CERN I see a large number of hostile network scans that seem to be injecting HTTP requests into the
MIDAS RPC ports. So far, all these requests seem to be successfully rejected without crashing anything, but
they do clog up midas.log.
The main problem here is that all MIDAS programs have at least one TCP socket open where they listen for
RPC commands, such as "start of run", "please shutdown", etc. The port numbers of these sockets are
randomized, which makes them difficult to protect with firewall rules (firewall rules work best with fixed port
numbers).
Note that this is different from the hostile network scans that I first saw maybe 5 years ago, which
affected the mserver main listener socket. Then, as a solution, I hardened the RPC receiver code against
bad data (and I am happy to see that this hardening is still holding up) and implemented the mserver "-A"
command switch to specify a list of permitted peers. Also mserver uses a fixed port number ("-p" switch)
and is easy to protect with firewall rules.
Since these ports cannot be protected by OS means (firewall, etc), we have to protect them in MIDAS.
One solution is to reject all connections from unauthorized peers.
One way to do this is to implement the "-A" switch to explicitly list all permitted peers; this switch would
have to be added to all long-running MIDAS programs (mhttpd, mlogger, mfe.c, etc). Not very practical, IMO.
Another way is to read the list of permitted peers from ODB, at startup time, or each time a new connection
is made.
In the latter case, care needs to be taken to avoid deadlocks. For example, remote programs that read ODB
through the mserver may deadlock if the same mserver is the one trying to establish the RPC connection,
or if ODB is somehow locked.
NB - we already keep a list of permitted peers in ODB /Experiment/Security.
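A minimal sketch of the peer check follows. It assumes the permitted hosts have already been loaded (from ODB or an "-A" style list) into an array of numeric IP address strings. `peer_is_allowed()` and that list format are hypothetical. It uses the standard `getnameinfo()` with `NI_NUMERICHOST`, so no DNS lookup happens while a connection is pending. A listener would call it with the address returned by `accept()` and close the socket when it returns false.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical check: is the connecting peer in the permitted list?
   addr/len are as returned by accept(); allowed[] holds numeric addresses. */
static bool peer_is_allowed(const struct sockaddr *addr, socklen_t len,
                            const char *const allowed[], size_t n_allowed)
{
    char host[128];
    /* Numeric form only: no DNS lookup inside the RPC accept path. */
    if (getnameinfo(addr, len, host, sizeof(host), NULL, 0,
                    NI_NUMERICHOST) != 0)
        return false;                 /* cannot identify peer: reject */
    for (size_t i = 0; i < n_allowed; i++)
        if (strcmp(host, allowed[i]) == 0)
            return true;
    return false;
}
```

Comparing numeric addresses keeps the check independent of name resolution; host names in the permitted list would have to be resolved once, when the list is loaded.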
K.O. |
14 Oct 2014, Konstantin Olchanski, Bug Report, Problem in mfe multithread equipments
|
For my reference:
good version: https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
first breakage: https://bitbucket.org/tmidas/midas/src/c60259d9a244bdcd296a8c5c6ab0b91de27f9905/src/mfe.c?at=develop
second breakage: https://bitbucket.org/tmidas/midas/src/45984c35b4f7257f90515f29116dec6fb46f2ebc/src/mfe.c?at=develop
The "first breakage" may actually be okay, because there the badnik loop iterates over the ring buffers and is not infinite. But I cannot test it anymore.
K.O. |
14 Oct 2014, Konstantin Olchanski, Bug Report, Problem with EQ_USER
|
If you use EQ_USER in mfe.c and have multiple threads writing into the ring buffer, you will have a big
problem - the thread locking in the ring buffer code only works for a single writer thread and a single
reader thread.
Presently, it is not clear how to have multiple multithreaded equipments inside one frontend.
During the Summer of 2013 code briefly existed in mfe.c to have an array of ring buffers and each
multithreaded equipment could write into its own buffer.
But this code is now removed and mfe.c can only read from a single ring buffer and as I noted above, ring
buffer locking requires that only a single thread writes into it.
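A common reason for this restriction in lock-free ring buffers is that only the writer advances the write index and only the reader advances the read index. Two writers updating the same index can overwrite each other's events. The sketch below is a minimal single-producer/single-consumer ring with C11 atomics, assuming fixed-size integer events. `spsc_ring` and its functions are hypothetical and are not the MIDAS ring buffer API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8   /* holds at most RING_SIZE - 1 events */

/* Hypothetical SPSC ring: safe for exactly one writer and one reader. */
typedef struct {
    int data[RING_SIZE];
    atomic_size_t wp;   /* written only by the producer thread */
    atomic_size_t rp;   /* written only by the consumer thread */
} spsc_ring;

static void ring_init(spsc_ring *r)
{
    atomic_init(&r->wp, 0);
    atomic_init(&r->rp, 0);
}

/* Producer side. A second producer would race on wp right here. */
static bool ring_put(spsc_ring *r, int ev)
{
    size_t wp = atomic_load_explicit(&r->wp, memory_order_relaxed);
    size_t next = (wp + 1) % RING_SIZE;
    if (next == atomic_load_explicit(&r->rp, memory_order_acquire))
        return false;                      /* full */
    r->data[wp] = ev;
    /* release: the event data is visible before the new wp */
    atomic_store_explicit(&r->wp, next, memory_order_release);
    return true;
}

/* Consumer side. */
static bool ring_get(spsc_ring *r, int *ev)
{
    size_t rp = atomic_load_explicit(&r->rp, memory_order_relaxed);
    if (rp == atomic_load_explicit(&r->wp, memory_order_acquire))
        return false;                      /* empty */
    *ev = r->data[rp];
    atomic_store_explicit(&r->rp, (rp + 1) % RING_SIZE, memory_order_release);
    return true;
}
```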
K.O. |
14 Oct 2014, Stefan Ritt, Bug Report, Hostile network scans against MIDAS RPC ports
|
Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.
At PSI we fortunately do not have these network scans because PSI uses an institute-wide firewall. So you can connect from outside PSI to inside PSI only
on certain well-defined ports (like SSH to certain machines). You can do the same in ALPHA. Use one computer as a router with two network cards, where
the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be
accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.
/Stefan |
15 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments
|
You are absolutely correct, the code is certainly wrong. It looks to me like the
while (rbh)
was put in there for some testing, and I forgot to remove it. The only thing I could imagine is that we want a while loop there for performance reasons, like
readout_start = ss_millitime();
while (ss_millitime() - readout_start < (DWORD) eq_info->period) {
read event
return 0 if no event found
}
You also find this code in the check_polled_events() routine. It ensures that the routine does not return after every single event, but after the period defined in the
equipment (which is usually 100 ms for polled events). This way the code is more efficient, since we do not check for RPC calls between every event, but just 10 times
per second. This way you can shovel more events through the system, while still being responsive to run stops.
I don't have any hardware right now to test this, so please put my code above into the routine and commit it if it works.
I notice also a difference in both codes concerning the read buffer handles. The old code uses rbh2, while the new (wrong) code uses rbh. In your case probably both
handles are the same, so it works, but in other experiments, which might use several ring buffers, it will fail. So please use rbh instead of rbh2.
Let me know if it works for you, and if you see any difference in speed between the versions with and without the while loop (actually you will see this only if your trigger
rate maxes out the DAQ).
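The time-budgeted loop suggested in this post can be sketched with a simulated clock and event source standing in for `ss_millitime()` and the hardware. `poll_sim`, `try_read_event()` and `read_for_period()` are hypothetical names. The loop keeps reading while events are pending but hands control back once `period_ms` has elapsed. With a 100 ms period, the caller can therefore service RPC calls roughly 10 times per second even under a backlog.

```c
#include <stdbool.h>

/* Simulated clock and event source (all times in ms). */
typedef struct {
    unsigned now_ms;       /* stand-in for ss_millitime() */
    unsigned next_ms;      /* arrival time of the next event */
    unsigned interval_ms;  /* time between events */
    unsigned cost_ms;      /* time spent reading one event */
} poll_sim;

/* Non-blocking read: consume one event if it has already arrived. */
static bool try_read_event(poll_sim *s)
{
    if (s->next_ms > s->now_ms)
        return false;                   /* nothing pending */
    s->next_ms += s->interval_ms;
    s->now_ms += s->cost_ms;            /* reading takes time */
    return true;
}

/* Read events until none is pending or period_ms has elapsed.
   Returns the number of events read in this call. */
static unsigned read_for_period(poll_sim *s, unsigned period_ms)
{
    unsigned start = s->now_ms, n = 0;
    while (s->now_ms - start < period_ms) {
        if (!try_read_event(s))
            return n;                   /* no event found */
        n++;
    }
    return n;                           /* budget used up, back to caller */
}
```

As with the DWORD arithmetic in the pseudocode, the unsigned subtraction keeps the elapsed-time test valid across a wrap of the millisecond counter.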
Cheers,
Stefan |
15 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments
|
Please disregard my previous posting, you don't need the while loop, since it's already in the scheduler (around lines 2160 under /*---- send interrupt events ----*/).
But now I remember the rationale behind it. The loop over rb[i] exists because in MEG I have n calibration threads, each one running on a separate CPU core. So the receive_trigger_event() routine has to collect events from all the
threads, each of them having one ring buffer. In the process of implementing EQ_USER, I changed this somehow, and apparently broke the code by making the while() loop loop forever if the event rate is over 100 Hz.
So for the moment please remove the while loop completely, and I will worry later about putting it back correctly when MEG starts again next year.
/Stefan |
15 Oct 2014, Stefan Ritt, Bug Report, Problem with EQ_USER
|
Sure, each thread needs its own ring buffer for writing.
So I see that we need the multiple-ring-buffer readout scheme back even before MEG starts. So what you need is something like
for (i=0 ; rb[i] != 0 ; i++) {
read event from rb[i];
}
as it was before. What I do not like is that rb is a global variable; we should rather use the encapsulation functions and extend get_event_rb() to
get_event_rb(i) so you can have n ring buffers.
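The sketch below shows the proposed read-out pass in minimal, single-threaded form. Buffer i is obtained through an accessor in the spirit of `get_event_rb(i)`, which returns NULL past the last buffer and so mirrors the old `rb[i] != 0` test. The buffer type, `new_rb()`, `rb_push()`/`rb_pop()` and the body of `get_event_rb()` are hypothetical stand-ins; a real version would use thread-safe ring buffers. Taking at most one event per buffer per pass keeps one busy thread from starving the others.

```c
#include <stddef.h>

#define MAX_RB 4     /* hypothetical upper limit on readout buffers */
#define RB_CAP 16    /* events per buffer */

/* Single-threaded stand-in for one per-thread ring buffer. */
typedef struct {
    int ev[RB_CAP];
    size_t head;     /* next slot to read */
    size_t tail;     /* next slot to write */
} event_rb;

static event_rb rbs[MAX_RB];
static int n_rb = 0;  /* buffers created so far */

/* Hypothetical: hand out the next unused buffer, or NULL if none left. */
static event_rb *new_rb(void)
{
    return n_rb < MAX_RB ? &rbs[n_rb++] : NULL;
}

/* In the spirit of the proposed get_event_rb(i): buffer i, or NULL past
   the last one (the old "rb[i] != 0" end test). */
static event_rb *get_event_rb(int i)
{
    return (i >= 0 && i < n_rb) ? &rbs[i] : NULL;
}

static int rb_push(event_rb *rb, int ev)
{
    if (rb->tail - rb->head == RB_CAP)
        return 0;                          /* full */
    rb->ev[rb->tail++ % RB_CAP] = ev;
    return 1;
}

static int rb_pop(event_rb *rb, int *ev)
{
    if (rb->head == rb->tail)
        return 0;                          /* empty */
    *ev = rb->ev[rb->head++ % RB_CAP];
    return 1;
}

/* One pass over all buffers, taking at most one event from each.
   Returns the number of events copied into out[]. */
static int read_all_buffers(int *out, int max_out)
{
    int n = 0;
    event_rb *rb;
    for (int i = 0; n < max_out && (rb = get_event_rb(i)) != NULL; i++)
        n += rb_pop(rb, &out[n]);
    return n;
}
```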
Give me one day, I will extend the current code to make it work again and to implement N threads.
Cheers,
Stefan |
16 Oct 2014, Stefan Ritt, Bug Report, Problem with EQ_USER
|
I restructured the front-end code to enable multiple readout threads for EQ_USER equipment. Last summer I was definitely interrupted during
that work and left it in a half-finished state, sorry for that.
The way it works now is illustrated in mtfe.c. You create N ring buffers and N threads via
for (int i=0 ; i<N ; i++) {
create_event_rb(i);
ss_thread_create(trigger_thread, (void*)(PTYPE)i);
}
then each readout thread accesses its own readout buffer
thread(...)
{
index = (int)(PTYPE)param;
signal_readout_thread_active(index, TRUE);
rbh = get_event_rbh(index);
while (is_readout_thread_enabled()) {
... read event and put it into ring buffer ...
}
signal_readout_thread_active(index, FALSE);
}
The is_readout_thread_enabled() and signal_readout_thread_active() functions are used by the framework to shut down threads gracefully at the end
of the program. This way each thread can close its hardware correctly.
Note that no other thread management is done by the framework. In the old days with interrupt equipment, the framework disabled interrupts
when reading out periodic events, since that was necessary when using a single CAMAC crate for ADCs and scalers. This is obsolete now and not
needed any longer. It is now the responsibility of the user code to resolve hardware access conflicts between different threads (like using a local
mutex to access the same hardware). There is also no "readout when running" handling. If events should not be read out when the run is stopped,
the readout thread has to check the run status, or better, the EOR routine should disable the hardware trigger and the BOR routine should re-enable
it. The readout threads will then poll for new events and just go to sleep if nothing is there.
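The same structure can be sketched with POSIX threads and C11 atomics: one ring buffer per readout thread, a framework-owned "enabled" flag, and a per-thread "active" flag. All names here (`rings[]`, `readout_enabled`, `thread_active[]`, `start_readout()`, `stop_readout()`) are hypothetical stand-ins, not the mfe.c/mtfe.c implementation. A sequence counter stands in for reading the hardware. Because each thread is the only writer to its own ring, the single-writer rule of the ring buffer is respected.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_THREADS 3
#define RING_SIZE 64

/* One single-producer/single-consumer ring per readout thread. */
typedef struct {
    int data[RING_SIZE];
    atomic_size_t wp;   /* advanced only by the owning readout thread */
    atomic_size_t rp;   /* advanced only by the main (reader) thread */
} ring;

static ring rings[N_THREADS];
static atomic_bool readout_enabled;          /* set/cleared by the framework */
static atomic_bool thread_active[N_THREADS]; /* set/cleared by each thread */
static pthread_t tids[N_THREADS];
static int n_started = 0;

static bool ring_put(ring *r, int ev)
{
    size_t wp = atomic_load_explicit(&r->wp, memory_order_relaxed);
    size_t next = (wp + 1) % RING_SIZE;
    if (next == atomic_load_explicit(&r->rp, memory_order_acquire))
        return false;                            /* full */
    r->data[wp] = ev;
    atomic_store_explicit(&r->wp, next, memory_order_release);
    return true;
}

static bool ring_get(ring *r, int *ev)
{
    size_t rp = atomic_load_explicit(&r->rp, memory_order_relaxed);
    if (rp == atomic_load_explicit(&r->wp, memory_order_acquire))
        return false;                            /* empty */
    *ev = r->data[rp];
    atomic_store_explicit(&r->rp, (rp + 1) % RING_SIZE, memory_order_release);
    return true;
}

/* Readout thread: the only writer to rings[index]. */
static void *readout_thread(void *param)
{
    int index = (int)(intptr_t)param;
    int seq = 0;                                 /* stand-in for event data */
    atomic_store(&thread_active[index], true);
    while (atomic_load(&readout_enabled)) {
        if (ring_put(&rings[index], seq))
            seq++;
        else
            sched_yield();                       /* ring full: let reader catch up */
    }
    atomic_store(&thread_active[index], false);  /* hardware would be closed here */
    return NULL;
}

static int start_readout(void)
{
    atomic_store(&readout_enabled, true);
    for (int i = 0; i < N_THREADS; i++) {
        if (pthread_create(&tids[i], NULL, readout_thread,
                           (void *)(intptr_t)i) != 0)
            return -1;
        n_started++;
    }
    return 0;
}

/* Framework side: disable readout, then wait for every started thread. */
static void stop_readout(void)
{
    atomic_store(&readout_enabled, false);
    for (int i = 0; i < n_started; i++)
        pthread_join(tids[i], NULL);
    n_started = 0;
}
```

While readout runs, the front-end main loop would drain every `rings[i]` with `ring_get()`, and it would call `stop_readout()` at exit.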
I tested the mtfe.c program with 100 Hz and 1 MHz event rates on a dummy experiment (no hardware access) and it worked without problems.
Let me know if there is any issue left over.
/Stefan |
16 Oct 2014, Stefan Ritt, Bug Report, Problem in mfe multithread equipments
|
> while (1)
> wait 10 ms for an event
> process event, loop back
> if there is no event, exit
> }
This code has been rewritten now and should work for event rates >100 Hz.
/Stefan |
16 Oct 2014, Konstantin Olchanski, Bug Report, Hostile network scans against MIDAS RPC ports
|
> Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.
>
> At PSI we fortunately do not have these network scans because PSI uses a institute-wide firewall.
>
Same here at TRIUMF, no problems with hostile network activity. I only see this trouble at CERN. Nominally CERN also has
everything behind the CERN firewall, that is why I tend to think that I am seeing network scans done by CERN security people,
or some badniks on the CERN local network (PC malware, etc).
> So you can connect from outside PSI to inside PSI only
> on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where
> the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be
> accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.
Yes, this is how we did it for DEAP at SNOLAB. No network trouble there.
But generically for MIDAS, I think we should have built-in capability for MIDAS to protect itself without reliance on OS-level means (local firewall)
or network-level means ("site firewalls").
Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
too much work to set up an external firewall box just for one machine, and OS-level firewall rules sometimes conflict
with some OS services (e.g. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).
K.O. |
16 Oct 2014, Stefan Ritt, Bug Report, Hostile network scans against MIDAS RPC ports
|
> Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
> too much work to setup an external firewall box just for one machine and OS-level firewall rules sometimes conflict
> with some OS services (i.e. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).
I fully agree with you. So if you find time to implement this, I will be more than happy.
/Stefan |
27 Jan 2015, Konstantin Olchanski, Bug Report, getaddrinfo()
|
To support IPV6, we need to migrate MIDAS from gethostbyname() to getaddrinfo(). (Thanks to
http://www.openwall.com/lists/oss-security/2015/01/27/9). K.O. |
16 Jul 2015, Thomas Lindner, Bug Report, jset/ODBSet using true/false for booleans
|
MIDAS does not seem to be consistent (or at least convenient) with how it
handles booleans in AJAX functions.
When you request a boolean ODB value with an AJAX call like
http://neut14.triumf.ca:8081/?cmd=jcopy&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys
then you get
{ "Hidden/last_written" : 1437065425, "Hidden" : false }
This seems correct, since the JSON convention has booleans encoded as true/false.
But this convention does not work when trying to set the boolean value. For instance
http://neut14.triumf.ca:8081/?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=true
does not set the variable to true. To make this work you need to use the
characters y/n
http://neut14.triumf.ca:8081/?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=y
I tested this with ajax/jset, but the same problem seems to occur when using the
javascript function ODBSet. The documentation doesn't say what sort of encoding
to use when using these functions, so I guess the idea is that these functions
use MIDAS encoding for booleans. But it seems to me that it would be more
convenient if jset/ODBSet allowed the option to use json/javascript encoding for
boolean values; or at least had that as a format option for jset/ODBSet. That
way my javascript could look like
var mybool = true;
URI_command =
"?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=" + mybool;
instead of
var mybool = true;
URI_command = "";
if (mybool) {
URI_command =
"?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=y";
} else {
URI_command =
"?cmd=jset&odb=/Equipment/DCRC/Common/Hidden&format=json-nokeys&value=n";
}
__________________________________________________________
Cross-posting from bitbucket issue tracker:
https://bitbucket.org/tmidas/midas/issues/29/jset-odbset-using-true-false-for-booleans |
29 Jul 2015, Stefan Ritt, Bug Report, jset/ODBSet using true/false for booleans
|
See bitbucket for the solution.
https://bitbucket.org/tmidas/midas/issues/29/jset-odbset-using-true-false-for-booleans#comment-20550474 |
19 Aug 2015, Pierre Gorel, Bug Report, Sequencer limits
|
While I know some of these limits/problems have already been reported from
DEAP (and maybe corrected in the last version), I am recording them here:
Bugs (not working as it should):
- "SCRIPT" does not seem to take the parameters into account
- The operators for WAIT are incorrectly set:
the default ">=" and ">" are correct, but "<=", "<", "==" and "!=" are all using
">=" for the test.
Possible improvements:
- in LOOP, how can I get the index of the LOOP? I used an extra variable that I
increment, but is there a better way?
- PARAM gives a "string" (or a bool) whose size is set by the user input. The
side effect is that if I make a loop starting at "1", the increment
will wrap from "9" to "1". If I start at "01", the increment will give "2.",
"3.",... "9.", "10"... The latter is probably what most people would use.
- ODBGet (and ODBSet?) does not seem to be able to take a variable as a path... I
was trying to use an array whose index would be incremented.
19 Aug 2015, Pierre-Andre Amaudruz, Bug Report, Sequencer limits
|
These issues have been addressed by Stefan during his visit to TRIUMF last month.
The latest git has those fixes.
> While I know some of those limits/problems have been already been reported from
> DEAP (and maybe corrected in the last version), I am recording them here:
>
> Bugs (not working as it should):
> - "SCRIPT" does not seem to take the parameters into account
Fixed
> - The operators for WAIT are incorrectly set:
> the default ">=" and ">" are correct, but "<=", "<", "==" and "!=" are all using
> ">=" for the test.
Fixed
>
> Possible improvements:
> - in LOOP, how can I get the index of the LOOP? I used an extra variable that I
> increment, but it there a better way?
See LOOP doc
LOOP cnt, 10
ODBGET /foo/bflag, bb
IF $bb==1 THEN
SET cnt, 10
ELSE
...
> - PARAM is giving "string" (or a bool) whose size is set by the user input. The
> side effect is that if I am making a loop starting at "1", the incrementation
> will loop at "9" -> "1". If I start at "01", the incrementation will give "2.",
> "3.",... "9.", "10"... The later is probably what most people would use.
Fixed
> - ODBGet (and ODBSet?) does seem to be able to take a variable as a path... I
> was trying to use an array whose index would be incremented.
To be checked. |
19 Aug 2015, Konstantin Olchanski, Bug Report, Sequencer limits
|
>
> See LOOP doc
> LOOP cnt, 10
> ODBGET /foo/bflag, bb
> IF $bb==1 THEN
> SET cnt, 10
> ELSE
> ...
>
Looks like we have PE |
20 Aug 2015, Stefan Ritt, Bug Report, Sequencer limits
|
> > - ODBGet (and ODBSet?) does seem to be able to take a variable as a path... I
> > was trying to use an array whose index would be incremented.
>
> To be checked.
It does not take a variable as a path, but as an index. So you can do
LOOP i, 5
WAIT seconds, 3
ODBSET /System/Tmp/Test[$i], $i
ENDLOOP
And you will get
/System/Tmp/Test
[1] 1
[2] 2
[3] 3
[4] 4
[5] 5
/Stefan |
|