ID |
Date |
Author |
Topic |
Subject |
945
|
15 Jan 2014 |
Konstantin Olchanski | Bug Report | MIDAS password protection is broken | If you follow the MIDAS documentation for setting up password protection, you will get strange messages:
ladd00:midas$ ./linux/bin/odbedit
[local:testexpt:S]/>passwd <---- setup a password
Password:
Retype password:
[local:testexpt:S]/> exit
ladd00:midas$ odbedit
Password: <---- enter correct password here
ss_semaphore_wait_for: semop/semtimedop(21135376) returned -1, errno 22 (Invalid argument)
ss_semaphore_release: semop/semtimedop(21135376) returned -1, errno 22 (Invalid argument)
[local:testexpt:S]/>ss_semaphore_wait_for: semop/semtimedop(21037069) returned -1, errno 43 (Identifier removed)
The same messages will appear from all other programs - mhttpd, etc. They will be printed about every 1 second.
So what do they mean? They mean what they say - the semaphore is not there, it is easy to check using "ipcs" that semaphores with
those ids do not exist. In fact all the semaphores are missing (the ODB semaphore is eventually recreated, so at least ODB works
correctly).
In this situation, MIDAS will not work correctly.
What is happening?
- cm_connect_experiment1() creates all the semaphores and remembers them in cm_set_experiment_semaphore()
- calls cm_set_client_info()
- cm_set_client_info() finds ODB /expt/sec/password, and returns CM_WRONG_PASSWORD
- before returning, it calls db_close_all_databases() and bm_close_all_buffers(), which delete all semaphores (put a print statement in
ss_semaphore_delete() to see this).
- (values saved by cm_set_experiment_semaphore() are stale now).
- (if by luck you have other midas programs still running, the semaphores will not be deleted)
- we are back to cm_connect_experiment1() which will ask for the password, call cm_set_client_info() again and continue as usual
- it will reopen ODB, recreating the ODB semaphore
- (but all the other semaphores are still deleted and values saved by cm_set_experiment_semaphore() are stale)
I thought to improve this by fixing a bug in cm_msg_log() (where the messages are coming from) - it tries to lock the "MSG"
semaphore, but even if it could not lock it, it continues as usual and even calls an unlock at the end (very bad). For catastrophic
locking failures like this (semaphore is deleted), we usually abort. But if I abort here, I get completely locked out of odb - odbedit
crashes right away and there is no way to take any corrective action other than to delete odb and reload it from an xml file.
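The locking problem described above can be illustrated with a minimal sketch. This is not the cm_msg_log() code: the demo_lock type, the status codes and all demo_* function names are hypothetical stand-ins, assumed only for illustration. The point is that a failed lock should lead to neither the protected write nor an unlock on a lock that was never taken.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes and lock type, for illustration only;
   these are not the MIDAS definitions. */
enum { LOCK_OK = 1, LOCK_GONE = 2 };

typedef struct {
   int exists;   /* 0 once the underlying semaphore has been deleted */
   int held;     /* 1 while the lock is held */
   int bad_ops;  /* operations attempted on a deleted semaphore */
} demo_lock;

int demo_lock_acquire(demo_lock *l)
{
   assert(l != NULL);
   if (!l->exists) {
      l->bad_ops++;          /* stand-in for semop() failing with an errno */
      return LOCK_GONE;
   }
   l->held = 1;
   return LOCK_OK;
}

void demo_lock_release(demo_lock *l)
{
   assert(l != NULL);
   if (!l->exists) {
      l->bad_ops++;
      return;
   }
   l->held = 0;
}

/* Write one message under the lock.
   Returns 1 if the message was written, 0 if it was skipped. */
int demo_log_message(demo_lock *l, const char *msg)
{
   if (demo_lock_acquire(l) != LOCK_OK)
      return 0;              /* no lock: do not write and do not unlock */
   printf("%s\n", msg);
   demo_lock_release(l);
   return 1;
}
```

Whether to skip the message, abort, or try to recreate the semaphore on failure is a separate policy decision; the sketch only shows the early return.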
I know that some experiments use this password protection - why/how does it work there?
I think they are okay because they put critical programs like odbedit, mserver, mlogger and mhttpd into "/expt/sec/allowed
programs". In this case they pass the password check in cm_set_client_info() and the semaphores are not deleted. If any subsequent
program asks for the password, the semaphores survive because mlogger or mhttpd is already running and keeps the semaphores from
being deleted.
What a mess.
K.O. |
946
|
15 Jan 2014 |
Konstantin Olchanski | Bug Report | MIDAS Web password broken | The MIDAS Web password function is broken - with the web password enabled, I am not prompted for a
password when editing ODB. The password still partially works - I am prompted for the web password
when starting a run. K.O.
P.S. https://midas.triumf.ca/MidasWiki/index.php/Security says "web password" needed for "write access",
but does not specify if this includes editing odb. (I would think so, and I think I remember that it used to). |
947
|
15 Jan 2014 |
Konstantin Olchanski | Bug Report | MIDAS password protection is broken | > I through to improve this by fixing a bug in cm_msg_log() (where the messages are coming from)
The periodic messages about broken semaphore actually come from al_check(). I put some whining there, too.
K.O. |
953
|
05 Feb 2014 |
Stefan Ritt | Bug Report | MIDAS Web password broken | > The MIDAS Web password function is broken - with the web password enabled, I am not prompted for a
> password when editing ODB. The password still partially works - I am prompted for the web password
> when starting a run. K.O.
>
> P.S. https://midas.triumf.ca/MidasWiki/index.php/Security says "web password" needed for "write access",
> but does not specify if this includes editing odb. (I would think so, and I think I remember that it used to).
Didn't we agree to put those issues into the bitbucket issue tracker?
This functionality got broken when implementing the new inline edit functionality. Actually one has to "manually" check for the password. The old way
was that there was a web page asking for the web password, but if we do ODBSet via Ajax there is nobody who could fill out that form. So I added a
"manual" check in ODBCheckWebPassword(). This will not work for custom pages, but they have their own way to define passwords.
/Stefan |
954
|
05 Feb 2014 |
Stefan Ritt | Bug Report | MIDAS password protection is broken | > If you follow the MIDAS documentation for setting up password protection, you will get strange messages:
This is interesting. When I used it last time (some years ago...) it worked fine. I did not touch this, and now it's broken. Must be related to some modifications of the semaphore system.
Well, anyhow, the problem seems to me to be the db_close_all_databases() call and the re-opening of the ODB. Apparently the db_close_database() call does not clean up the semaphores properly.
Actually there is absolutely no need to close and re-open the ODB upon a wrong password, so I just removed that code and now it works again.
/Stefan |
955
|
11 Feb 2014 |
Andreas Suter | Bug Report | mhttpd, etc. | I found a couple of bugs in the current mhttpd, midas version: "93fa5ed"
This concerns all browsers I checked (firefox, chrome, internet explorer, opera)
1) When trying to change a value of a frontend using a multi class driver (we
have a lot of them), the field for changing appears, but I cannot get it set!
Neither via the two set buttons (why 2?) nor via return.

It also would be nice, if the css could be changed such that input/output for
multi-driver would be better separated; something along as suggested in

2) If I change a value (generic/hv class driver), the index of the array
remains after changing the value until the next update of the page

3) We are using a web-password. In the current version the password is plainly visible when entering it.
4) I just copied the header as described here: https://midas.triumf.ca/elog/Midas/908, but I get another result:

It looks as if a wrong cookie is filtered? |
958
|
11 Feb 2014 |
Stefan Ritt | Bug Report | mhttpd, etc. |
Andreas Suter wrote: | I found a couple of bugs in the current mhttpd, midas version: "93fa5ed" |
See my reply on the issue tracker:
https://bitbucket.org/tmidas/midas/issue/18/mhttpd-bugs |
967
|
23 Feb 2014 |
Andre Frankenthal | Bug Report | Installation failing on Mac OS X 10.9 -- related to strlcat and strlcpy | Hi,
I don't know if this actually fits the Bug Report category. I've been trying to install Midas on my Mac OS
Mavericks and I keep getting errors like "conflicting types for '___builtin____strlcpy_chk' ..." and similarly for
strlcat. I googled a bit and I think the problem might be that in Mavericks strlcat and strlcpy are already
defined in string.h, and so there might be a redundant definition somewhere. I'm not sure what the best
way to fix this would be though. Any help would be appreciated.
Thanks,
Andre |
971
|
27 Feb 2014 |
Konstantin Olchanski | Bug Report | Installation failing on Mac OS X 10.9 -- related to strlcat and strlcpy | >
> I don't know if this actually fits the Bug Report category. I've been trying to install Midas on my Mac OS
> Mavericks and I keep getting errors like "conflicting types for '___builtin____strlcpy_chk' ..." and similarly for
> strlcat. I googled a bit and I think the problem might be that in Mavericks strlcat and strlcpy are already
> defined in string.h, and so there might be a redundant definition somewhere. I'm not sure what the best
> way to fix this would be though. Any help would be appreciated.
>
We have run into this problem - MacOS 10.9 plays funny games with definitions of strlcpy() & co - and it has been fixed since last Summer.
For the record, current MIDAS builds just fine on MacOS 10.9.2.
For a pure test, try the instructions posted at midas.triumf.ca:
cd $HOME
mkdir packages
cd packages
git clone https://bitbucket.org/tmidas/midas
git clone https://bitbucket.org/tmidas/mscb
git clone https://bitbucket.org/tmidas/mxml
cd midas
make
K.O. |
972
|
27 Feb 2014 |
Andre Frankenthal | Bug Report | Installation failing on Mac OS X 10.9 -- related to strlcat and strlcpy | > >
> > I don't know if this actually fits the Bug Report category. I've been trying to install Midas on my Mac OS
> > Mavericks and I keep getting errors like "conflicting types for '___builtin____strlcpy_chk' ..." and similarly for
> > strlcat. I googled a bit and I think the problem might be that in Mavericks strlcat and strlcpy are already
> > defined in string.h, and so there might be a redundant definition somewhere. I'm not sure what the best
> > way to fix this would be though. Any help would be appreciated.
> >
>
> We have run into this problem - MacOS 10.9 plays funny games with definitions of strlcpy() & co - and it has been fixed since last Summer.
>
> For the record, current MIDAS builds just fine on MacOS 10.9.2.
>
> For a pure test, try the instructions posted at midas.triumf.ca:
>
> cd $HOME
> mkdir packages
> cd packages
> git clone https://bitbucket.org/tmidas/midas
> git clone https://bitbucket.org/tmidas/mscb
> git clone https://bitbucket.org/tmidas/mxml
> cd midas
> make
>
> K.O.
Thanks, it works like a charm now! I must have obtained an outdated version of Midas.
Andre |
1011
|
07 Jul 2014 |
Ryu Sawada | Bug Report | mhist does not show history when -s option is used | When I use the -s option of mhist, it does not show any history, for example:
mhist -s 140705 -p 140707 -e "HV"
And if I remove a line like,
diff --git a/utils/mhist.cxx b/utils/mhist.cxx
index 930de3b..10cc6ad 100755
--- a/utils/mhist.cxx
+++ b/utils/mhist.cxx
@@ -652,7 +652,6 @@ int main(int argc, char *argv[])
       else if (strncmp(argv[i], "-s", 2) == 0) {
          strcpy(start_name, argv[++i]);
          start_time = convert_time(argv[i]);
-         do_hst_file = true;
       } else if (strncmp(argv[i], "-p", 2) == 0)
          end_time = convert_time(argv[++i]);
       else if (strncmp(argv[i], "-t", 2) == 0)
It works.
Ryu Sawada |
1021
|
14 Oct 2014 |
Konstantin Olchanski | Bug Report | Problem in mfe multithread equipments | In the ALPHA experiment at CERN I found a problem in mfe.c handling of multithreaded equipments. This problem was in
some forms introduced around May 2013 and around Aug 2013 (commit
https://bitbucket.org/tmidas/midas/src/45984c35b4f7/src/mfe.c) (I hope I got it right).
The effect was very odd - if the event rate of a multithreaded equipment was more than 100 Hz, the event counters on the midas
status page would not increment and the frontend would crash at the end of run. Other than that, all the events from the
multithreaded equipment seemed to appear in the SYSTEM buffer and in the data file normally.
This happened: in mfe.c::receive_trigger_event() a loop was introduced (previously,
there was no loop there - there was and still is a loop outside of receive_trigger_event()):
while (1) {
   wait 10 ms for an event
   process event, loop back
   if there is no event, exit
}
Obviously, if the event rate is more than 100 Hz (repetition rate less than 10 ms),
the 10 ms wait will always return an event and we will never exit this loop.
So the mfe.c main loop is now stuck here and will not process any periodic activity
such as updating the equipment statistics (event counters on the midas status page)
or running periodic equipments in the same front end program.
The crash at the end of run will be caused by a timeout in responding to the "end of run" RPC call.
I have a patch in testing that solves this problem by restoring receive_trigger_event() to the original configuration, i.e.
https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
K.O. |
1022
|
14 Oct 2014 |
Konstantin Olchanski | Bug Report | Hostile network scans against MIDAS RPC ports | At CERN I see a large number of hostile network scans that seem to be injecting HTTP requests into the
MIDAS RPC ports. So far, all these requests seem to be successfully rejected without crashing anything, but
they do clog up midas.log.
The main problem here is that all MIDAS programs have at least one TCP socket open where they listen for
RPC commands, such as "start of run", "please shutdown", etc. The port numbers of these sockets are
randomized, which makes them difficult to protect with firewall rules (firewall rules like fixed port
numbers).
Note that this is different from the hostile network scans that I have first seen maybe 5 years ago that
affected the mserver main listener socket. Then, as a solution, I hardened the RPC receiver code against
bad data (and happy to see that this hardening is still holding up) and implemented the mserver "-A"
command switch to specify a list of permitted peers. Also mserver uses a fixed port number ("-p" switch)
and is easy to protect with firewall rules.
Since these ports cannot be protected by OS means (firewall, etc), we have to protect them in MIDAS.
One solution is to reject all connections from unauthorized peers.
One way to do this is to implement the "-A" switch to explicitly list all permitted peers; this switch would
have to be added to all long-running midas programs (mhttpd, mlogger, mfe.c, etc). Not very practical, IMO.
Another way is to read the list of permitted peers from ODB, at startup time, or each time a new connection
is made.
In the latter case, care needs to be taken to avoid deadlocks. For example remote programs that read ODB
through the mserver may deadlock if the same mserver is the one trying to establish the RPC connection.
Or if ODB is somehow locked.
NB - we already keep a list of permitted peers in ODB /Experiment/Security.
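As a rough illustration of rejecting connections from unauthorized peers, here is a minimal sketch of the check itself. It assumes the list of permitted host names has already been obtained (from ODB or from a command line switch) and that the peer's host name has been resolved; peer_is_allowed() and its parameters are hypothetical names, not an existing MIDAS function.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if host is in the list of permitted peers, 0 otherwise.
   An empty list rejects every peer. The comparison is an exact,
   case-sensitive string match; a real implementation may want to
   compare resolved addresses instead. */
int peer_is_allowed(const char *host, const char *const allowed[], size_t n_allowed)
{
   assert(host != NULL);
   for (size_t i = 0; i < n_allowed; i++) {
      if (allowed[i] != NULL && strcmp(host, allowed[i]) == 0)
         return 1;
   }
   return 0;
}
```

The listener would call such a check right after accepting the connection and close the socket if it returns 0, before reading any RPC data from the peer.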
K.O. |
1023
|
14 Oct 2014 |
Konstantin Olchanski | Bug Report | Problem in mfe multithread equipments | For my reference:
good version: https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
first breakage: https://bitbucket.org/tmidas/midas/src/c60259d9a244bdcd296a8c5c6ab0b91de27f9905/src/mfe.c?at=develop
second breakage: https://bitbucket.org/tmidas/midas/src/45984c35b4f7257f90515f29116dec6fb46f2ebc/src/mfe.c?at=develop
The "first breakage" may actually be okay, because there the badnik loop loops over ring buffers and is not infinite. But I cannot test it anymore.
K.O. |
1024
|
14 Oct 2014 |
Konstantin Olchanski | Bug Report | Problem with EQ_USER | If you use EQ_USER in mfe.c and have multiple threads writing into the ring buffer, you will have a big
problem - the thread locking in the ring buffer code only works for a single writer thread and a single
reader thread.
Presently, it is not clear how to have multiple multithreaded equipments inside one frontend.
During the Summer of 2013 code briefly existed in mfe.c to have an array of ring buffers and each
multithreaded equipment could write into its own buffer.
But this code is now removed and mfe.c can only read from a single ring buffer and as I noted above, ring
buffer locking requires that only a single thread writes into it.
K.O. |
1025
|
14 Oct 2014 |
Stefan Ritt | Bug Report | Hostile network scans against MIDAS RPC ports | Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.
At PSI we fortunately do not have these network scans because PSI uses an institute-wide firewall. So you can connect from outside PSI to inside PSI only
on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where
the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be
accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.
/Stefan |
1026
|
15 Oct 2014 |
Stefan Ritt | Bug Report | Problem in mfe multithread equipments | You are absolutely correct, the code is certainly wrong. It looks to me like the
while (rbh)
was put in there for some testing, and I forgot to remove it. The only thing I could imagine is that we want to have a while loop there for performance reasons. Like
readout_start = ss_millitime();
while (ss_millitime() - readout_start < (DWORD) eq_info->period) {
   read event
   return 0 if no event found
}
You find this code also in the check_polled_events() routine. It ensures that the routine does not return after every single event, but after the period defined in the
equipment (which is usually 100 ms for polled events). This way the code is more efficient, since we do not check for RPC calls between every event, but just 10 times
per second. This way you can shovel more events through the system, while still being responsive to run stops.
I don't have any hardware right now to test this, so please put my code above into the routine and commit it if it works.
I notice also a difference in both codes concerning the read buffer handles. The old code uses rbh2, while the new (wrong) code uses rbh. In your case probably both
handles are the same, so it works, but in other experiments, which might use several ring buffers, it will fail. So please use rbh instead of rbh2.
Let me know if it works for you, and if you see any difference in speed between the versions with and without the while loop (actually you will see this only if your trigger
rate maxes out the DAQ).
Cheers,
Stefan |
1027
|
15 Oct 2014 |
Stefan Ritt | Bug Report | Problem in mfe multithread equipments | Please disregard my previous posting, you don't need the while loop, since it's already in the scheduler (around lines 2160 under /*---- send interrupt events ----*/).
But now I remember the rationale behind it. The loop over the rb[i] is because in MEG I have n calibration threads, each one running on a separate CPU core. So the receive_trigger_event() routine has to collect events from all the
threads, each of them having one ring buffer. In the process of implementing EQ_USER, I changed this somehow, and apparently broke the code by making the while() loop run forever if the event rate is over 100 Hz.
So for the moment please remove the while loop completely, and I will worry later about putting it back correctly when MEG starts again next year.
/Stefan |
1028
|
15 Oct 2014 |
Stefan Ritt | Bug Report | Problem with EQ_USER | Sure, each thread needs its own ring buffer for writing.
So I see that we need back the multiple-ring-buffer-readout-scheme even before MEG will start. So what you need is something like
for (i=0 ; rb[i] != 0 ; i++) {
   read event from rb[i];
}
as it was before. What I do not like is that rb is a global variable, we should better use the encapsulation functions and extend get_event_rb() to
get_event_rb(i) so you can have n ring buffers.
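A minimal sketch of this idea, with the buffers hidden behind an indexed accessor instead of a global array, could look as follows. All demo_* names and sizes are hypothetical, and the ring buffer is a plain single-threaded toy: it shows only the readout side and has none of the synchronization a real writer thread would need.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RB  4
#define DEMO_RB_SIZE 8

/* Toy ring buffer of int "events"; head is the next write slot,
   tail is the next read slot. One slot is always left empty. */
typedef struct {
   int data[DEMO_RB_SIZE];
   int head, tail;
} demo_rb;

static demo_rb demo_buffers[DEMO_MAX_RB];
static int demo_n_buffers = 0;

/* Create a new ring buffer; returns its index or -1 if none is left. */
int demo_create_rb(void)
{
   if (demo_n_buffers >= DEMO_MAX_RB)
      return -1;
   demo_buffers[demo_n_buffers].head = 0;
   demo_buffers[demo_n_buffers].tail = 0;
   return demo_n_buffers++;
}

/* Indexed accessor; returns NULL for an invalid index. */
demo_rb *demo_get_rb(int i)
{
   return (i >= 0 && i < demo_n_buffers) ? &demo_buffers[i] : NULL;
}

int demo_rb_put(demo_rb *rb, int ev)
{
   assert(rb != NULL);
   int next = (rb->head + 1) % DEMO_RB_SIZE;
   if (next == rb->tail)
      return 0;                      /* buffer full */
   rb->data[rb->head] = ev;
   rb->head = next;
   return 1;
}

int demo_rb_get(demo_rb *rb, int *ev)
{
   assert(rb != NULL && ev != NULL);
   if (rb->tail == rb->head)
      return 0;                      /* buffer empty */
   *ev = rb->data[rb->tail];
   rb->tail = (rb->tail + 1) % DEMO_RB_SIZE;
   return 1;
}

/* One pass over all ring buffers, taking at most one event from each.
   Returns the number of events stored in out. */
int demo_collect(int *out, int max_out)
{
   int n = 0;
   for (int i = 0; i < demo_n_buffers && n < max_out; i++) {
      int ev;
      if (demo_rb_get(demo_get_rb(i), &ev))
         out[n++] = ev;
   }
   return n;
}
```

Taking at most one event per buffer per pass keeps the pass bounded, so a single busy writer cannot monopolize the reader.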
Give me one day, I will extend the current code to make it work again and to implement N threads.
Cheers,
Stefan |
1029
|
16 Oct 2014 |
Stefan Ritt | Bug Report | Problem with EQ_USER | I restructured the front-end code to enable multiple readout threads for EQ_USER equipment. Last summer I was definitely interrupted during
that work and left it in a half-finished state, sorry for that.
The way it works now is illustrated in mtfe.c. You create N ring buffers and N threads via
for (int i=0 ; i<N ; i++) {
   create_event_rb(i);
   ss_thread_create(trigger_thread, (void*)(PTYPE)i);
}
then each readout thread accesses its own readout buffer
thread(...)
{
   index = (int)(PTYPE)param;
   signal_readout_thread_active(index, TRUE);
   rbh = get_event_rbh(index);
   while (is_readout_thread_enabled()) {
      ... read event and put it into ring buffer ...
   }
   signal_readout_thread_active(index, FALSE);
}
The is_readout_thread_enabled() and signal_readout_thread_active() functions are used by the framework to shut down threads gracefully at the end
of the program. This way each thread can close any hardware correctly.
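The shutdown handshake can be sketched with two flags, as below. This is an assumption about how such a handshake might be built, not the mfe.c implementation: all demo_* names are hypothetical, and for the sake of a runnable example the thread body clears the "enabled" flag itself after a fixed number of events, whereas in a real frontend the framework would clear it from another thread.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_N_THREADS 2

/* Set to 0 to ask all readout threads to stop. */
static atomic_int demo_enabled = 1;
/* One flag per readout thread, 1 while that thread is running. */
static atomic_int demo_active[DEMO_N_THREADS];

int demo_is_enabled(void)
{
   return atomic_load(&demo_enabled);
}

void demo_signal_active(int index, int flag)
{
   assert(index >= 0 && index < DEMO_N_THREADS);
   atomic_store(&demo_active[index], flag);
}

/* Number of threads that have not yet reported that they stopped. */
int demo_count_active(void)
{
   int n = 0;
   for (int i = 0; i < DEMO_N_THREADS; i++)
      n += atomic_load(&demo_active[i]);
   return n;
}

/* Body of one readout thread. Returns the number of events "read". */
long demo_readout_body(int index, long stop_after)
{
   long n_events = 0;
   demo_signal_active(index, 1);
   while (demo_is_enabled()) {
      n_events++;                        /* stand-in for reading one event */
      if (n_events >= stop_after)
         atomic_store(&demo_enabled, 0); /* demo only: simulated stop request */
   }
   /* hardware would be closed here, before reporting "not active" */
   demo_signal_active(index, 0);
   return n_events;
}
```

On shutdown, the side requesting the stop would clear the enabled flag and then wait until demo_count_active() drops to zero before exiting.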
Note that no other thread management is done by the framework. In the old days with interrupt equipment, the framework disabled interrupts
when reading out periodic events, since that was necessary when using a single CAMAC crate for ADCs and scalers. This is obsolete now and not
needed any longer. It is now the responsibility of the user code to resolve hardware access conflicts between different threads (like using a local
mutex to access the same hardware). There is also no "readout when running" handling. If events should not be read out when the run is stopped,
the readout thread has to check the run status, or better the EOR routine should disable the hardware trigger and the BOR routine should re-enable
it. The readout threads will then poll for new events and just go to sleep if nothing is there.
I tested the mtfe.c program with 100 Hz and 1 MHz event rates on a dummy experiment (no hardware access) and it worked without problems.
Let me know if there is any issue left over.
/Stefan |
|