ID | Date | Author | Topic | Subject
1038 | 12 Nov 2014 | Robert Pattie | Forum | struct mismatch | Hi all,
I've started receiving the following error that I can't track down. Does
anyone have a suggestion for where to start looking for the cause of this?
[Analyzer,ERROR] [odb.c:9460:db_open_record,ERROR] struct size mismatch for "/"
(expected size: 576, size in ODB: 0)
This error prevents me from taking two runs in a row; I have to close the DAQ
and restart it to take multiple runs. It also prevents me from running the analyzer
in offline mode.
I also noticed that several of the ODB directories no longer have the same HTML
format when viewed through the browser. I've attached a screenshot of the
"/Logger/Channels" page.
Thanks,
Robert |
1037 | 06 Nov 2014 | Stefan Ritt | Forum | Weird problem on new installation |
Razvan Stefan Gornea wrote: | In my case I have the following structure in ODB right before the framework calls frontend_init():
/Equipment/CAEN_V1740 [CLOSE]
/Equipment/CAEN_V1740/Variables [CLOSE]
/Equipment/CAEN_V1740/Common [OPEN]
/Equipment/CAEN_V1740/Statistics [OPEN]
/Equipment/CAEN_V1740/Settings [CLOSE]
|
Sorry for my late reply, but I only found time today to have a look at this.
It is absolutely ok to have the Common and Statistics subtrees in the ODB open. So if anybody modifies anything in the Common tree for example, the frontend gets notified directly via the hot link mechanism. Having a subtree "open" however means that the structure of that tree may not be changed, since it's directly mapped onto a fixed C structure. If you create a subtree via the db_create_record() function, you modify the structure of that tree, and thus it may not be open by other clients.
Your problem can be fixed if you create only the /Equipment/CAEN_V1740/Settings tree (which is not open), instead of the full /Equipment/CAEN_V1740 tree, which contains the open Common and Statistics subtrees.
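As a minimal sketch (the settings fields here are placeholders, not the actual CAEN_V1740 settings), creating only the Settings record from frontend_init() could look like this:

INT frontend_init()
{
   HNDLE hDB;
   cm_get_experiment_database(&hDB, NULL);

   /* ASCII template for the Settings record; field names are illustrative */
   const char *settings_str =
      "[.]\n"
      "Trigger threshold = INT : 100\n"
      "Post trigger = INT : 50\n";

   /* safe: nobody holds this subtree open, unlike the full
      /Equipment/CAEN_V1740 tree with its open Common and Statistics */
   db_create_record(hDB, 0, "/Equipment/CAEN_V1740/Settings", settings_str);

   return SUCCESS;
}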
Best regards,
Stefan |
1036 | 02 Nov 2014 | Stefan Ritt | Forum | Running a frontend on Arduino Yun | > With the correct definition, you should get a compile error (type mismatch).
>
> With the wrong current definition, you should have gotten a warning about "use of uninitialized variable 'data'", but some compilers with some settings do not generate this warning.
I changed the definition of the bk_create() function to take a void **pdata pointer, but that did not really help. Now I get a compiler error:
"Incompatible pointer type passing 'DWORD **' to parameter of type 'void **'", so I need an explicit cast each time:
bk_create(... (void **)&pdata);
But I think this is better than what we had before, so I will leave it. Please note that all front-ends using bk_create() need to be modified accordingly to suppress this error.
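For reference, a typical call site after this change would look like the following sketch (the bank name "ADC0" and the data value are placeholders):

DWORD *pdata;

bk_create(pevent, "ADC0", TID_DWORD, (void **)&pdata);  /* explicit cast now needed */
*pdata++ = value;                                       /* placeholder data word */
bk_close(pevent, pdata);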
/Stefan |
1035 | 24 Oct 2014 | Konstantin Olchanski | Forum | Running a frontend on Arduino Yun |
> INT read_event(char *pevent, INT off)
> {
>    WORD *data;
>    bk_create(pevent, "TEM0", TID_WORD, data); // <= we are dying at this line
> }
The declaration of bk_create() in midas.h is wrong:
void EXPRT bk_create(void *pbh, const char *name, WORD type, void *pdata);
should be
void EXPRT bk_create(void *pbh, const char *name, WORD type, void **pdata);
Notice the extra "*" in "void **pdata" to indicate that it takes a pointer to the pointer to the data.
With the correct definition, you should get a compile error (type mismatch).
With the wrong current definition, you should have gotten a warning about "use of uninitialized variable 'data'", but some compilers with some settings do not generate this warning.
As it is, without looking at an example (highly recommended) and reading documentation (do we even have a "frontend writing guide"?!?) you have
no way to tell if you should pass "data" or "&data" to bk_create().
Thank you for reporting this problem.
P.S. As for running on Arduino, for a slow-controls type application, any CPU and network speed should be okay,
but memory use is always a concern, so please speak up if you run into problems. We routinely run MIDAS frontends
on Linux machines with 512M and 128M RAM (1 GHz CPU, 100 and 1000 Mbit/s Ethernet).
K.O. |
1034 | 24 Oct 2014 | Stefan Ritt | Forum | Running a frontend on Arduino Yun | > Hello,
>
> I'm currently trying to create a MIDAS bank for basic temperature readings from the Arduino Yun, but when creating a bank the frontend crashes with a segfault. My
> code currently looks like this:
>
> INT read_event(char *pevent, INT off)
> {
>    WORD *data;
>    //printf("before init\n");
>    bk_init(pevent);
>    //printf("after init\n");
>    bk_create(pevent, "TEM0", TID_WORD, data); // <= we are dying at this line
>    //printf("after create\n");
>
>    bk_close(pevent, data);
>
>    return bk_size(pevent);
> }
>
> Does anyone have an idea how to track this problem down? Running a debugger is a little bit tricky on this processor...
>
> Thanks!
Two bugs:
bk_create(pevent, "TEM0", TID_WORD, &data);
Note the "&" in front of data. Then you have to increment the pointer for each word you add to the bank:
*data = <temp>;
data++;
bk_close(pevent, data);
This way the bk_close() function knows how much data you added to the bank.
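Putting both fixes together, a corrected routine would look like this sketch (read_temperature() is a hypothetical stand-in for the actual Arduino readout):

INT read_event(char *pevent, INT off)
{
   WORD *data;

   bk_init(pevent);

   /* pass the address of the pointer so bk_create() can set it */
   bk_create(pevent, "TEM0", TID_WORD, &data);

   *data++ = (WORD) read_temperature();   /* hypothetical readout call */

   /* bk_close() computes the bank size from the advanced pointer */
   bk_close(pevent, data);

   return bk_size(pevent);
}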
Cheers,
Stefan |
1033 | 24 Oct 2014 | Clemens Sauerzopf | Forum | Running a frontend on Arduino Yun | Hello,
I'm currently trying to create a MIDAS bank for basic temperature readings from the Arduino Yun, but when creating a bank the frontend crashes with a segfault. My
code currently looks like this:
INT read_event(char *pevent, INT off)
{
   WORD *data;
   //printf("before init\n");
   bk_init(pevent);
   //printf("after init\n");
   bk_create(pevent, "TEM0", TID_WORD, data); // <= we are dying at this line
   //printf("after create\n");
   bk_close(pevent, data);
   return bk_size(pevent);
}
Does anyone have an idea how to track this problem down? Running a debugger is a little bit tricky on this processor...
Thanks! |
1032 | 16 Oct 2014 | Stefan Ritt | Bug Report | Hostile network scans against MIDAS RPC ports | > Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
> too much work to set up an external firewall box just for one machine, and OS-level firewall rules sometimes conflict
> with some OS services (i.e. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).
I fully agree with you. So if you find time to implement this, I will be more than happy.
/Stefan |
1031 | 16 Oct 2014 | Konstantin Olchanski | Bug Report | Hostile network scans against MIDAS RPC ports | > Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.
>
> At PSI we fortunately do not have these network scans because PSI uses an institute-wide firewall.
>
Same here at TRIUMF, no problems with hostile network activity. I only see this trouble at CERN. Nominally CERN also has
everything behind the CERN firewall; that is why I tend to think that I am seeing network scans done by CERN security people,
or some badniks on the CERN local network (PC malware, etc.).
> So you can connect from outside PSI to inside PSI only
> on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where
> the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be
> accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.
Yes, this is how we did it for DEAP at SNOLAB. No network trouble there.
But generically for MIDAS, I think we should have built-in capability for MIDAS to protect itself without reliance on OS-level means (local firewall)
or network-level means ("site firewalls").
Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
too much work to set up an external firewall box just for one machine, and OS-level firewall rules sometimes conflict
with some OS services (i.e. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).
K.O. |
1030 | 16 Oct 2014 | Stefan Ritt | Bug Report | Problem in mfe multithread equipments |
> while (1) {
>    wait 10 ms for an event
>    process event, loop back
>    if there is no event, exit
> }
This code has been rewritten now and should work for event rates >100 Hz.
/Stefan |
1029 | 16 Oct 2014 | Stefan Ritt | Bug Report | Problem with EQ_USER | I restructured the front-end code to enable multiple readout threads for EQ_USER equipment. Last summer I was interrupted during
that work and left it in a half-finished state, sorry for that.
The way it works now is illustrated in mtfe.c. You create N ring buffers and N threads via
for (int i=0 ; i<N ; i++) {
   create_event_rb(i);
   ss_thread_create(trigger_thread, (void *)(PTYPE) i);
}
then each readout thread accesses its own ring buffer:
thread(...)
{
   index = (int)(PTYPE) param;
   signal_readout_thread_active(index, TRUE);
   rbh = get_event_rbh(index);
   while (is_readout_thread_enabled()) {
      ... read event and put it into ring buffer ...
   }
   signal_readout_thread_active(index, FALSE);
}
The is_readout_thread_enabled() and signal_readout_thread_active() functions are used by the framework to shut down the threads gracefully at the end
of the program. This way each thread can close any hardware correctly.
Note that no other thread management is done by the framework. In the old days with interrupt equipment, the framework disabled interrupts
when reading out periodic events, since that was necessary when using a single CAMAC crate for ADCs and scalers. This is obsolete now and not
needed any longer. It is now the responsibility of the user code to resolve hardware access conflicts between different threads (like using a local
mutex to access the same hardware). There is also no "readout when running" handling. If events should not be read out when the run is stopped,
the readout thread has to check the run status, or better, the EOR routine should disable the hardware trigger and the BOR routine should re-enable
it. The readout threads will then poll for new events and just go to sleep if nothing is there.
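For example, if two readout threads share one hardware interface, a plain mutex around the access is enough. A sketch using pthreads (hw_read_event() is a hypothetical hardware call):

#include <pthread.h>

static pthread_mutex_t hw_mutex = PTHREAD_MUTEX_INITIALIZER;

/* called from each readout thread before touching the shared hardware */
int locked_read_event(char *pevent)
{
   int size;

   pthread_mutex_lock(&hw_mutex);
   size = hw_read_event(pevent);   /* hypothetical hardware access */
   pthread_mutex_unlock(&hw_mutex);

   return size;
}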
I tested the mtfe.c program with 100 Hz and 1 MHz event rates on a dummy experiment (no hardware access) and it worked without problems.
Let me know if there is any issue left over.
/Stefan |
1028 | 15 Oct 2014 | Stefan Ritt | Bug Report | Problem with EQ_USER | Sure, each thread needs its own ring buffer for writing.
So I see that we need the multiple-ring-buffer readout scheme back even before MEG starts. What you need is something like
for (i=0 ; rb[i] != 0 ; i++) {
   read event from rb[i];
}
as it was before. What I do not like is that rb is a global variable; we should rather use the encapsulation functions and extend get_event_rb() to
get_event_rb(i) so you can have N ring buffers.
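A sketch of that readout loop with encapsulation (assuming get_event_rbh(i) returns the handle of the i-th ring buffer, as in the mtfe.c scheme above, and writing the rb_* calls from memory; n_rb and the event dispatch are placeholders):

/* drain all ring buffers through accessors instead of the global rb[] */
for (int i = 0; i < n_rb; i++) {
   int rbh = get_event_rbh(i);      /* handle of the i-th ring buffer */
   void *p;

   /* non-blocking check (0 ms timeout) for a waiting event */
   while (rb_get_rp(rbh, &p, 0) == DB_SUCCESS) {
      /* ... send the event at p to the SYSTEM buffer ... */
      rb_increment_rp(rbh, ((EVENT_HEADER *) p)->data_size + sizeof(EVENT_HEADER));
   }
}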
Give me one day, I will extend the current code to make it work again and to implement N threads.
Cheers,
Stefan |
1027 | 15 Oct 2014 | Stefan Ritt | Bug Report | Problem in mfe multithread equipments | Please disregard my previous posting, you don't need the while loop, since it's already in the scheduler (around line 2160, under /*---- send interrupt events ----*/).
But now I remember the rationale behind it. The loop over the rb[i] is because in MEG I have n calibration threads, each one running on a separate CPU core. So the receive_trigger_event() routine has to collect events from all the
threads, each of them having one ring buffer. In the process of implementing EQ_USER, I changed this somehow, and apparently broke the code by making the while() loop loop forever if the event rate is over 100 Hz.
So for the moment please remove the while loop completely, and I will worry later about putting it back correctly when MEG starts again next year.
/Stefan |
1026 | 15 Oct 2014 | Stefan Ritt | Bug Report | Problem in mfe multithread equipments | You are absolutely correct, the code is certainly wrong. It looks to me like the
while (rbh)
was put in there for some testing, and I forgot to remove it. The only thing I could imagine is that we want to have a while loop there for performance reasons. Like
readout_start = ss_millitime();
while (ss_millitime() - readout_start < (DWORD) eq_info->period) {
   read event
   return 0 if no event found
}
You find this code also in the check_polled_events() routine. It ensures that the routine does not return after every single event, but after the period defined in the
equipment (which is usually 100 ms for polled events). This way the code is more efficient, since we do not check for RPC calls between every event, but just 10 times
per second. This way you can shovel more events through the system, while still being responsive to run stops.
I don't have any hardware right now to test this, so please put my code above into the routine and commit it if it works.
I also notice a difference between the two codes concerning the read buffer handles. The old code uses rbh2, while the new (wrong) code uses rbh. In your case probably both
handles are the same, so it works, but in other experiments, which might use several ring buffers, it will fail. So please use rbh instead of rbh2.
Let me know if it works for you, and if you see any difference in speed between the versions with and without the while loop (actually you will see this only if your trigger
rate maxes out the DAQ).
Cheers,
Stefan |
1025 | 14 Oct 2014 | Stefan Ritt | Bug Report | Hostile network scans against MIDAS RPC ports | Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.
At PSI we fortunately do not have these network scans because PSI uses an institute-wide firewall. So you can connect from outside PSI to inside PSI only
on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where
the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be
accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.
/Stefan |
1024 | 14 Oct 2014 | Konstantin Olchanski | Bug Report | Problem with EQ_USER | If you use EQ_USER in mfe.c and have multiple threads writing into the ring buffer, you will have a big
problem - the thread locking in the ring buffer code only works for a single writer thread and a single
reader thread.
Presently, it is not clear how to have multiple multithreaded equipments inside one frontend.
During the summer of 2013, code briefly existed in mfe.c to have an array of ring buffers, where each
multithreaded equipment could write into its own buffer.
But this code has now been removed; mfe.c can only read from a single ring buffer, and as I noted above, ring
buffer locking requires that only a single thread writes into it.
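For reference, the single-writer assumption is visible in the usual write path; a sketch (rbh, pevent and event_size stand in for the real frontend variables):

/* single writer thread: claim space, copy the event, then publish it */
void *wp;
if (rb_get_wp(rbh, &wp, 100) == DB_SUCCESS) {   /* wait up to 100 ms for space */
   memcpy(wp, pevent, event_size);              /* copy the assembled event */
   rb_increment_wp(rbh, event_size);            /* make it visible to the reader */
}

Two threads calling rb_get_wp() concurrently could be handed the same write region, which is why each writer thread needs its own ring buffer.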
K.O. |
1023 | 14 Oct 2014 | Konstantin Olchanski | Bug Report | Problem in mfe multithread equipments | For my reference:
good version: https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
first breakage: https://bitbucket.org/tmidas/midas/src/c60259d9a244bdcd296a8c5c6ab0b91de27f9905/src/mfe.c?at=develop
second breakage: https://bitbucket.org/tmidas/midas/src/45984c35b4f7257f90515f29116dec6fb46f2ebc/src/mfe.c?at=develop
The "first breakage" may actually be okey, because there the badnik loop loops over ring buffers, not infinite. But I cannot test it anymore.
K.O. |
1022 | 14 Oct 2014 | Konstantin Olchanski | Bug Report | Hostile network scans against MIDAS RPC ports | At CERN I see a large number of hostile network scans that seem to be injecting HTTP requests into the
MIDAS RPC ports. So far, all these requests seem to be successfully rejected without crashing anything, but
they do clog up midas.log.
The main problem here is that all MIDAS programs have at least one TCP socket open where they listen for
RPC commands, such as "start of run", "please shutdown", etc. The port numbers of these sockets are
randomized, which makes them difficult to protect with firewall rules (firewall rules want fixed port
numbers).
Note that this is different from the hostile network scans that I first saw maybe 5 years ago, which
affected the mserver main listener socket. Then, as a solution, I hardened the RPC receiver code against
bad data (and I am happy to see that this hardening is still holding up) and implemented the mserver "-A"
command switch to specify a list of permitted peers. Also mserver uses a fixed port number ("-p" switch)
and is easy to protect with firewall rules.
Since these ports cannot be protected by OS means (firewall, etc), we have to protect them in MIDAS.
One solution is to reject all connections from unauthorized peers.
One way to do this is to implement the "-A" switch to explicitly list all permitted peers; this switch would
have to be added to all long-running midas programs (mhttpd, mlogger, mfe.c, etc). Not very practical, IMO.
Another way is to read the list of permitted peers from ODB, at startup time, or each time a new connection
is made.
In the latter case, care needs to be taken to avoid deadlocks. For example remote programs that read ODB
through the mserver may deadlock if the same mserver is the one trying to establish the RPC connection.
Or if ODB is somehow locked.
NB - we already keep a list of permitted peers in ODB /Experiment/Security.
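A minimal sketch of the peer check itself (a hypothetical helper, not the actual MIDAS RPC code): after accept(), look up the peer address and drop the connection unless it appears on the permitted list read from ODB.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if the accepted socket's peer is on the
   permitted list (e.g. loaded from /Experiment/Security), else close it. */
int check_rpc_peer(int sock, char allowed[][32], int n_allowed)
{
   struct sockaddr_in addr;
   socklen_t len = sizeof(addr);

   if (getpeername(sock, (struct sockaddr *) &addr, &len) != 0)
      return 0;

   const char *peer = inet_ntoa(addr.sin_addr);
   for (int i = 0; i < n_allowed; i++)
      if (strcmp(peer, allowed[i]) == 0)
         return 1;                /* permitted peer */

   close(sock);                   /* reject unauthorized peer */
   return 0;
}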
K.O. |
1021 | 14 Oct 2014 | Konstantin Olchanski | Bug Report | Problem in mfe multithread equipments | In the ALPHA experiment at CERN I found a problem in mfe.c handling of multithreaded equipments. This problem was in
some form introduced around May 2013 and around Aug 2013 (commit
https://bitbucket.org/tmidas/midas/src/45984c35b4f7/src/mfe.c) (I hope I got it right).
The effect was very odd - if the event rate of a multithreaded equipment was more than 100 Hz, the event counters on the midas
status page would not increment and the frontend would crash at the end of the run. Other than that, all the events from the
multithreaded equipment seemed to appear in the SYSTEM buffer and in the data file normally.
This happened: in mfe.c::receive_trigger_event() a loop was introduced (previously,
there was no loop there - there was and still is a loop outside of receive_trigger_event()):
while (1) {
   wait 10 ms for an event
   process event, loop back
   if there is no event, exit
}
Obviously, if the event rate is more than 100 Hz (repetition period less than 10 ms),
the 10 ms wait will always return an event and we will never exit this loop.
So the mfe.c main loop is now stuck here and will not process any periodic activity
such as updating the equipment statistics (event counters on the midas status page)
or running periodic equipments in the same front end program.
The crash at the end of run will be caused by a timeout in responding to the "end of run" RPC call.
I have a patch in testing that solves this problem by restoring receive_trigger_event() to the original configuration, i.e.
https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
K.O. |
1020 | 08 Sep 2014 | Clemens Sauerzopf | Forum | CAEN V1742 midas driver | Hello all,
As an addition to the driver functions I uploaded in this thread, I also have a
C++ class that handles everything for the V1742 modules and can be directly
integrated into a C++ frontend.
I would like to ask if you have a policy for user-supplied code like this? It's not a low-
level driver but a frontend module that reads out and controls the module, creates ODB
hotlinks, and handles the bank creation and storing of the data.
Best regards,
Clemens
EDIT: the question is, would you like to have code like this collected somewhere, for
example in this forum, or would you prefer that I post a link to an online repository? |
1019 | 06 Aug 2014 | Konstantin Olchanski | Info | MIDAS high speed test | > We have tested operation of MIDAS using a 10GigE network connection. Using a dummy frontend
> generating fake data, we can record MIDAS data to disk at at least 700 Mbytes/sec as reported by
> the MIDAS status page.
>
> Details of the hardware:
>
> 1) the disk server machine CPU is 3.4GHz Intel i7-4770, mobo is ASUS Z87 WS (10 SATA, 2xGigE),
> RAM is 32GB DDR3-1600.
> 2) disk array is 8x4TB Seagate ST4000VN000-1H4168 NAS disks RAID0 (striped) configuration, raw
> data read/write rate is around 1 GByte/sec, disks are directly attached to mobo (no raid card), linux
> software raid.
>
These tests were done using a raid0 array (striped), which is not suitable for production use.
For production use, RAID5 or RAID6 is recommended, but their default configuration has severely reduced performance (50% of
RAID0). This is because internally the raid driver issues disk read operations that compete against and severely slow down the disk write
requests. This is easy to see with "iostat -x 1": when writing to the raid array, there should be no reads from the disks. The following
changes are required to achieve maximum performance:
echo 32000 > /sys/block/md6/md/stripe_cache_size # increase internal memory buffers - because a "raid write" is always "read-
modify-write", bigger buffers ensure that the reads are done from cache, not from physical disk
mdadm --grow --bitmap=/md6bitmap /dev/md6 # use an external bitmap - if the bitmap is internal, there is a large number of disk reads
competing against writes. An external bitmap seems to help quite a bit.
With these settings, my RAID6 array can read and write at about 700-900 Mbytes/sec - this is comparable to RAID0 (minus 2 disks).
With this, I repeated the MIDAS performance tests (but without 10GigE): MIDAS can write 700 Mbytes/sec of fake data to a local
RAID6 data array (hardware configuration is listed above).
K.O. |