Back Midas Rome Roody Rootana
  Midas DAQ System, Page 91 of 142  Not logged in ELOG logo
ID Datedown Author Topic Subject
  1036   02 Nov 2014 Stefan RittForumRunning a frontend on Arduino Yun
> With the correct definition, you should get a compile error (type mismatch).
> 
> With the wrong current definition, you should have gotten a warning about "use of uninitialized variable 'data'", but some compilers with some settings do not generate this warning.

I redefined the definition of the bk_create function to contain a void **pdate pointer, but that did not really help. Now I get a compiler error:

"Incompatible pointer type passing 'DWORD **' to parameter of type 'void **', so I need an explicit cast each time

bk_create(... (void **)&pdata);

But I think this is better than what we had before so I leave it. Please note that all front-ends using bk_create need to be modified accordingly to suppress this warning.

/Stefan
  1035   24 Oct 2014 Konstantin OlchanskiForumRunning a frontend on Arduino Yun
> INT read_event(char *pevent, INT off)
> {
>   WORD *data;
>   bk_create(pevent, "TEM0", TID_WORD, data); // <= we are dieing at this line
> }

The declaration of bk_create() in midas.h is wrong:

void EXPRT bk_create(void *pbh, const char *name, WORD type, void *pdata);
should be
void EXPRT bk_create(void *pbh, const char *name, WORD type, void **pdata);

Notice the extra "*" in "void**pdata" to indicate that it takes a pointer to the pointer to the data.

With the correct definition, you should get a compile error (type mismatch).

With the wrong current definition, you should have gotten a warning about "use of uninitialized variable 'data'", but some compilers with some settings do not generate this warning.

As it is, without looking at an example (highly recommended) and reading documentation (do we even have a "frontend writing guide"?!?) you have
no way to tell if you should pass "data" or "&data" to bk_create().

Thank you for reporting this problem.

P.S. As for running on Arduino, for slow controls type application, any CPU and network speed should be okey,
but memory use is always a concern, so please speak up if you run into problems. We routinely run MIDAS frontends
on linux machines with 512M and 128M RAM (1GHz CPU, 100 and 1000 M/s ethernet).

K.O.
  1034   24 Oct 2014 Stefan RittForumRunning a frontend on Arduino Yun
> Hello,
> 
> I'm currently trying to create a midas bank for basic temperature reading from the Arduino Yun, but when creating a bank the frontend crashed with a segfault, my
> code currently looks like this:
> 
> INT read_event(char *pevent, INT off)
> {
>   WORD *data;
>   //printf("before init\n");
>   bk_init(pevent);
>   //printf("after init\n");
>   bk_create(pevent, "TEM0", TID_WORD, data); // <= we are dieing at this line
>   //printf("after create\n");
> 
>   bk_close(pevent, data);
> 
>   return bk_size(pevent);
> }
> 
> Does anyone have an Idea how to tackle this problem down? running a debugger is a little bit tricky on a this processor..
> 
> Thanks!

Two bugs:

bk_create(pevent, "TEMO0", TID_WORD, &data);

note the "&" in front of data. Then you have to increment the pointer for each byte you add to the bank:

  *data = <temp>;
  data++;
  bk_close(pevent, data);

this way the bk_close() function know how much data you added to the bank.

Cheers,
Stefan
  1033   24 Oct 2014 Clemens SauerzopfForumRunning a frontend on Arduino Yun
Hello,

I'm currently trying to create a midas bank for basic temperature reading from the Arduino Yun, but when creating a bank the frontend crashed with a segfault, my
code currently looks like this:

INT read_event(char *pevent, INT off)
{
  WORD *data;
  //printf("before init\n");
  bk_init(pevent);
  //printf("after init\n");
  bk_create(pevent, "TEM0", TID_WORD, data); // <= we are dieing at this line
  //printf("after create\n");

  bk_close(pevent, data);

  return bk_size(pevent);
}

Does anyone have an Idea how to tackle this problem down? running a debugger is a little bit tricky on a this processor..

Thanks!
  1032   16 Oct 2014 Stefan RittBug ReportHostile network scans against MIDAS RPC ports
> Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
> too much work to setup an external firewall box just for one machine and OS-level firewall rules sometimes conflict
> with some OS services (i.e. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).

I fully agree with you. So if you find time to implement this, I will be more than happy.

/Stefan
  1031   16 Oct 2014 Konstantin OlchanskiBug ReportHostile network scans against MIDAS RPC ports
> Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.
> 
> At PSI we fortunately do not have these network scans because PSI uses a institute-wide firewall.
>

Same here at TRIUMF, no problems with hostile network activity. Only see this trouble at CERN. Nominally CERN also have
everything behind the CERN firewall, that is why I tend to think that I am seeing network scans done by CERN security people,
or some badniks on the CERN local network (PC malware, etc).

> So you can connect from outside PSI to inside PSI only 
> on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where 
> the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be 
> accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.

Yes, this is how we did it for DEAP at SNOLAB. No network trouble there.

But generically for MIDAS, I think we should have built-in capability for MIDAS to protect itself without reliance on OS-level means (local firewall)
or network-level means ("site firewalls").

Sometimes we have very small MIDAS installations, i.e. just one machine by itself, and such setups should be secure/secured easily -
too much work to setup an external firewall box just for one machine and OS-level firewall rules sometimes conflict
with some OS services (i.e. NIS) (I am still waiting for the "NIS to LDAP migration for dummies" guide).

K.O.
  1030   16 Oct 2014 Stefan RittBug ReportProblem in mfe multithread equipments
> while (1)
>    wait 10 ms for an event
>    process event, loop back
>    if there is no event, exit
> }


This code has been rewritten now and should work for event rates >100 Hz.

/Stefan
  1029   16 Oct 2014 Stefan RittBug ReportProblem with EQ_USER
I restructured the front-end code to enable multiple readout threads for EQ_USER equipment. Last summer I was definitively interrupted during 
that work and left it in an half finished state, sorry for that.

The way it works now is illustrated in mtfe.c. You create N ring buffers and N threads via

   for (int i=0 ; i<N ; i++) {
      create_event_rb(i);
      ss_thread_create(trigger_thread, (void*)(PTYPE)i);
   }

then each readout thread accesses its own readout buffer

thread(...)
{
   index = (int)(PTYPE)param;
   signal_readout_thread_active(index, TRUE);
   rbh = get_event_rbh(index);
   
   while (is_readout_thread_enabled()) {
      ... read event and put it into ring buffer ...
   }

   signal_readout_thread_active(index, FALSE);
}

The is_readout_thread_enabled() and signal_readout_thread_active() are used by the framework to shut down gracefully threads correct at the end 
of the program. This way each thread can close any hardware correctly.

Note that no other thread management is done by the framework. In the old days with interrupt equipment, the framework disabled interrupts 
when reading out periodic events, since that was necessary when using a single CAMAC crate for ADCs and scalers. This is obsolete now and not 
needed any longer. It is now the responsibility of the user code to resolve hardware access conflicts between different threads (like using a local 
mutex to access the same hardware). There is also no "readout when running" handling. If events should not be read out when the run is stopped, 
the readout thread has to check to run status, or better the EOR routine should disable the hardware trigger and the BOR routine should re-enable 
it. The readout threads will then poll for new events and just go to sleep if nothing is there.

I testes the mtfe.c program with 100 Hz and 1 MHz event rate on a dummy experiment (no hardware access) and it worked without problem.

Let me know if there is any issue left over.

/Stefan 
  1028   15 Oct 2014 Stefan RittBug ReportProblem with EQ_USER
Sure, each thread needs its own ring buffer for writing.

So I see that we need back the multiple-ring-buffer-readout-scheme even before MEG will start. So what you need is something like

for (i=0 ; rb[i] != 0 ; i++) {
  read event from rb[i];
}

as it was before. What I do not like is that rb is a global variable, we should better use the encapsulation functions and extend get_event_rb() to 
get_event_rb(i) so you can have n ring buffers.

Give me one day, I will extend the current code to make it work again and to implement N threads.

Cheers,
Stefan
  1027   15 Oct 2014 Stefan RittBug ReportProblem in mfe multithread equipments
Please disregard my previous posting, you don't need the while loop, since it's already in the scheduler (around lines 2160 under /*---- send interrupt events ----*/). 

But now I remember the rationale behind it. The loop over the rb[i] is because in MEG I have n calibration threads, each one running on a separate CPU core. So the receive_trigger_event() routine has to collect events from all the 
threads, each of them having one ring buffer. In the process of implementing EQ_USER, I changed this somehow, and apparently broke the code by making the while() loop looping forever if the event rate is over 100 Hz.

So for the moment please remove the while loop completely, and I will worry later of putting it back correctly when MEG will start again next year.

/Stefan
  1026   15 Oct 2014 Stefan RittBug ReportProblem in mfe multithread equipments
You are absolutely correct, the code is certainly wrong. It looks to me like the 

while (rbh)

was put in there for some testing, and I forgot to remove it. The only thing I could imagine is that we want to have a while loop there for performance reason. Like

readout_start = ss_millitime();
while (ss_millitime - readout_start < (DWORD) eq_info->period) {
  read event
  return 0 if no event found
}

You find this code also in the check_polled_events() routine. It ensures that the routine does not return after every single event, but after the period defined in the 
equipment (which is usually 100 ms for polled events). This way the code is more efficiently, since we do not check for RPC calls between every event, but just 10 times 
per second. This way you can shovel more events through the system, while still being responsive to run stops.

I don't have any hardware right now to test this, so please put my code above into the routine and commit it if it works.

I notice also a difference in both codes concerning the read buffer handles. The old code uses rbh2, while the new (wrong) code uses rbh. In your case probably both 
handles are the same, so it works, but in other experiments, which might use several ring buffers, it will fail. So please use rbh instead rbh2.

Let me know if it works for you, and if you see any difference in speed between the versions with and without the while loop (actually you will see this only if your trigger 
rate maxes out the DAQ).

Cheers,
Stefan
  1025   14 Oct 2014 Stefan RittBug ReportHostile network scans against MIDAS RPC ports
Doing this through the ODB seems ok to me. If the ODB cannot be accessed, you can fall back to no protection.

At PSI we fortunately do not have these network scans because PSI uses a institute-wide firewall. So you can connect from outside PSI to inside PSI only 
on certain well-defined ports (like SSH to certain machines). You can do the same in Alpha. Use one computer as a router with two network cards, where 
the DAQ network runs on the second card as a private network. Then program the routing tables in that gateway such that only certain ports can be 
accessed from outside, like port 8080 to mhttpd. This way you block all except the things which are needed.

/Stefan
  1024   14 Oct 2014 Konstantin OlchanskiBug ReportProblem with EQ_USER
If you use EQ_USER in mfe.c and have multiple threads writing into the ring buffer, you will have a big 
problem - the thread locking in the ring buffer code only works for a single writer thread and a single 
reader thread.

Presently, it is not clear how to have multiple multithreaded equipments inside one frontend.

During the Summer of 2013 code briefly existed in mfe.c to have an array of ring buffers and each 
multithreaded equipment could write into it's own buffer.

But this code is now removed and mfe.c can only read from a single ring buffer and as I noted above, ring 
buffer locking requires that only a single thread writes into it.
K.O.
  1023   14 Oct 2014 Konstantin OlchanskiBug ReportProblem in mfe multithread equipments
For my reference:
good version: https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop
first breakage: https://bitbucket.org/tmidas/midas/src/c60259d9a244bdcd296a8c5c6ab0b91de27f9905/src/mfe.c?at=develop
second breakage: https://bitbucket.org/tmidas/midas/src/45984c35b4f7257f90515f29116dec6fb46f2ebc/src/mfe.c?at=develop

The "first breakage" may actually be okey, because there the badnik loop loops over ring buffers, not infinite. But I cannot test it anymore.
K.O.
  1022   14 Oct 2014 Konstantin OlchanskiBug ReportHostile network scans against MIDAS RPC ports
At CERN I see a large number of hostile network scans that seem to be injecting HTTP requests into the 
MIDAS RPC ports. So far, all these requests seem to be successfully rejected without crashing anything, but 
they do clog up midas.log.

The main problem here is that all MIDAS programs have at least one TCP socket open where they listen for 
RPC commands, such as "start of run", "please shutdown", etc. The port numbers of these sockets are 
randomized and that makes them difficult to protect them with firewall rules (firewall rules like fixed port 
numbers).

Note that this is different from the hostile network scans that I have first seen maybe 5 years ago that 
affected the mserver main listener socket. Then, as a solution, I hardened the RPC receiver code against 
bad data (and happy to see that this hardening is still holding up) and implemented the mserver "-A" 
command switch to specify a list of permitted peers. Also mserver uses a fixed port number ("-p" switch) 
and is easy to protect with firewall rules.

Since these ports cannot be protected by OS means (firewall, etc), we have to protect them in MIDAS.

One solution is to reject all connections from unauthorized peers.

One way to use this is to implement the "-A" switch to explicitely list all permitted peers, these switch will 
ave to be added to all long running midas programs (mhttpd, mlogger, mfe.c, etc). Not very practical, IMO.

Another way is to read the list of permitted peers from ODB, at startup time, or each time a new connection 
is made.

In the latter case, care needs to be taken to avoid deadlocks. For example remote programs that read ODB 
through the mserver may deadlock if the same mserver is the one trying to establish the RPC connection. 
Or if ODB is somehow locked.

NB - we already keep a list of permitted peers in ODB /Experiment/Security.

K.O.
  1021   14 Oct 2014 Konstantin OlchanskiBug ReportProblem in mfe multithread equipments
In the ALPHA experiment at CERN I found a problem in mfe.c handling of multithreaded equipments. This problem was in 
some forms introduced around May 2013 and around Aug 2013 (commit 
https://bitbucket.org/tmidas/midas/src/45984c35b4f7/src/mfe.c) (I hope I got it right).

The effect was very odd - if event rate of multithreaded equipment was more than 100 Hz, the event counters on the midas 
status page would not increment and the frontend will crash on end of run. Other than that, all the events from the 
multithreaded equipment seem to appear in the SYSTEM buffer and in the data file normally.

This happened: in mfe.c::receive_trigger_event() a loop was introduced (previously,
there was no loop there - there was and still is a loop outside of receive_trigger_event()):

while (1)
   wait 10 ms for an event
   process event, loop back
   if there is no event, exit
}

Obviously, if the event rate is more than 100 Hz (repetition rate less than 10 ms),
the 10 ms wait will always return an event and we will never exit this loop.

So the mfe.c main loop is now stuck here and will not process any periodic activity
such as updating the equipment statistics (event counters on the midas status page)
or running periodic equipments in the same front end program.

The crash at the end of run will be caused by a timeout in responding to the "end of run" RPC call.

I have a patch in testing that solves this problem by restoring receive_trigger_event() to the original configuration, i.e. 
https://bitbucket.org/tmidas/midas/src/6899b96a4f8177d4af92035cd84aadf5a7cbc875/src/mfe.c?at=develop

K.O.
  1020   08 Sep 2014 Clemens SauerzopfForumCAEN V1742 midas driver
Hello all,

As an addition to the driver functions I uploaded in this thread I would also have a
C++ class that handles everything for the V1742 modules and can be directly used
integrated into a C++ frontend. 

I would like to ask if you have policy for user supplied code like this? It's not a low
level driver but a frontend module that reads and controls the module, creates odb
hotlinks and handles the bank creating and storing of the data.

Best regards,
Clemens

EDIT: the question is, do you like to have  codes like this collected somewhere for
example this forum or would you prefer if I would post a link to some online repository =
  1019   06 Aug 2014 Konstantin OlchanskiInfoMIDAS high speed test
> We have tested operation of MIDAS using a 10GigE network connection. Using a dummy frontend 
> generating fake data, we can record MIDAS data to disk at at least 700 Mbytes/sec as reported by 
> the MIDAS status page.
>
> Details of the hardware:
> 
> 1) the disk server machine CPU is 3.4GHz Intel i7-4770, mobo is ASUS Z87 WS (10 SATA, 2xGigE), 
> RAM is 32GB DDR3-1600.
> 2) disk array is 8x4TB Seagate ST4000VN000-1H4168 NAS disks RAID0 (striped) configuration, raw 
> data read/write rate is around 1 GByte/sec, disks are directly attached to mobo (no raid card), linux 
> software raid.
>

These tests were done using a raid0 array (striped), which is not suitable for production use.

For production use, RAID5 and RAID6 is recommended. But their default configuration has severely reduced performance (50% of 
RAID0) this is because internally the raid driver issues disk read operations that compete against and severely slow down the disk write 
requests. This is easy to see with "iostat -x 1" - when writing to the raid array, there should be no reads from the disks. Following 
changes are required to achieve maximum performance:

echo 32000 > /sys/block/md6/md/stripe_cache_size # increase internal memory buffers - because "raid write" is always "read-
modify-write", bigger buffers ensure that the reads are done from cache, not from phsyical disk
mdadm --grow --bitmap=/md6bitmap /dev/md6 # use external bitmap - if bitmap is internal, there is a large number of disk reads 
competing against writes. external bitmap seems to help quite a bit.

With these settings, my RAID6 array can read and write at about 700-900 Mbytes/sec - this is comparable to RAID0 (minus 2 disks).

With this, I repeated the MIDAS performance tests - (but without 10GigE) - MIDAS can write 700 Mbytes/sec of fake data to a local 
RAID6 data array. (hardware configuration is listed above).

K.O.
  1018   06 Aug 2014 Clemens SauerzopfForumAdding Interrupt handling to SIS3100 driver
Hello Pierre-Andre,

thank you for your help with the interrupt handling. To close this case I'll
attach my interrupt
handling code for the SIS 3100 to this post as a reference. Maybe someone wants
to do something
similar in the future. 

I've decide to go for a C++ frontend therefore it is a class that handles
everything. The user only
has to provide a function pointer to the constructor that handles the interrupt
bitmask. The
interrupt handling is done with a timedwait within a separate thread. 

Cheers,
Clemens

> Hello Clemens,
> 
> The hardware readout is triggered by the interrupt within this thread. The
main thread poll on  
> the data availability (from the rb) to filter/compose the frontend event.
> In a similar multi-threaded implementation presently used in a dark matter
experiment we start 
> as many thread as necessary to constantly poll on the hardware for "data
fragment" collection.
> The event composition is done in the main thread through polling on the RBs.
> 
> Depending on the trigger rate and readout time, we can afford to analyze the
data fragment at 
> the thread level and add computed/summary information to the ring buffer on a
event-by-event 
> basis. This facilitate the overall event filtering happening later on in our
event builder. 
> 
> "polling the trigger signal?", I don't understand. You can poll on the trigger
condition but 
> then you don't need interrupt.
> 
> The original Midas interrupt implementation was to let the interrupt function
set a acknowledge 
> flag which is picked up by the standard midas polling function (user code) for
triggering the 
> readout. This method ensure a minimal time spent in the IRQ and works fine for
a single thread.
> 
> In regards of the CAEN V1742, we do have a VME driver for it, but it hasn't
been added to the 
> Midas yet (quite recent), but please don't hesitate to send us a copy.
> 
> Cheers, PAA
> 
  1017   16 Jul 2014 Clemens SauerzopfForumCAEN V1742 midas driver
Hello all, 

as discussed in the thread about Interrupt triggered readout
(https://midas.triumf.ca/elog/Midas/1016) I send you out driver for the CAEN
V1742 modules.

The code is separated into two different parts, first the real midas driver
(attachment 1).
Here the non trivial part is reading the modules internal flash pages to get to
correction patterns for the DRS4 chips, this is not documented in the manual.

The functions to apply the correction patters to the data is in the second
archive (attachment 2). I have to say this is C++ code as we use this with rootana.

The driver including the signal correction was used for data taking in 2012 with
4 synchronized V1742 modules for Antihydrogen experiment by the ASACUSA
collaboration at cern. We'll use it gain this year.

I hope the archives contain all necessary information, some parts were
distributed in various files..

Cheers,
Clemens

EDIT: the driver is based on the v1740 driver
ELOG V3.1.4-2e1708b5