Back Midas Rome Roody Rootana
  Midas DAQ System, Page 73 of 143  Not logged in ELOG logo
ID Date Author Topicdown Subject
  1838   20 Feb 2020 Konstantin OlchanskiForumWritting Midas Events via FPGAs
> rb_xxx functions are midas event agnostic. The receiving side in mfe.cxx (lines 1418 in receive_trigger_event) however pulls one event at a time. If you 
> have some inconsistency I would put some debugging code there.

I agree with Stefan, I do not think there is any bugs in the ring buffer code.

But. I do not think we ever did DMA the data directly into the ring buffer. Hmm...

I just checked, this is what we do (and this worked in the ALPHA Si-strip DAQ system for 10 years now):

- mfe.cxx multithread equipment
- mfe readout thread grabs pointer from ring buffer
- mfe creates event headers, etc
- calls our read_event() function
- creates data bank
- DMA data into the data bank (this is the DMA from VME block reads, using DMA controller inside the UniverseII and tsi148 VME-to-PCI bridges)
- close data bank
- return to mfe
- mfe readout thread increments the ring buffer
- mfe main thread grabs events from ring buffer, sends them to the mserver

So there could be trouble:
a) the ring buffer code does not have the required "volatile" (ahem, "atomic") annotations, so DMA may have a bad interaction with compiler optimizations (values stored in registers 
instead of in memory, etc)
b) the DMA driver must doctor the memory settings to (1) mark the DMA target memory uncachable or (1b) invalidate the cache after DMA completes, (2) mark the DMA target 
memory unswappable.

So I see possibilities for the ring buffer to malfunction.

But now I am curious, which DMA controller you use? The Altera or Xilinx PCIe block with the vendor supplied DMA driver? Or you do DMA on an ARM SoC FPGA? (no PCI/PCIe, 
different DMA controller, different DMA driver).

I am curious because we will be implementing pretty much what you do on ARM SoC FPGAs pretty soon, so good to know
if there is trouble to expect.

But I will probably use the tmfe.h c++ frontend and a "pure c++" ring buffer instead of mfe.cxx and the midas "rb" ring buffer.

(I did not look at your code at all, there could be a bug right there, this ring buffer stuff is tricky. With luck there is no bug
in your dma driver. The dma drivers for our vme bridges did do have bugs).

K.O.
  1839   20 Feb 2020 Konstantin OlchanskiForumDifference between "Event Data Size" and "All Bank Size"
> Thanks for pointing out this error. The "All Bank Size" contains the size of all banks including their 
> bank headers, but NOT the global bank header itself. I modified the documentation accordingly.
> 
> If you want to study the C code which tells you how to fill these headers, look at midas.cxx line 
> 14788.

Also take a look at the midas event parser in ROOTANA midasio.cxx, the code is pretty clean c++
https://bitbucket.org/tmidas/rootana/src/master/libMidasInterface/midasio.cxx

But Stefan's code in midas.cxx and in the documentation is the authoritative information.

K.O.
  1841   20 Feb 2020 Marius KoeppelForumWritting Midas Events via FPGAs
We also agree and found the problem now. Since we build everything (MIDAS Event Header, Bank Header, Banks etc.) in the FPGA we had some struggle with the MIDAS data format (http://lmu.web.psi.ch/docu/manuals/bulk_manuals/software/midas195/html/AppendixA.html). We thought that only the MIDAS Event needs to be aligned to 64 bit but as it turned out also the bank data (Stefan updated the wiki page already) needs to be aligned. Since we are using the BANK32 it was a bit unclear for us since the bank header is not 64 bit aligned. But we managed this now by adding empty data and the system is running now.

Our setup looks like this:

Software:
- mfe.cxx multithread equipment
- mfe readout thread grabs pointer from dma ring buffer 
- since the dma buffer is volatile we do copy_n for transforming the data to MIDAS 
- the data is already in the MIDAS format so done from our side :)
- mfe readout thread increments the ring buffer
- mfe main thread grabs events from ring buffer, sends them to the mserver

Firmware:
- Arria 10 development board
- Altera PCIe block
- Own DMA engine since we are doing burst writing DMA with PCIe 3.0. 
- Own device driver
- no interrupts 

If you have more questions fell free to ask.
  1842   20 Feb 2020 Stefan RittForumWritting Midas Events via FPGAs
Actually the cause of all of the is a real bug in the midas functions. We want each bank 8-byte aligned, so there is code in bk_close like:

midas.cxx:14788:
     ((BANK_HEADER *) event)->data_size += sizeof(BANK32) + ALIGN8(pbk32->data_size);

While the old sizeof(BANK)=8, the extended sizeof(BANK32)=12, so not 8-byte aligned. This code should rather be:

     ((BANK_HEADER *) event)->data_size += ALIGN8(sizeof(BANK32) +pbk32->data_size);

But if we change that, it would break every midas data file on this planet!

The only chance I see is to use the "flags" in the BANK_HEADER to distinguish a current bank from a "correct" bank. 
So we could introduce a flag BANK_FORMAT_ALIGNED which distinguishes between the two pieces of code above. 
Then bk_iterate32 would look at that flag and do the right thing.

Any thoughts?

Best,
Stefan
  1843   21 Feb 2020 Konstantin OlchanskiForumWritting Midas Events via FPGAs
Hi, Stefan - is this our famous 64-bit misalignement? Where we have each alternating bank aligned and misaligned at 64 bits? Without changing the data
format, one can always store data in 64-bit aligned banks by inserting a dummy banks between real banks:

event header
bank header
bank1 --- 64-bit aligned --- with data
bank2 --- misaligned, no data
bank3 --- 64-bit aligned --- with data
bank4 --- misaligned, no data
...

for sure, wastes space for bank2, bank4, etc, but at 12 bytes per bank, maybe this is negligible overhead compared to total event size.

BTW, aligned-to-64-bit is old news. The the PWB FPGA, I have 128-bit data paths to DDR RAM, the data has to be aligned to 128 bits, or else!

K.O.



> Actually the cause of all of the is a real bug in the midas functions. We want each bank 8-byte aligned, so there is code in bk_close like:
> 
> midas.cxx:14788:
>      ((BANK_HEADER *) event)->data_size += sizeof(BANK32) + ALIGN8(pbk32->data_size);
> 
> While the old sizeof(BANK)=8, the extended sizeof(BANK32)=12, so not 8-byte aligned. This code should rather be:
> 
>      ((BANK_HEADER *) event)->data_size += ALIGN8(sizeof(BANK32) +pbk32->data_size);
> 
> But if we change that, it would break every midas data file on this planet!
> 
> The only chance I see is to use the "flags" in the BANK_HEADER to distinguish a current bank from a "correct" bank. 
> So we could introduce a flag BANK_FORMAT_ALIGNED which distinguishes between the two pieces of code above. 
> Then bk_iterate32 would look at that flag and do the right thing.
> 
> Any thoughts?
> 
> Best,
> Stefan
  1844   21 Feb 2020 Konstantin OlchanskiForumWritting Midas Events via FPGAs
> We also agree and found the problem now.

Good. what was wrong?

> - Own DMA engine since we are doing burst writing DMA with PCIe 3.0. 
> - Own device driver

Scary stuff.

> - no interrupts 

Right. Best I can tell, interrupts no longer useful in Linux - interrupt handler cannot do any real work, has to hand off to a kernel thread, resulting
in so much latency and overhead that one might as well poll for the data... And for DMA data transfers, the data rate is well known,
so easy to predict how long the DMA will run for and sleep for that amount of time instead of waiting for an interrupt.

K.O.
  1845   21 Feb 2020 Stefan RittForumWritting Midas Events via FPGAs
> Hi, Stefan - is this our famous 64-bit misalignement? Where we have each alternating bank aligned and misaligned at 64 bits? Without changing the data
> format, one can always store data in 64-bit aligned banks by inserting a dummy banks between real banks:
> 
> event header
> bank header
> bank1 --- 64-bit aligned --- with data
> bank2 --- misaligned, no data
> bank3 --- 64-bit aligned --- with data
> bank4 --- misaligned, no data
> ...
> 
> for sure, wastes space for bank2, bank4, etc, but at 12 bytes per bank, maybe this is negligible overhead compared to total event size.
> 
> BTW, aligned-to-64-bit is old news. The the PWB FPGA, I have 128-bit data paths to DDR RAM, the data has to be aligned to 128 bits, or else!

Ok, so what about the following: When we do a bk_init32, we add a parameter "alignment", which might be 1,4,8,16 and "old". We store this alignment in the bank header, so the 
decoding works correctly. Now "old" means the current encoding, which is screwed up and produces the results you mention above, but we have to keep it (actually make it the 
default!) for backward compatibility. But then we can ask for 64-bit alignment or even 128-bit alignment if that helps the DAQ speed.

The only problem I see is if one writes data with the new library using 128-bit alignment for example, and wants to read it back with old code. Then it would explode. So if we 
make this modification, we have to announce it carefully and also adjust all ROOTANA & Co libraries to read back any midas data.

Stefan
  Draft   23 Feb 2020 Marius KoeppelForumWritting Midas Events via FPGAs
> > We also agree and found the problem now.
> 
> Good. what was wrong?
> 
> > - Own DMA engine since we are doing burst writing DMA with PCIe 3.0. 
> > - Own device driver
> 
> Scary stuff.
> 
> > - no interrupts 
> 
> Right. Best I can tell, interrupts no longer useful in Linux - interrupt handler cannot do any real work, has to hand off to a kernel thread, resulting
> in so much latency and overhead that one might as well poll for the data... And for DMA data transfers, the data rate is well known,
> so easy to predict how long the DMA will run for and sleep for that amount of time instead of waiting for an interrupt.
> 
> K.O.

So the problem was that we assumed that the bank (with the header) needs to be 64bit aligned. Even more we aligned the hole Midas event to 256bit in the fpga since we have a  250mhz x 256 Bit interface for PCIe. But then we saw that you align the bank data to 64bit -> crash of mdump etc. For now we generate the data on the FPGA in the „old“ Midas format. So having a flag for changing to a different alignment would be actually really nice. 

Cheers,
Marius
  1849   06 Mar 2020 Lars MartinForumRPC error
I ported a bunch of frontends to C++ and now I'm occasionally getting this RPC 
error message:

http error: readyState: 4, HTTP status: 502 (Proxy Error), batch request: method: 
"db_get_values", params: [object Object], id: 1583456958869 method: "get_alarms", 
params: null, id: 1583456958869 method: "cm_msg_retrieve", params: [object 
Object], id: 1583456958869 method: "cm_msg_retrieve", params: [object Object], 
id: 1583456958869

I'm assuming I'm doing wrong something somewhere, but does this message contain 
information where to look? Does the ID mean something?
  1850   08 Mar 2020 Konstantin OlchanskiForumRPC error
I do not see this error, but there was one more report (they did not clearly say what http errors 
they see) https://bitbucket.org/tmidas/midas/issues/209/get-rid-of-mjsonrpc-dialogs-put-it-to-
the

To debug this, I need to know: what version of MIDAS, what version of what web browser, what 
computer is mhttpd running on? (so I can go look at the log files).

Also can you say more when you see these errors? Every time from every midas web page, or only 
some pages or only when you do something specific (push some button, etc?).

> I ported a bunch of frontends to C++ and now I'm occasionally getting this RPC 
> error message:
> 
> http error: readyState: 4, HTTP status: 502 (Proxy Error), batch request: method: 
> "db_get_values", params: [object Object], id: 1583456958869 method: "get_alarms", 
> params: null, id: 1583456958869 method: "cm_msg_retrieve", params: [object 
> Object], id: 1583456958869 method: "cm_msg_retrieve", params: [object Object], 
> id: 1583456958869
> 
> I'm assuming I'm doing wrong something somewhere, but does this message contain 
> information where to look? Does the ID mean something?

It is unlikely that this error has anything to do with the frontends: usually web page interaction 
goes through: web browser - network - apache httpd - localhost - mhttpd - midas odb.

http error 502 is very generic, does not tell us much about what happened, there may be more 
information in the httpd log files.

the json-rpc request "id" is generated by midas code in the web browser and it currently is a 
timestamp. it is not used for anything. but it is required by the json-rpc standard.

K.O.


K.O.
  1859   23 Mar 2020 Ivo SchulthessForumSave data to FTP
Dear all
I try to save data to an FTP server but don't get any data on the server. Midas does not complain or message any error but also nothing gets saved. Does somebody have experience with this? I use the following settings for the ODB mlogger channel settings: Type: FTP, Filename: server.com, 21, user, pw, ., run%06d.mid, Format: MIDAS, Output: FILE. What would be the Output: FTP setting for? I tried this but it does not work at all. 
Thanks in advance,
Ivo
  1860   23 Mar 2020 Konstantin OlchanskiForumSave data to FTP
> I try to save data to an FTP server but don't get any data on the server. Midas does not complain or message any error but also nothing gets saved. Does somebody have experience with this? I use the following settings for the ODB mlogger channel settings: Type: FTP, Filename: server.com, 21, user, pw, ., run%06d.mid, Format: MIDAS, Output: FILE. What would be the Output: FTP setting for? I tried this but it does not work at all. 

Hi, Ivo, good to hear from a midas user in these difficult times.

We do not use FTP at TRIUMF, but Stefan asked us to keep FTP alive and working, so we should be able
to get you going. I will try to find the FTP instructions for you, I am pretty sure I have them somewhere.

In the mean time, I am very curious why you are using a FTP to record data, is it some kind
of data appliance where simplest input for data is FTP? Using NFS does not work or is too hard?

Also for example at CERN, we write data to Castor and EOS, for this mlogger writes data to local disk,
then the lazylogger runs a script to move the data to Castor and EOS. The example lazylogger
scripts for this are in the MIDAS "progs" directory. But maybe you do not have a local disk and this would
not work for you.

In other news, I hope to work on mlogger and lazylogger support for cloud storage (swift and s3 apis?),
would that be useful as replacement for FTP?

K.O.
  1861   24 Mar 2020 Ivo SchulthessForumSave data to FTP
> > I try to save data to an FTP server but don't get any data on the server. Midas does not complain or message any error but also nothing gets saved. Does somebody have experience with this? I use the following settings for the ODB mlogger channel settings: Type: FTP, Filename: server.com, 21, user, pw, ., run%06d.mid, Format: MIDAS, Output: FILE. What would be the Output: FTP setting for? I tried this but it does not work at all. 
> 
> Hi, Ivo, good to hear from a midas user in these difficult times.
> 
> We do not use FTP at TRIUMF, but Stefan asked us to keep FTP alive and working, so we should be able
> to get you going. I will try to find the FTP instructions for you, I am pretty sure I have them somewhere.
> 
> In the mean time, I am very curious why you are using a FTP to record data, is it some kind
> of data appliance where simplest input for data is FTP? Using NFS does not work or is too hard?
> 
> Also for example at CERN, we write data to Castor and EOS, for this mlogger writes data to local disk,
> then the lazylogger runs a script to move the data to Castor and EOS. The example lazylogger
> scripts for this are in the MIDAS "progs" directory. But maybe you do not have a local disk and this would
> not work for you.
> 
> In other news, I hope to work on mlogger and lazylogger support for cloud storage (swift and s3 apis?),
> would that be useful as replacement for FTP?
> 
> K.O.
>
Good Morning Konstantin

Thanks for the fast reply.  Yes, it is, Midas is one of the things we can at least improve from home. 

Our experiment is planned to measure (soon) at ILL. Now since we don't use the equipment/detector from the 
beamline but our own, all the data from Midas is saved on the local drive. This is fine in the first instance
but then we also need proper backup. Since our experiment is quite small, the easiest solution I came up with
is to copy all of our data to the ILL storage which has enough space and is properly backed up. The ILL data 
storage allows only SFTP connections, nothing else. Since Midas has the FTP feature, having a separate FTP 
logger channel seemed the easiest way to go. 

Thanks for your input, I will look into how to mount SFTP and then this would also be a solution. 

Since ILL only provides access via SFTP and everything else is not existent or blocked (not even ssh is possible),
this is the only thing we can work with by now. 

Best regards,
Ivo
  1862   24 Mar 2020 Stefan RittForumSave data to FTP
Logging directly from the midas logger to FTP is a bit cumbersome. In case of delays during login etc. this can throttle the whole DAQ chain. 
What we use in our lab is to write to local disk, then use the lazylogger (https://midas.triumf.ca/MidasWiki/index.php/Lazylogger) to copy the 
local files to a remote FTP server. This way we de-couple data taking from backup, making the system much more swift.

Best,
Stefan
  1863   24 Mar 2020 Ivo SchulthessForumSave data to FTP
> Logging directly from the midas logger to FTP is a bit cumbersome. In case of delays during login etc. this can throttle the whole DAQ chain. 
> What we use in our lab is to write to local disk, then use the lazylogger (https://midas.triumf.ca/MidasWiki/index.php/Lazylogger) to copy the 
> local files to a remote FTP server. This way we de-couple data taking from backup, making the system much more swift.
> 
> Best,
> Stefan

Yes, see this now too. I will, therefore, try to set up the lazylogger properly. 
  1864   24 Mar 2020 Konstantin OlchanskiForumSave data to FTP
> 
> Since ILL only provides access via SFTP and everything else is not existent or blocked (not even ssh is possible),
> this is the only thing we can work with by now. 
> 

Oops. SFTP != FTP.

SFTP uses SSH for data transport, so we cannot do it directly from C++ code in MIDAS. (we could use libssh, etc, but...)

I suggest you use lazylogger with the lazy_dache script, replace "dccp" with "sftp", replace "nsls" with an sftp "ls" command.

If you get it working, please consider contributing your lazylogger script to midas. (and does not have to be written in perl, python should work equally well).

For setting up lazylogger with the script method, I am pretty sure I posted the instructions to the forum (ages ago),
let me know if you cannot find them.

Good luck.

K.O.
  1865   25 Mar 2020 Andreas SuterForummlogger: misleading error messages for ROOT
Dear All,

At our experiment we write ROOT files. When starting/stopping runs we get the following error messages:

[Logger,ERROR] [mlogger.cxx:3358:root_write,ERROR] Cannot write system event into ROOT file, event_id 0xffff8000

[Logger,ERROR] [mlogger.cxx:3358:root_write,ERROR] Cannot write system event into ROOT file, event_id 0xffff8001

Looking into the source code I found that log_write (line 4248) sends these Midas System Events (BOR,EOR) to root_write without filtering them. root_write() checks in a first step if it gets such Midas System Events and if yes, moans.

Wouldn't it be better just to filter these events in log_write, before calling root_write, avoiding unnecessary error messages?

Is there something I miss?

Thanks,
  Andreas
  1866   25 Mar 2020 Konstantin OlchanskiForummlogger: misleading error messages for ROOT
> [Logger,ERROR] [mlogger.cxx:3358:root_write,ERROR] Cannot write system event into ROOT file, event_id 0xffff8000

Hi, Andreas, please open a bug report for this problem on bitbucket, there is now at least 2 bugs against
the ROOT writer (some events are written in duplicate sometimes), and I hope to fix this next time i review
the mlogger (RSN!). Biggest problem is that I do not use the ROOT output myself, so I have no way
to know if ROOT files produced by mlogger are correct or make sense. (without setting up some kind
of test environment with a ROOT file reader.

Thank you for reporting this problem here, so more people know about it.

If somebody has a patch to fix this, please send it in!

K.O.
  1867   27 Mar 2020 Stefan RittForummlogger: misleading error messages for ROOT
Dear simplest solution seems to me to just remove the error message generation and silently ignore the BOE EOR events. 

Committed that change.

Stefan
  1868   27 Mar 2020 Andreas SuterForummlogger: misleading error messages for ROOT
Hi Stefan,

I think this only partially resolves the issue, in log_write:

#ifdef HAVE_ROOT
   } else if (log_chn->format == FORMAT_ROOT) {
      status = root_write(log_chn, pevent, pevent->data_size + sizeof(EVENT_HEADER));
#endif
   }

   actual_time = ss_millitime();
   if ((int) actual_time - (int) start_time > 3000)
      cm_msg(MINFO, "log_write", "Write operation on \'%s\' took %d ms", log_chn->path.c_str(), actual_time - start_time);

   if (status != SS_SUCCESS && !stop_requested) {
      cm_msg(MTALK, "log_write", "Error writing output file, stopping run");
      cm_msg(MERROR, "log_write", "Cannot write \'%s\', error %d, stopping run", log_chn->path.c_str(), status);
      stop_the_run(0);

      return status;
   }

In your solution root_write returns quietly but status == SS_INVALID_FORMAT (not SS_SUCCESS) and hence I get another misleading error message "Error writing output file, stopping run".

In order to prevent this you also would need to change the return value to SS_SUCCESS.
ELOG V3.1.4-2e1708b5