ID  Date  Author  Topic  Subject
  2864   08 Oct 2024  Mark Grimes  Bug Report  Difficulty running MIDAS on Rocky 9.4
We run Midas with no problems on Rocky 9.4, although not in a virtual machine.  We're very close to the head of `develop`.

I'm fairly sure I've seen an error like this before.  I didn't pay it much attention because it was transitory and I was doing something weird at the time - probably 
stepping through with a debugger and hit a timeout.  It was definitely about an ODB semaphore but I can't recall if it was about a recursive call.

Basically I think Rocky 9.4 is a red herring.  Do you have another crashed copy of mserver/mhttpd running somewhere and stuck in limbo?  If it's a virtual machine 
are you sharing the shared memory location with the host, and running another midas on there?


> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> > Machine and are getting a persistent error when we run mserver.
> >
> > [mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...
> > db_lock_database: Detected recursive call to db_{lock,unlock}_database() while 
> > already inside db_{lock,unlock}_database(). Maybe this is a call from a signal 
> > handler. Cannot continue, aborting...
> > Aborted (core dumped)
> 
> This is super very bad. Since you have a core dump, please post the stack trace here (or email it to me).
> 
> I probably cannot debug your private version of midas and I will recommend that you install and run vanilla midas 
> mserver (just while we debug this problem).
> 
> Let's look at the core dump stack trace first, but likely we see a problem with System-V semaphores and hopefully it 
> is not some breakage due to Red Hat bogosity or due to something specific to running on a virtual machine.
> 
> If indeed this is Linux-kernel level breakage of System-V semaphores, solution would be to start using Posix 
> semaphores, something I wanted to do for a long time. We already switched MIDAS shared memory from System-V to Posix 
> shared memory.
> 
> If we are lucky it is just one more crasher bug in ODB. Let's see that core dump stack trace.
> 
> K.O.
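
For context, these are the POSIX named-semaphore calls such a switch would be built on (standard <semaphore.h> API shown purely as an illustration; the semaphore name below is made up, and this is not MIDAS code):

#include <fcntl.h>
#include <semaphore.h>

void lock_example()
{
   // create-or-open a named semaphore with initial value 1 (a binary lock)
   sem_t* sem = sem_open("/example_odb_lock", O_CREAT, 0666, 1);
   if (sem == SEM_FAILED)
      return;

   sem_wait(sem);    // lock: blocks until the semaphore is available
   /* ... access the ODB shared memory ... */
   sem_post(sem);    // unlock

   sem_close(sem);   // close this process's handle; sem_unlink() removes the name
}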
  2942   12 Feb 2025  Mark Grimes  Forum  TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
Hi,
I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to 
Midas during online running.  At the end of each run it saves out custom data in the 
TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is 
never saved.  Is this intentional?  I understand that there is no point for the HandleBinaryRpc method offline, 
but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose.  Or is it a conscious 
choice to ignore all of TMFeRpcHandlerInterface when offline?

Thanks,

Mark.
  2997   25 Mar 2025  Mark Grimes  Forum  TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
Hi,
The question was about the TMFeRpcHandlerInterface, not the TARunObject interface.  Derived classes of TARunObject do indeed work as expected in our 
environment.  We have worked around the issue by using an implementation of TARunObject as well as the (separate) implementation of 
TMFeRpcHandlerInterface.

Thanks,

Mark.

> > I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to 
> > Midas during online running.  At the end of each run it saves out custom data in the 
> > TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
> > However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is 
> > never saved.  Is this intentional?  I understand that there is no point for the HandleBinaryRpc method offline, 
> > but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose.  Or is it a conscious 
> > choice to ignore all of TMFeRpcHandlerInterface when offline?
> 
> apologies for delayed response.
> 
> I saw the question, completely did not understand it, only now got around to figure out what is going on.
> 
> according to manalyzer/README.md, section "manalyzer module and object life time", BeginRun() and EndRun() are 
> always called, offline and online. What you see would be a bug that we do not see in our environment. I confirm this by 
> running manalyzer in a demo mode: ./bin/manalyzer_test.exe  --demo -e10 -t
> 
> no, wait, you say you use HandleBeginRun() and HandleEndRun(). this is not right, they are not part of the manalyzer 
> API and they indeed are only used when running online.
> 
> correct solution would be to use BeginRun() and EndRun() instead of HandleBeginRun() and HandleEndRun().
> 
> you could also save your data in the module destructor (although good programming recommendation is to use 
> destructor only for unavoidable things, like freeing memory, etc).
> 
> K.O.
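
For reference, a minimal sketch of the approach described above (class and method names assumed from the manalyzer examples, not taken from the actual code in this thread): data written from TARunObject::EndRun() is saved both online and when replaying a MIDAS file offline.

#include <stdio.h>
#include "manalyzer.h"

class SaveModule: public TARunObject
{
public:
   SaveModule(TARunInfo* runinfo) : TARunObject(runinfo) {}

   void BeginRun(TARunInfo* runinfo)
   {
      printf("BeginRun, run %d\n", runinfo->fRunNo);
   }

   void EndRun(TARunInfo* runinfo)
   {
      // save the custom end-of-run data here instead of in HandleEndRun(),
      // so it also happens when processing a MIDAS file offline
      printf("EndRun, run %d, saving custom data\n", runinfo->fRunNo);
   }
};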
  3005   26 Mar 2025  Mark Grimes  Forum  TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
This was exactly the question: should I expect it to run?  There's no point in the HandleBinaryRpc method offline, but there's an argument that the HandleBeginRun/HandleEndRun methods have a use.
I have the answer and we have a workaround, thanks.

> then I do not understand the question. TMFeRpcHandlerInterface stuff is only used when running online and connected to MIDAS. How does it come into the 
> picture when you analyze a data file offline? ProcessMidasOnlineTmfe() does not run, the RpcHandler object is not constructed.
> 
> maybe if you point me to your source code, I can see what you are doing?
> 
> K.O.
  3049   04 Jun 2025  Mark Grimes  Bug Report  Memory leak in mhttpd binary RPC code
Hi,
During an evening of running we noticed that memory usage of mhttpd grew to close to 100 GB. We think we've traced this to the following issue when making RPC calls.

  • The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449.
  • It then makes the call at src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
  • It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
  • When the `MJsonNode` is destructed at mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.

Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
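
A stripped-down illustration of that sequence (buffer size and variable names follow the description above; this is a sketch, not the actual mjsonrpc code):

#include <stdlib.h>
#include "mjson.h"

void leak_example()
{
   char* buf = (char*)malloc(1024);   // brpc-style fixed-size response buffer
   int buf_length = 0;                // the RPC call reported an empty response

   // the node takes ownership of buf, recording a size of zero
   MJsonNode* node = MJsonNode::MakeArrayBuffer(buf, buf_length);

   // before the fix, ~MJsonNode() only called free() when the size was > 0,
   // so this delete leaks the 1024-byte buffer
   delete node;
}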
I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?

Thanks,

Mark.
  3051   07 Jun 2025  Mark Grimes  Bug Report  Memory leak in mhttpd binary RPC code

Hi,

We applied an intermediate fix for this locally and it seems to have fixed our issue.  The attached plot shows the percentage memory use on our machine with 128 GB of memory, as a rough proxy for mhttpd memory use.  After applying our fix, mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.

Our fix to mjson was:

diff --git a/mjson.cxx b/mjson.cxx
index 17ee268..2443510 100644
--- a/mjson.cxx
+++ b/mjson.cxx
@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor
       delete subnodes[i];
    subnodes.clear();
 
-   if (arraybuffer_size > 0) {
-      assert(arraybuffer_ptr != NULL);
+   if (arraybuffer_ptr != NULL) {
       free(arraybuffer_ptr);
       arraybuffer_size = 0;
       arraybuffer_ptr = NULL;

We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

I hope this is useful to someone.  As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.

Thanks,

Mark.

Attachment 1: msysmon-mu3ebe-20250601-042124-20250606-122124.png
  3058   15 Jun 2025  Mark Grimes  Bug Report  Memory leak in mhttpd binary RPC code
Many thanks for the fix.  We've applied it and see better memory performance.  We still have to kill and restart 
mhttpd after a few days, however.  I think the official fix is missing this part:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

When the other process returns a failure the memory block is also currently leaked.  I originally stated "...although I 
don't think it contributed to the leak we were seeing" but it seems this was false.

Thanks,

Mark.


> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
> 
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
> 
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
> file, return of "new char[0]" is never freed).
> 
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not 
> affected by this bug.
> 
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
> 
> Please post bug reports as plain text so I can quote from them.
> 
> K.O.
  3062   04 Jul 2025  Mark Grimes  Bug Report  Memory leaks in mhttpd
Something changed in our system and we started seeing memory leaks in mhttpd again.  I guess someone 
updated some front end or custom page code that interacted with mhttpd differently.
I found a few memory leaks in some (presumably) rarely seen corner cases and we now see steady 
memory usage.  The branch is fix/memory_leaks 
(https://bitbucket.org/tmidas/midas/branch/fix/memory_leaks) and I opened pull request #55 
(https://bitbucket.org/tmidas/midas/pull-requests/55).  I couldn't find a BitBucket account for you 
Konstantin to add as a reviewer, so it currently has none.

Thanks,

Mark.
  3073   17 Sep 2025  Mark Grimes  Bug Report  Midas no longer compiles on macOS
Hi,
The current develop branch no longer compiles on macOS. I get lots of errors of the form

/Users/me/midas/src/history_schema.cxx:740:4: error: unknown type name 'off64_t'; did you mean 'off_t'?
  740 |    off64_t fDataOffset = 0;
      |    ^~~~~~~
      |    off_t
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.5.sdk/usr/include/sys/_types/_off_t.h:31:33: note: 'off_t' declared here
   31 | typedef __darwin_off_t          off_t;
      |                                 ^

There are also similar errors about lseek64. This appears to have come in with commit 9a6ad2e dated 
23rd July, but I think it was merged into develop with commit 2beeca0 on 3rd of September.

Googling around, it seems that off64_t is a GNU extension. I don't know of a cross-platform solution, but I'm 
happy to test if someone has a suggestion.
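
One possible direction (an untested sketch, not an official fix): on macOS off_t is already 64-bit and lseek() accepts large offsets, so the GNU-specific names could be aliased behind a platform guard.

#include <sys/types.h>   // off_t
#include <unistd.h>      // lseek()

#ifdef __APPLE__
// off_t is 64-bit on macOS, so the GNU off64_t/lseek64 names can map to the
// standard ones (assumption: no other platform with a 32-bit off_t needs covering)
typedef off_t off64_t;
#define lseek64(fd, offset, whence) lseek((fd), (offset), (whence))
#endif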

Thanks,

Mark.
  3074   17 Sep 2025  Mark Grimes  Suggestion  Get manalyzer to configure midas::odb when running offline
Hi,
Lots of users like the midas::odb interface for reading from the ODB in manalyzers. However, it currently doesn't 
work offline without a few manual lines telling midas::odb to read from the ODB copy in the run 
header. The code also gets a bit messy working out the current filename and getting midas::odb to reopen the 
file currently being processed. This would be much cleaner if manalyzer set it up automatically; then 
user code could be written that is completely ignorant of whether it is running online or offline.

The change I suggest is in the `set_offline_odb` branch, commit 4ffbda6, which is simply:

diff --git a/manalyzer.cxx b/manalyzer.cxx
index 371f135..725e1d2 100644
--- a/manalyzer.cxx
+++ b/manalyzer.cxx
@@ -15,6 +15,7 @@
 
 #include "manalyzer.h"
 #include "midasio.h"
+#include "odbxx.h"
 
 //////////////////////////////////////////////////////////
 
@@ -2075,6 +2076,8 @@ static int ProcessMidasFiles(const std::vector<std::string>& files, const std::v
                if (!run.fRunInfo) {
                   run.CreateRun(runno, filename.c_str());
                   run.fRunInfo->fOdb = MakeFileDumpOdb(event->GetEventData(), event->data_size);
+                  // Also set the source for midas::odb in case people prefer that interface
+                  midas::odb::set_odb_source(midas::odb::STRING, std::string(event->GetEventData(), event->data_size));
                   run.BeginRun();
                }


It happens at the point where the ODB record is already available and requires no effort from the user to 
be able to read the ODB offline.
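
With that in place, user code can read settings the same way online and offline, along these lines (the ODB path and key name here are just for illustration):

#include "odbxx.h"

static int read_test_setting()
{
   // online this reads the live ODB; offline it reads the dump handed to
   // midas::odb::set_odb_source() from the begin-of-run event
   midas::odb settings("/Equipment/Test FE/Settings");
   int value = settings["Test"];
   return value;
}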

Thanks,

Mark.
  3077   18 Sep 2025  Mark Grimes  Suggestion  Get manalyzer to configure midas::odb when running offline
> ....Before commit of this patch, can you confirm the RunInfo destructor
> deletes this ODB stuff from odbxx? manalyzer takes object life times very seriously.

The call stores the ODB string in static members of the midas::odb class. So these will have a lifetime of the process or until they're replaced by another 
call. When a midas::odb is instantiated it reads from these static members and then that data has the lifetime of that instance.

> Of course with this patch extending manalyzer to process two or more runs at the same time becomes impossible.

Yes, I hadn't realised that was an option. For that to work I guess the aforementioned static members could be made thread-local, and the 
processing of each run kept to a specific thread. Although I could imagine user code making assumptions and breaking, like storing a midas::odb as a 
class member or something.

Note that I missed doing the same for the end of run event, which should probably also be added.

Thanks,

Mark.
  3099   26 Sep 2025  Mark Grimes  Suggestion  Get manalyzer to configure midas::odb when running offline
> ...I talked to KO so he agreed that you then commit your proposed change of manalyzer 

Merged and pushed.

Thanks,

Mark.
  2845   16 Sep 2024  Marius Köppel  Bug Report  Crash using ODB watch
Hi all,

last week I was running MIDAS with commit 3ad98c5. Today I updated MIDAS and now all my watch functions are crashing. Attached is a minimal example frontend showing the problem.

In our software we have two functions: one which sets up the ODB values of the frontend, and another which sets up all watch functions. So overall we connect to the ODB twice during frontend_init: once to create the values and once to create the watch. The example code shows a simplified version of this setup:

INT frontend_init() {

  cm_msg(MINFO, "frontend_init() setup", "Test FE");

  odb settings = {
    {"Test", 123},
    {"sub", {}}
  };
  settings.connect_and_fix_structure("/Equipment/Test FE/Settings");
  // settings.watch(watch); <-- this works without segmentation fault

  odb new_settings("/Equipment/Test FE/Settings");
  new_settings.watch(watch); // <-- here I am getting a segmentation fault

  return CM_SUCCESS;
}

When I directly set the watch, everything runs fine; however, when I create a new ODB object and use that one to set a watch, I get the following segmentation fault:

Process 18474 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x34)
    frame #0: 0x000000010004fa38 test_fe`midas::odb::watch_callback(hDB=<unavailable>, hKey=<unavailable>, index=0, info=0x00006000002001c0) at odbxx.cxx:96:25 [opt]
   93  	      if (po->m_data == nullptr)
   94  	         mthrow("Callback received for a midas::odb object which went out of scope");
   95  	      midas::odb *poh = search_hkey(po, hKey);
-> 96  	      poh->m_last_index = index;
   97  	      po->m_watch_callback(*poh);
   98  	      poh->m_last_index = -1;
   99  	   }

Best,
Marius
Attachment 1: test_fe.cpp
#include <algorithm>
#include <math.h>
#include <random>
#include <array>
#include <string>
#include <stdio.h>
#include <stdlib.h>

#include "midas.h"
#include "msystem.h"
#include "odbxx.h"
#include "mfe.h"

using namespace std;
using midas::odb;


/*-- Globals -------------------------------------------------------*/
/* The frontend name (client name) as seen by other MIDAS clients   */
const char *frontend_name = "Test FE";
/* The frontend file name, don't change it */
const char *frontend_file_name = __FILE__;

/* frontend_loop is called periodically if this variable is TRUE    */
BOOL frontend_call_loop = FALSE;

/* a frontend status page is displayed with this frequency in ms    */
INT display_period = 0;

/* maximum event size produced by this frontend */
INT max_event_size = 32 * (1024 * 1024); // 32MiB

/* maximum event size for fragmented events (EQ_FRAGMENTED) */
INT max_event_size_frag = 5 * 1024 * 1024;

/* buffer size to hold events */
INT event_buffer_size = 4 * max_event_size;

BOOL equipment_common_overwrite = TRUE; // TRUE: the values defined here overwrite the equipment's Common subtree in the ODB

/*-- Function declarations -----------------------------------------*/
INT read_odb(char * pevent, INT);
void watch(odb o);
/*-- Equipment list ------------------------------------------------*/

EQUIPMENT equipment[] = {

  {
    "Test FE",      /* equipment name */
    {1, 0,          /* event ID, trigger mask */
    "SYSTEM",      /* event buffer */
    EQ_PERIODIC,   /* equipment type */
    0,             /* event source */
    "MIDAS",       /* format */
    TRUE,          /* enabled */
    RO_ALWAYS | RO_ODB,  /* read always and update ODB */
    1000,          /* read every 1 sec */
    0,             /* stop run after this event limit */
    0,             /* number of sub events */
    0,             /* log history every event */
    "", "", ""},
    read_odb,       /* readout routine */
  },

  {""}};

/*-- Dummy routines ------------------------------------------------*/

INT poll_event(INT, INT count, BOOL test)  {
    return 0;
}

INT interrupt_configure(INT, INT, POINTER_T) {
    return 1;
}
/*-- Frontend Init -------------------------------------------------*/

INT frontend_init() {

  cm_msg(MINFO, "frontend_init() setup", "Test FE");

  odb settings = {
    {"Test", 123},
    {"sub", {}}
  };
  settings.connect_and_fix_structure("/Equipment/Test FE/Settings");
  // settings.watch(watch); <-- this works without segmentation fault

  odb new_settings("/Equipment/Test FE/Settings");
  new_settings.watch(watch); // <-- here I am getting a segmentation fault

  return CM_SUCCESS;
}

void watch(odb o) {
    std::string name = o.get_name();

    if (name == "Test") {
      printf("I am a watch on Test\n");
    }
}

/*-- Frontend Exit -------------------------------------------------*/

INT frontend_exit() {
  return CM_SUCCESS;
}

/*-- Frontend Loop -------------------------------------------------*/

INT frontend_loop() {
  return CM_SUCCESS;
}

/*-- Begin of Run --------------------------------------------------*/

INT begin_of_run(INT run_number, char *error) {
  return CM_SUCCESS;
}

/*-- End of Run ----------------------------------------------------*/

INT end_of_run(INT run_number, char *error) {
  return CM_SUCCESS;
}

/*-- Pause Run -----------------------------------------------------*/

INT pause_run(INT run_number, char *error) {
  return CM_SUCCESS;
}

/*-- Resume Run ----------------------------------------------------*/

INT resume_run(INT run_number, char *error) {
  return CM_SUCCESS;
}

/* -- Readout --*/
INT read_odb(char *pevent, INT){

    return 0;

}
  1549   12 Jun 2019  Marius Koeppel  Forum  Strange JS array creation
Hello everybody,

I have a strange JS behavior. In one of my frontends I create a key in the ODB with:

db_create_key(hDB, 0, "Equipment/Switching/Variables/DATA_WRITE", TID_INT);

In my custom page I have a JS function which loops over an array and sets the
value of this key with:

for (i = 0; i < lines.length; i++) {
        modbset("/Equipment/Switching/Variables/DATA_WRITE[" + String(i) + "]",
parseInt(lines[i]));
}

After calling this function I now have an array in the ODB. To my understanding,
addressing an INT like an array shouldn't be possible. So is this dangerous to do?

Best regards,
Marius
  1567   24 Jun 2019  Marius Koeppel  Forum  Strange JS array creation
> > for (i = 0; i < lines.length; i++) {
> >         modbset("/Equipment/Switching/Variables/DATA_WRITE[" + String(i) + "]", parseInt(lines[i]));
> > }
> 
> this is wrong.
> 
> a) you are programming javascript as if it were C/C++. You think this code wrote lines.length() values 
> to ODB, when what the code actually did is queued lines.length() RPC requests for later execution. 
> Eventually some time later, each RPC request will open a connection to mhttpd, send a request, wait 
> for mhtttpd to process it, etc. Where do you wait for the completion of all these RPCs before 
> proceeding as if all the data has been successfully written to ODB? (answer: you cannot, javascript 
> cannot "wait for things", instead you have to make chains of event handlers. javascript != C/C++. 
> They are completely different).

--> Following your discussion about async. functions I will change this part of the code and make chains of
event handlers.

> b)  you should write the whole array in one operation instead of looping over each element. see 
> mjsonrpc_db_paste() and example.html.

--> In the midas back-end I never created an array. I created an INT in the ODB with db_create_key(hDB, 0,
"Equipment/Switching/Variables/DATA_WRITE", TID_INT). By using modbset in javascript and passing the string
"/Equipment/Switching/Variables/DATA_WRITE[" + String(i) + "]" I address it like an array, and it shows up like an
array in the ODB. To explain a bit better how the value changes in the ODB, take this pseudo-code
example:

// midas part //
> int a = 1; // this is more or less what I think db_create_key is doing in the ODB
// midas part //

// ODB //
> print(a) // this prints me 1 and this is also the value what I see in the ODB
// ODB //

// javascript part //
> for int i in [1,2,3,4] do 
> modset(a[i], i) // for simplification I don't use event handlers here
> end for
// javascript part //

// ODB //
> print(a) // now I see [1,2,3,4]
// ODB //

This example violates type safety. I know that javascript is not type safe. Given that, I would like
to know whether this behavior is intended, and why there is no bounds checking.
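
For what it's worth, the same behaviour can be reproduced through the C API directly: setting an index past the end of a key extends the key instead of failing a bounds check (a sketch assuming the key was created as in the first post):

#include "midas.h"

void extend_example()
{
   HNDLE hDB, hKey;
   cm_get_experiment_database(&hDB, NULL);
   db_find_key(hDB, 0, "Equipment/Switching/Variables/DATA_WRITE", &hKey);

   INT value = 42;
   // the key was created as a single TID_INT; writing index 3 grows it to
   // 4 elements rather than returning an out-of-bounds error
   db_set_data_index(hDB, hKey, &value, sizeof(value), 3, TID_INT);
}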

> I do not understand your question about "calling an INT like an array".
--> Here I mean that I address the variable in the ODB via the string I pass, the same way I would address a variable that
is an array. I am not talking about function calls.

> parseInt() (defined where?)
--> This is a global JavaScript function (https://www.w3schools.com/jsref/jsref_obj_global.asp)

Cheers,
Marius
  1822   12 Feb 2020  Marius Koeppel  Forum  Difference between "Event Data Size" and "All Bank Size"
Dear all,

we are trying to build Midas events on FPGAs and send them directly to the midas
ring buffer via copy_n. According to the wiki
https://midas.triumf.ca/MidasWiki/index.php/Event_Structure Event Data Size is:
"The event data size contains the size of the event in bytes excluding the
header." and All Bank Size is: "Size in bytes of the following data plus the
size of the bank header". So are they actually the same? And is the header excluded in the first
sentence also the bank header?
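
For reference, the two sizes correspond to the data_size fields of the following two structures (layout recalled from midas.h, so worth double-checking against the installed header). In this picture the event data size counts everything after the 16-byte event header, while the all-bank size counts only the banks, so the two differ by sizeof(BANK_HEADER).

typedef unsigned int DWORD;   // as in midas.h

typedef struct {              // 16-byte event header; its data_size is "Event Data Size"
   short int event_id;
   short int trigger_mask;
   DWORD serial_number;
   DWORD time_stamp;
   DWORD data_size;           // bytes following this header (bank header + banks)
} EVENT_HEADER;

typedef struct {              // 8-byte global bank header; its data_size is "All Bank Size"
   DWORD data_size;           // bytes of all banks, including each bank's own header
   DWORD flags;
} BANK_HEADER;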

Cheers,
Marius

 
  1829   13 Feb 2020  Marius Koeppel  Forum  Writing Midas Events via FPGAs
Dear all,

we are creating Midas events directly inside an FPGA and sending them off via DMA into the PC RAM. For reading out this RAM via Midas, the FPGA sends a pointer to where it last wrote 4 kB of data. We use this pointer to tell the midas ring buffer where the new events are. The buffer looks something like:

// event 1
dma_buf[0] = 0x00000001; // Trigger and Event ID
dma_buf[1] = 0x00000001; // Serial number
dma_buf[2] = TIME; // time
dma_buf[3] = 18*4-4*4; // event size
dma_buf[4] = 18*4-6*4; // all bank size
dma_buf[5] = 0x11; // flags
// bank 0
dma_buf[6] = 0x46454230; // bank name
dma_buf[7] = 0x6; // bank type TID_DWORD
dma_buf[8] = 0x3*4; // data size
dma_buf[9] = 0xAFFEAFFE; // data
dma_buf[10] = 0xAFFEAFFE; // data
dma_buf[11] = 0xAFFEAFFE; // data
// bank 1
dma_buf[12] = 0x46454231; // bank name
dma_buf[13] = 0x6; // bank type TID_DWORD
dma_buf[14] = 0x3*4; // data size
dma_buf[15] = 0xAFFEAFFE; // data
dma_buf[16] = 0xAFFEAFFE; // data
dma_buf[17] = 0xAFFEAFFE; // data

// event 2
.....

dma_buf[fpga_pointer] = 0xXXXXXXXX;


And we do something like:

while (true) {
   // obtain buffer space
   status = rb_get_wp(rbh, (void **)&pdata, 10);
   fpga_pointer = fpga.read_last_data_add();

   wlen = fpga_pointer - last_fpga_pointer; // in 32-bit words
   copy_n(&dma_buf[last_fpga_pointer], wlen, pdata);
   rb_status = rb_increment_wp(rbh, wlen * 4); // in bytes

   last_fpga_pointer = fpga_pointer;
}

Leaving aside the case where the dma_buf wraps around, this works fine for a small data rate. But if we increase the rate, the fpga_pointer also increases really fast and wlen gets quite big. In fact it gets bigger than max_event_size, which is checked in rb_increment_wp, leading to an error.

The problem is that the individual event size is actually not too big; rather, we have multiple events in the buffer which are read by midas in one step. So we think that in this case rb_increment_wp is comparing the wrong thing. Increasing max_event_size also does not help.

Remark: dma_buf is volatile so memcpy is not possible here.
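
One possible way around this (an untested sketch based on the layout above, using the event-size word at offset 3 of each event header): walk the DMA region event by event and increment the write pointer once per event, so that no single increment exceeds max_event_size.

// assumed declarations, matching the snippet above:
//   volatile uint32_t* dma_buf;  uint32_t fpga_pointer, last_fpga_pointer;  int rbh;
uint32_t rp = last_fpga_pointer;
while (rp != fpga_pointer) {
   uint32_t event_size  = dma_buf[rp + 3];       // "event size" word of the event header
   uint32_t event_words = 4 + event_size / 4;    // 16-byte event header + data, in 32-bit words

   void* wp = nullptr;
   if (rb_get_wp(rbh, &wp, 10) != DB_SUCCESS)    // no space in the ring buffer yet
      break;                                     // leave the remaining events for the next pass

   copy_n(&dma_buf[rp], event_words, (uint32_t*)wp);
   rb_increment_wp(rbh, event_words * 4);        // one event at a time, size in bytes

   rp += event_words;                            // wrap-around of dma_buf ignored, as above
}
last_fpga_pointer = rp;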

Cheers,
Marius
  1841   20 Feb 2020  Marius Koeppel  Forum  Writing Midas Events via FPGAs
We also agree and found the problem now. Since we build everything (MIDAS Event Header, Bank Header, Banks etc.) in the FPGA, we had some struggle with the MIDAS data format (http://lmu.web.psi.ch/docu/manuals/bulk_manuals/software/midas195/html/AppendixA.html). We thought that only the MIDAS event needs to be aligned to 64 bit, but as it turned out the bank data also needs to be aligned (Stefan updated the wiki page already). Since we are using BANK32 this was a bit unclear to us, because that bank header is not 64-bit aligned. But we managed this now by adding empty data, and the system is running now.

Our setup looks like this:

Software:
- mfe.cxx multithread equipment
- mfe readout thread grabs pointer from dma ring buffer 
- since the dma buffer is volatile we do copy_n for transforming the data to MIDAS 
- the data is already in the MIDAS format so done from our side :)
- mfe readout thread increments the ring buffer
- mfe main thread grabs events from ring buffer, sends them to the mserver

Firmware:
- Arria 10 development board
- Altera PCIe block
- Own DMA engine since we are doing burst writing DMA with PCIe 3.0. 
- Own device driver
- no interrupts 

If you have more questions feel free to ask.
  Draft   23 Feb 2020  Marius Koeppel  Forum  Writing Midas Events via FPGAs
> > We also agree and found the problem now.
> 
> Good. what was wrong?
> 
> > - Own DMA engine since we are doing burst writing DMA with PCIe 3.0. 
> > - Own device driver
> 
> Scary stuff.
> 
> > - no interrupts 
> 
> Right. Best I can tell, interrupts no longer useful in Linux - interrupt handler cannot do any real work, has to hand off to a kernel thread, resulting
> in so much latency and overhead that one might as well poll for the data... And for DMA data transfers, the data rate is well known,
> so easy to predict how long the DMA will run for and sleep for that amount of time instead of waiting for an interrupt.
> 
> K.O.

So the problem was that we assumed that the bank (with the header) needs to be 64-bit aligned. Even more, we aligned the whole Midas event to 256 bit in the FPGA, since we have a 250 MHz x 256 bit interface for PCIe. But then we saw that you align the bank data to 64 bit -> crash of mdump etc. For now we generate the data on the FPGA in the "old" Midas format. So having a flag for changing to a different alignment would actually be really nice.
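
For illustration, the alignment rule described above amounts to rounding each bank's data size up to the next 64-bit boundary before the next bank starts (a sketch of the arithmetic only; this is what the ALIGN8 macro in midas.h does):

// example: one bank carrying 3 DWORDs of payload
uint32_t data_bytes   = 3 * 4;                           // 12 bytes of bank data
uint32_t padded_bytes = (data_bytes + 7) & ~7u;          // rounded up to a multiple of 8 -> 16
uint32_t fill_words   = (padded_bytes - data_bytes) / 4; // empty 32-bit words to append -> 1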

Cheers,
Marius