Back Midas Rome Roody Rootana
  Midas DAQ System, Page 68 of 154  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topic Subjectdown
  2103   25 Feb 2021 Konstantin OlchanskiBug ReportUndefined client causing issues in transition.
Clearly something goes wrong with the STARTABORT transition. Actually from your 
sceenshot, it is not clear why the STARTABORT transition was initiated.

Usually it is called after some client fails the "start run" transition to inform 
other clients that the run did not start after all. (mlogger uses this to close the 
output file, etc).

But in the screenshot, we do not see any client fail the transition (only rootana1 
was called, and it returned "green").

So, a puzzle. One possibility is that the transition code gets so confused
that it does not record correct transition data to ODB, then the web page
gets even more confused.

One way to see what happens, is to run the odbedit command "start now -v".

Can you try that? And attach all it's output here?

K.O.
  2109   26 Feb 2021 Isaac Labrie BoulayBug ReportUndefined client causing issues in transition.
> Clearly something goes wrong with the STARTABORT transition. Actually from your 
> sceenshot, it is not clear why the STARTABORT transition was initiated.
> 
> Usually it is called after some client fails the "start run" transition to inform 
> other clients that the run did not start after all. (mlogger uses this to close the 
> output file, etc).
> 
> But in the screenshot, we do not see any client fail the transition (only rootana1 
> was called, and it returned "green").
> 
> So, a puzzle. One possibility is that the transition code gets so confused
> that it does not record correct transition data to ODB, then the web page
> gets even more confused.
> 
> One way to see what happens, is to run the odbedit command "start now -v".
> 
> Can you try that? And attach all its output here?
> 
> K.O.

Thanks for getting back to me right away. I've attached two screenshots. The first one 
is the output after running "start now -v" (everything seemed to work nicely there), the 
second output is after using odbedit to stop the run with "stop". Notice that the DAQ 
never stops because it gets stuck in between transitions (You can see the run status 
being "stopping run" with the cancel transition button).

Thanks.

Isaac
  2111   26 Feb 2021 Konstantin OlchanskiBug ReportUndefined client causing issues in transition.
So there is no error on run start anymore? To debug the stuck run stop, please use "stop -v" 
to see where it got stuck. You can also play with the RPC timeouts (the connect timeout and 
the response timeout), to make it get "unstuck" quicker. Definitely it should not be stuck 
forever, it should timeout at maximum of "rpc timeout * number of clients". K.O.
  2113   26 Feb 2021 Isaac Labrie BoulayBug ReportUndefined client causing issues in transition.
> So there is no error on run start anymore? To debug the stuck run stop, please use "stop -v" 
> to see where it got stuck. You can also play with the RPC timeouts (the connect timeout and 
> the response timeout), to make it get "unstuck" quicker. Definitely it should not be stuck 
> forever, it should timeout at maximum of "rpc timeout * number of clients". K.O.

You're right it does not stay stuck forever, it eventually gets unstuck. I forgot to mention this. 
I will try to play with these timeout parameters. It does not get stuck if I run my DAQ using the 
odbedit commands (start/stop). I don't know if this is relevant information that could help us 
identify the problem.

Thanks for all your help as always!

Isaac
  2119   03 Mar 2021 Konstantin OlchanskiBug ReportUndefined client causing issues in transition.
> It does not get stuck if I run my DAQ using the odbedit commands (start/stop).

Interesting. Run start/stop from odbedit works but from mhttpd gets stuck.

I think they do not run the transition quite the same way. mhttpd uses the multithreaded transition.

So we can debug this using the "mtransition" program. Try:

- mtransition -v -d 1 START/STOP -- this should be same as odbedit
- mtransition -m -v -d 1 START/STOP -- this should be same as mhttpd

The "-v" and "-d 1" flags should cause lots of output, for failed transitions,
cut-and-paste it all into this elog here, should give us plenty of meat to debug.

K.O.
  270   11 Jul 2006 Razvan Stefan GorneaForumTundra Universe CA91C042
I am not using Midas but I need some help from somebody experienced with VME access using the Tundra Universe, so I thought here I have a chance ...

I have a GE Fanuc 7700 and use the vme_universe driver (ver. 3.3). In the past I programed for a DAQ board using A24/D16. Now I have a new board using A24/MB and I am really last!

So the board has some 64-bit registers and some 32-bit registers (all aligned on 64-bit) and a FIFO to read the main data. After reading the user manual for universe chip and the docs for the driver I am still confused about how things are supposed to work.

First my understanding is that for reading 64-bit I need anyway the multiplex block mode. But nowhere I could find if the multiplex mode supports 32-bit transfers. Should I map two windows on the same VME address range, one for A24/D32 and one for A24/MB? Or read everything with an unsigned long long and cast to unsigned int all 32-bit registers?

Second I don't know how to handle the FIFO which is in the middle of the address range. When the board has a trigger I have to read more than 100000 times this FIFO. If I simply read at the FIFO address 100000 times do I get the VME multiplex block mode (if the window has been mapped with A24/MB address modifier)? How does the chip/driver know not to send the address and just do the data cycle after the first read?

I also had the naive idea to have a master window mapped on the board address range to access all the registers except the FIFO and to create a DMA buffer for the FIFO (FIFO readout is where most of the work is anyway so I guess an advantage is that will free the CPU) but it seems to me that the dma_transfer function in the kernel module increments the address. I don't dare change this since I don't even understand the exact relationship between accesses to the mapped window and what's happening on the VME bus.

Thanks for any help!
  2376   29 Mar 2022 Hunter LoweForumTriggering without LAM signal - mcstd_libgpmc_camac driver
Hello,

I have a question for anyone experienced with simple CAMAC systems.
 My understanding is that for a single ADC system you can use a gate to generate a
 LAM signal for triggering on ADC.
 The driver that I have "mcstd_libgpmc_camac" has LAM "not implemented" though,
 so I'm not sure how I should trigger DAQ. The frontend code that I have seems to use a TDC
 as trigger for ADC via "EQ_POLLED" type equipment setting. Should I simply plug in TDC in my
 system and use this as trigger? Is it as simple as TDC generates signal via gate and ADC performs job? 

Sorry if question is super basic, just confused how to trigger without LAM signal.

Thank you :)

Hunter Lowe
UNBC Grad Physics
  1990   02 Sep 2020 Ruslan PodviianiukForumTransition status message
Hello,

I got an error after start of run and it would be good to show this error (or 
errors) in UI that I am developing. I see this error in the Transition 
directory (please see the attached file). Is it possible to read the status 
message and error messages from the Transition directory using jsonrpc? If yes, 
could you please explain me how to do this.

Thank you.
Ruslan  
  1991   02 Sep 2020 Ben SmithForumTransition status message
The information you want is in the ODB:
* "/System/Transition/status" is the overall integer status code.
* "/System/Transition/error" is the overall error message string.

There is also per-client status information in the ODB:
* "/System/Transition/Clients/<client_name>/status"
* "/System/Transition/Clients/<client_name>/error"
  1992   02 Sep 2020 Ruslan PodviianiukForumTransition status message
> The information you want is in the ODB:
> * "/System/Transition/status" is the overall integer status code.
> * "/System/Transition/error" is the overall error message string.
> 
> There is also per-client status information in the ODB:
> * "/System/Transition/Clients/<client_name>/status"
> * "/System/Transition/Clients/<client_name>/error"


Thank you so much, Ben!
  1995   08 Sep 2020 Konstantin OlchanskiForumTransition status message
> > The information you want is in the ODB:
> > * "/System/Transition/status" is the overall integer status code.
> > * "/System/Transition/error" is the overall error message string.
> > 
> > There is also per-client status information in the ODB:
> > * "/System/Transition/Clients/<client_name>/status"
> > * "/System/Transition/Clients/<client_name>/error"

You can also use web page .../resources/transition.html as an example of how
to read transition (and other) data from ODB into your own web page. example.html
may also be helpful.

K.O.
  1996   08 Sep 2020 Ruslan PodviianiukForumTransition status message
> > > The information you want is in the ODB:
> > > * "/System/Transition/status" is the overall integer status code.
> > > * "/System/Transition/error" is the overall error message string.
> > > 
> > > There is also per-client status information in the ODB:
> > > * "/System/Transition/Clients/<client_name>/status"
> > > * "/System/Transition/Clients/<client_name>/error"
> 
> You can also use web page .../resources/transition.html as an example of how
> to read transition (and other) data from ODB into your own web page. example.html
> may also be helpful.
> 
> K.O.

Thank you Konstantin!

Ruslan
  2937   05 Feb 2025 Andreas SuterForumTransition from mana -> manalyzer
Hi,

we are planning to migrate from mana to manalyzer. I started to have a look into it and realized that I have some lose ends.
Is there a clear migration docu somewhere?

Currently I understand it the following way (which might be wrong):
The class TARunObject is used to write analyzer modules which are registered by TAFactory. I hope this is right?

However, in mana there is an analyzer implemented by the user which binds the modules and has additional routines:
analyzer_init(), analyzer_exit(), analyzer_loop()
ana_begin_of_run(), ana_end_of_run(), ana_pause_run(), ana_resume_run()
which we are using.

This part I somehow miss in manalyzer, most probably due to lack of understanding, and missing documentation.

Could somebody please give me a boost?
  2938   05 Feb 2025 Andrea CapraForumTransition from mana -> manalyzer
Hi Andreas,

please find in elog:2938/1 a short introduction that I wrote sometime ago.
I'm glad to offer additional support, if needed.

> Hi,
> 
> we are planning to migrate from mana to manalyzer. I started to have a look into it and realized that I have some lose ends.
> Is there a clear migration docu somewhere?
> 
> Currently I understand it the following way (which might be wrong):
> The class TARunObject is used to write analyzer modules which are registered by TAFactory. I hope this is right?
> 
> However, in mana there is an analyzer implemented by the user which binds the modules and has additional routines:
> analyzer_init(), analyzer_exit(), analyzer_loop()
> ana_begin_of_run(), ana_end_of_run(), ana_pause_run(), ana_resume_run()
> which we are using.
> 
> This part I somehow miss in manalyzer, most probably due to lack of understanding, and missing documentation.
> 
> Could somebody please give me a boost?
  2939   06 Feb 2025 Konstantin OlchanskiForumTransition from mana -> manalyzer
> Could somebody please give me a boost?

no need to shout into the void, it is pretty easy to identify the author of manalyzer and ask me directly.

> we are planning to migrate from mana to manalyzer. I started to have a look into it and realized that I have some lose ends.
> Is there a clear migration docu somewhere?

README.md and examples in the manalyzer git repository.

If something is missing, or unclear, please ask.

> Currently I understand it the following way (which might be wrong):
> The class TARunObject is used to write analyzer modules which are registered by TAFactory. I hope this is right?

Please read the README file. It explains what is going on there.

Design of manalyzer had 2 main goals:
a) lifetime of all c++ objects (and ROOT objects) is well defined (to fix a design defect in rootana)
b) event flow and data flow are documentable (problem in mfe.c frontends, etc)

> However, in mana there is an analyzer implemented by the user which binds the modules and has additional routines:
> analyzer_init(), analyzer_exit(), analyzer_loop()
> ana_begin_of_run(), ana_end_of_run(), ana_pause_run(), ana_resume_run()
> which we are using.

I have never used the mana analyzer, I wrote the c++ rootana analyzer very early on (first commit in 2006).

But the basic steps should all be there for you:
- initialization (create histograms, open files) can be done in the module constructor or in BeginRun()
- finalization (fit histograms, close files) should be done in EndRun()
- event processing (obviously) in Analyze()
- pause run, resume run and switch to next subrun file have corresponding methods
- all the "flow" and multithreading stuff you can ignore to first order.

To start the migration, I recommend you take manalyzer_example_root.cxx and start stuffing it with your code.

If you run into any problems, I am happy to help with them. Ask here or contact me directly.

> This part I somehow miss in manalyzer, most probably due to lack of understanding, and missing documentation.

True, I wrote a migration guide for the frontend mfe.c to c++ tmfe, because we do this migration
quite often. But I never wrote a migration guide from mana.c analyzer, because we never did such
migration. Most experiments at TRIUMF are post-2006 and use rootana in it's different incarnations.

P.S. I designed the C++ TMFe frontend after manalyzer and I think it came out quite better, I especially
value the design input from Stefan, Thomas, Pierre, Joseph and Ben.

P.P.S. Be free to ignore all this manalyzer business and write your own analyzer based
on the midasio library:

int main()
{
   TMReaderInteraface* f = TMNewReader(file.mid.gz");
   while (1) {
      TMEvent* e = TMReadEvent(f);
      dwim(e);
      delete e;
   }
   delete f;
}

For online processing I use the TMFe class, it has enough bits to be a frontend and an analyzer,
or you can use the older TMidasOnline from rootana.

Access to ODB is via the mvodb library, which is new in midas, but has been part of rootana
and my frontend toolkit since at least 2011 or earlier, inspired by Peter Green's even
older "myload" ODB access library.

K.O.
  2940   06 Feb 2025 Konstantin OlchanskiForumTransition from mana -> manalyzer
> > Is there a clear migration docu somewhere?

I can also give you links to the alpha-g analyzer (very complex) and the DarkLight trigger TDC analyzer (very simple),
there is also analyzer examples of in-between complexity.

K.O.
  939   20 Nov 2013 Konstantin OlchanskiBug ReportToo many bm_flush_cache() in mfe.c
I was looking at something in the mserver and noticed that for remote frontends, for every periodic event, 
there are about 3 RPC calls to bm_flush_cache().

Sure enough, in mfe.c::send_event(), for every event sent, there are 2 calls to bm_flush_cache() (once for 
the buffer we used, second for all buffers). Then, for a good measure, the mfe idle loop calls 
bm_flush_cache() for all buffers about once per second (even if no events were generated).

So what is going on here? To allow good performance when processing many small events,
the MIDAS event buffer code (bm_send_event()) buffers small events internally, and only after this internal
buffer is full, the accumulated events are flushed into the shared memory event buffer,
where they become visible to the mlogger, mdump and other consumers.

Because of this internal buffering, infrequent small size periodic events can become
stuck for quite a long time, confusing the user: "my frontend is sending events, how come I do not
see them in mdump?"

To avoid this, mfe.c manually flushes these internal event buffers by calling bm_flush_buffer().

And I think that works just fine for frontends directly connected to the shared memory, one call to 
bm_flush_buffer() should be sufficient.

But for remote fronends connected through the mserver, it turns out there is a race condition between 
sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another 
tcp connection.

I see that the mserver always reads the rpc connection before the event connection, so bm_flush_cache() 
is done *before* the event is written into the buffer by bm_send_event(). So the newly
send event is stuck in the buffer until bm_flush_cache() for the *next* event shows up:

mfe.c: send_event1 -> flush -> ... wait until next event ... -> send_event2 -> flush
mserver: flush -> receive_event1 -> ... wait ... -> flush -> receive_event2 -> ... wait ...
mdump -> ... nothing ... -> ... nothing ... -> event1 -> ... nothing ...

Enter the 2nd call to bm_flush_cache in mfe.c (flush all buffers) - now because mserver seems to be 
alternating between reading the rpc connection and the event connection, the race condition looks like 
this:

mfe.c: send_event -> flush -> flush
mserver: flush -> receive_event -> flush
mdump: ... -> event -> ...

So in this configuration, everything works correctly, the data is not stuck anywhere - but by accident, and 
at the price of an extra rpc call.

But what about the periodic 1/second bm_flush_cache() on all buffers? I think it does not quite work
either because the race condition is still there: we send an event, and the first flush may race it and only 
the 2nd flush gets the job done, so the delay between sending the event and seeing it in mdump would be 
around 1-2 seconds. (no more than 2 seconds, I think). Since users expect their events to show up "right
away", a 2 second delay is probably not very good.

Because periodic events are usually not high rate, the current situation (4 network transactions to send 1 
event - 1x send event, 3x flush buffer) is probably acceptable. But this definitely sets a limit on the 
maximum rate to 3x (2x?) the mserver rpc latency - without the rpc calls to bm_flush_buffer() there
would be no limit - the events themselves are sent through a pipelined tcp connection without 
handshaking.

One solution to this would be to implement periodic bm_flush_buffer() in the mserver, making all calls to 
bm_flush_buffer() in mfe.c unnecessary (unless it's a direct connection to shared memory).

Another solution could be to send events with a special flag telling the mserver to "flush the buffer right 
away".

P.S. Look ma!!! A race condition with no threads!!!

K.O.
  940   21 Nov 2013 Stefan RittBug ReportToo many bm_flush_cache() in mfe.c
> And I think that works just fine for frontends directly connected to the shared memory, one call to 
> bm_flush_buffer() should be sufficient.

That's correct. What you want is once per second or so for polled events, and once per periodic event (which anyhow will typically come only every 10 seconds or so). If there are 3 calls 
per event, this is certainly too much.


> But for remote fronends connected through the mserver, it turns out there is a race condition between 
> sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another 
> tcp connection.
> 
> ...
> 
> One solution to this would be to implement periodic bm_flush_buffer() in the mserver, making all calls to 
> bm_flush_buffer() in mfe.c unnecessary (unless it's a direct connection to shared memory).
> 
> Another solution could be to send events with a special flag telling the mserver to "flush the buffer right 
> away".

That's a very good and useful observation. I never really thought about that. 

Looking at your proposed solutions, I prefer the second one. mserver is just an interface for RPC calls, it should not do anything "by itself". This was a strategic decision at the beginning. 
So sending a flag to punch through the cache on mserver seems to me has less side effects. Will just break binary compatibility :-)

/Stefan
  624   01 Sep 2009 Jimmy NgaiForumTimeout during run transition
Dear All,

I'm using SL5 and MIDAS rev 4528. Occasionally, when I stop a run in odbedit, 
a timeout would occur: 
[midas.c:9496:rpc_client_call,ERROR] rpc timeout after 121 sec, routine 
= "rc_transition", host = "computerB", connection closed
Error: Unknown error 504 from client 'Frontend' on host computerB

This error seems to be random without any reason or pattern. After this error 
occurs, I cannot start or stop any run. Sometime restarting MIDAS can bring 
the system working again, but sometime not.

Another transition timeout occurs after I change any ODB value using the web 
interface:
[midas.c:8291:rpc_client_connect,ERROR] timeout on receive remote computer 
info: 
[midas.c:3642:cm_transition,ERROR] cannot connect to client "Frontend" on host 
computerB, port 36255, status 503
Error: Cannot connect to client 'Frontend'

This error is reproducible: start run -> change ODB value within webpage -> 
stop run -> timeout!

Any idea?

Thanks,
Jimmy
  625   03 Sep 2009 Stefan RittForumTimeout during run transition
> Dear All,
> 
> I'm using SL5 and MIDAS rev 4528. Occasionally, when I stop a run in odbedit, 
> a timeout would occur: 
> [midas.c:9496:rpc_client_call,ERROR] rpc timeout after 121 sec, routine 
> = "rc_transition", host = "computerB", connection closed
> Error: Unknown error 504 from client 'Frontend' on host computerB
> 
> This error seems to be random without any reason or pattern. After this error 
> occurs, I cannot start or stop any run. Sometime restarting MIDAS can bring 
> the system working again, but sometime not.
> 
> Another transition timeout occurs after I change any ODB value using the web 
> interface:
> [midas.c:8291:rpc_client_connect,ERROR] timeout on receive remote computer 
> info: 
> [midas.c:3642:cm_transition,ERROR] cannot connect to client "Frontend" on host 
> computerB, port 36255, status 503
> Error: Cannot connect to client 'Frontend'
> 
> This error is reproducible: start run -> change ODB value within webpage -> 
> stop run -> timeout!

A few hints for debugging:

- do the run stop via odbedit and the "-v" flag, like

[local:Online:R]/> stop -v

then you see which computer is contacted when.

- Then put some debugging code into your front-end end_of_run() routine at the 
beginning and the end of that routine, so you see when it's executed and how long 
this takes. If you do lots of things in your EOR routine, this could maybe cause a 
timeout.

- Then make sure that cm_yield() in mfe.c is called periodically by putting some 
debugging code there. This function checks for any network message, such as the 
stop command from odbedit. If you trigger event readout has an endless loop for 
example, cm_yield() will never be called and any transition will timeout.

- Make sure that not 100% CPU is used on your frontend. Some OSes have problems 
handling incoming network connections if the CPU is completely used of if 
input/output operations are too heavy.

- Stefan
ELOG V3.1.4-2e1708b5