> Dear All,
>
> I'm using SL5 and MIDAS rev 4528. Occasionally, when I stop a run in odbedit,
> a timeout would occur:
> [midas.c:9496:rpc_client_call,ERROR] rpc timeout after 121 sec, routine
> = "rc_transition", host = "computerB", connection closed
> Error: Unknown error 504 from client 'Frontend' on host computerB
>
> This error seems to be random without any reason or pattern. After this error
> occurs, I cannot start or stop any run. Sometime restarting MIDAS can bring
> the system working again, but sometime not.
>
> Another transition timeout occurs after I change any ODB value using the web
> interface:
> [midas.c:8291:rpc_client_connect,ERROR] timeout on receive remote computer
> info:
> [midas.c:3642:cm_transition,ERROR] cannot connect to client "Frontend" on host
> computerB, port 36255, status 503
> Error: Cannot connect to client 'Frontend'
>
> This error is reproducible: start run -> change ODB value within webpage ->
> stop run -> timeout!
A few hints for debugging:
- Stop the run from odbedit with the "-v" flag, like
[local:Online:R]/> stop -v
so that you see which computer is contacted and when.
- Then put some debugging code into your front-end end_of_run() routine at the
beginning and the end of that routine, so you see when it's executed and how long
this takes. If you do lots of things in your EOR routine, this could maybe cause a
timeout.
- Then make sure that cm_yield() in mfe.c is called periodically by putting some
debugging code there. This function checks for any network message, such as the
stop command from odbedit. If your trigger event readout has an endless loop, for
example, cm_yield() will never be called and any transition will time out.
- Make sure that the CPU on your frontend is not 100% busy. Some OSes have problems
handling incoming network connections if the CPU is completely used or if
input/output operations are too heavy.
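
To make the second and third hints concrete, here is a rough sketch of the kind of debugging code I mean. It is plain C and does not call any MIDAS function: do_end_of_run_work(), end_of_run_timed() and heartbeat() are made-up names for illustration, and the (run_number, error) arguments only mimic the shape of a frontend end_of_run() callback. You would have to adapt it to your own frontend.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std= modes */
#endif
#include <stdio.h>
#include <time.h>

/* Monotonic time in seconds, unaffected by wall-clock adjustments. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in for whatever your real end-of-run code does. */
static int do_end_of_run_work(int run_number)
{
    (void)run_number;
    return 1;   /* 1 plays the role of a "success" status in this sketch */
}

/* Assumed shape of an end-of-run callback: print on entry and on exit,
   with the elapsed time, so a slow routine shows up in the log. */
int end_of_run_timed(int run_number, char *error)
{
    double t0, dt;
    int status;

    (void)error;   /* not used in this sketch */
    t0 = now_sec();
    fprintf(stderr, "end_of_run(%d): enter\n", run_number);

    status = do_end_of_run_work(run_number);

    dt = now_sec() - t0;
    fprintf(stderr, "end_of_run(%d): leave after %.3f s, status %d\n",
            run_number, dt, status);
    return status;
}

/* Call this right next to the periodic cm_yield() call. It returns the
   time since the previous call and warns if the gap exceeds max_gap_sec,
   which would mean the main loop was stuck somewhere (e.g. in readout). */
double heartbeat(double max_gap_sec)
{
    static int have_last = 0;
    static double last = 0.0;
    double t = now_sec();
    double gap = have_last ? t - last : 0.0;

    if (gap > max_gap_sec)
        fprintf(stderr, "heartbeat: %.3f s since last call (limit %.3f s)\n",
                gap, max_gap_sec);
    last = t;
    have_last = 1;
    return gap;
}
```

If the "leave" line appears only after many seconds, or the heartbeat warning fires around the time you stop the run, you know where to look. On older glibc versions you may need to link with -lrt for clock_gettime().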
- Stefan