> Then I realized that I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls
> ss_suspend() calls rpc_execute(). The second rpc_execute successfully completes, but leave corrupted
> data for the original rpc_execute(), which happily crashes. At the moment of the crash, recursive call to
> rpc_execute() is no longer visible.
This is really strange. I did not protect rpc_execute against recursive calls since this should not happen. rpc_server_receive() is linked to rpc_call() on the client side. So there cannot be
several rpc_call() since there I do the recursive checking (also multi-thread checking) via a mutex. See line 10142 in midas.c. So there CANNOT be recursive calls to rpc_execute() because
there cannot be recursive calls to rpc_server_receive(). But apparently there are, according to your stack trace.
So even if your patch works fine, I would like to know where the recursive calls to rpc_server_receive() come from. Since we have one subproces of mserver for each client, there should only
be one client connected to each mserver process, and the client is protected via the mutex in rpc_call(). Can you please debug this? I would like to understand what is going on there. Maybe
there is a deeper underlying problem, which we better solve, otherwise it might fall back on use in the future.
For debugging, you have to see what commands rpc_call() send and what rpc_server_receive() gets, maybe by writing this into a common file together with a time stamp.
SR |