Back Midas Rome Roody Rootana
  Midas DAQ System, Page 134 of 142  Not logged in ELOG logo
ID Date Author Topic Subjectup
  973   28 Feb 2014 Andreas SuterSuggestionrunlog is "ugly"
Understand me right, I mostly like the new style, except the runlog as reported.
Attached you will find the diff's you were asking for. But as pointed out, I
haven't worked so far on CSS and hence this should be checked!!

I understand that the mhttpd.js needs to be the default one, however, mhttpd.css
might be left to the end-user to adopt to their specific needs. I shortly
checked in the mhttpd demon. It checks for the resources path in the ODB. If it
also would check for a CSS name, mhttpd.css could be changed/adopted by the
end-users without breaking things (at least it would then be their one business).

> > If I am not mistaken, the mhttpd.css is hard coded (path/name) into the mhttpd.
> 
> mhttpd.css is served from $MIDASSYS/resources/mhttpd.css. The actual path is
reported on the mhttpd 
> "help" page.
> 
> (I think the internal mhttpd.css and mhttpd.js should be removed as no longer
useful - nothing will work 
> right if the real mhttpd.js and mhttpd.css cannot be served).
> 
> > Especially the look and feel of the runlog is unsatisfactorily from my point
of view.
> 
> persons in charge of implementing the CSS stuff failed to convert quite a few
pages, for example, the elog 
> and the history editor pages were left completely broken. (mostly fixed now).
> 
> so thank you for reporting the runlog breakage, I hope Stefan & co can fix it
quickly. (I cannot do - I have 
> have no runlog pages on any of my test experiments).
> 
> > the old style was much more readable.
> 
> I think the new style is not too bad, except for a few visual artefacts here
and there, the general comment 
> that CSS is too complicated and hard to debug and the fact that over-subtle
colouring yields inconsistent 
> visuals between different monitors and ambient lighting conditions. (persons
who select the colours always 
> respond that "but to me, it looks just fine on my laptop", making it hard to
resolve any issues).
> 
> > I could recover the old style look and feel by slightly changing the mhttpd.cxx
> 
> If you post the patches that fix it for you, I can commit them to midas. (git
diff | mail olchansk@triumf.ca).
> 
> K.O.
Attachment 1: mhttpd.cxx.diff
diff --git a/src/mhttpd.cxx b/src/mhttpd.cxx
index 7437897..3940b96 100755
--- a/src/mhttpd.cxx
+++ b/src/mhttpd.cxx
@@ -3445,7 +3445,7 @@ void show_rawfile(const char *path)
    rsprintf("</table>");
 
    //main table:
-   rsprintf("<table class=\"dialogTable\">");
+   rsprintf("<table class=\"runlogTable\">");
    /*---- menu buttons ----*/
    rsprintf("<tr><td colspan=2>\n");
    rsprintf("<input type=submit name=cmd value=\"ELog\">\n");
Attachment 2: mhttpd.css.diff
diff --git a/resources/mhttpd.css b/resources/mhttpd.css
index 71c73a4..066cfe6 100644
--- a/resources/mhttpd.css
+++ b/resources/mhttpd.css
@@ -286,7 +286,7 @@ div.wrapper{
     margin: 0 auto -3em;
 }
 
-.historyConfigTable, .dialogTable, .messageBox, .genericTable, .sequencerTable{
+.historyConfigTable, .runlogTable, .dialogTable, .messageBox, .genericTable, .sequencerTable{
 	border-radius: 12px;
 	border: 2px solid #00B26B;
 	background-color: #EEEEEE;
@@ -304,6 +304,22 @@ table.sequencerTable td table{
 	margin: 0px;
 }
 
+table.genericTable tr:nth-child(even){
+	background: #EEEEEE;
+}
+table.genericTable tr:nth-child(odd){
+	background: #FAFAFA;
+}
+
+table.runlogTable td{
+        border:none;
+        text-align: left;
+        font-family: monospace;
+}
+
+table.runlogTable pre{
+        line-height: 125%;
+}
 
 table.dialogTable td{
 	border:none;
  974   28 Feb 2014 Stefan RittSuggestionrunlog is "ugly"
 > If I am not mistaken, the mhttpd.css is hard coded (path/name) into the mhttpd.

I agree that this should be removed, Unfortunately I'm away right now, so I will fix it next week. Also will put in 
Andreas' diffs.

/Stefan
  977   07 Mar 2014 Stefan RittSuggestionrunlog is "ugly"
I put mhttpd.css and mhttpd.js into the ODB, so every experiment can change it. I put also Andreas' modifications of the CSS file for the runlog table and 
committed the changes.

/Stefan
  1180   13 Jun 2016 Konstantin OlchanskiInforunning mhttpd on port 443
mhttpd running as non-root cannot bind to standard https port 443. By default, mhttpd uses port 
8443 and it works just fine, but some applications such as the SSLlabs https tester insist on using 
port 443.

To connect mhttpd with port 443, I use the tcpproxy package from
git://git.spreadspace.org/tcpproxy.git

./tcpproxy -D -U -p 443 -r localhost4 -o 8443

(you can run this from rc.local)

(to remember, for best security one should run mhttpd behind an industry-standard https proxy)

K.O.
  1150   10 Dec 2015 Amy RobertsSuggestionscript command limited to 256 characters; remove limit?
Both the /Script and /CustomScript trees in the ODB allow users to trigger a 
script via Midas - which silently truncates command strings longer than 
256 characters.

I'd prefer that Midas place no limit on string length.  Failing that, it would be
helpful to have character limits called out in the documentation 
(https://midas.triumf.ca/MidasWiki/index.php//Script_ODB_tree#.3Cscript-name.3E_key_or_subtree,
https://midas.triumf.ca/MidasWiki/index.php//Customscript_ODB_tree).

As far as I can tell, odb.c allows arbitrarily large strings in the ODB data.  
(Although key *names* are restricted to 256 characters.)  I've submitted one 
possible version of an arbitrary-length exec_script() as a pull request 
(https://bitbucket.org/tmidas/midas/pull-requests/).

Am I misunderstanding any critical pieces?  Does Midas intentionally treat 
strings in the ODB as limited to 256 characters?
  1157   28 Jan 2016 Konstantin OlchanskiSuggestionscript command limited to 256 characters; remove limit?
Thank you for reporting this problem:

a) ODB key *names* are restricted to 31 characters (32 bytes, last byte is a NUL), not 256 characters.
b) ODB string length is unlimited (32-bit length field)
c) ODB C API "db_get_value" & co require fixed length buffer and most users of this API provide a 256-byte fixed buffer for strings, some of them also do not 
check the status code, resulting in silent truncation. (I think the ODB functions themselves report truncation to midas.log, so not completely silent).

We try to fix this where we must - but it is cumbersome with the current ODB API - as in your fix on has to:
- get the ODB key, extract size
- allocate buffer
- call db_get_value() & co
- use the data
- remember to free the buffer on each and every return path

The first three steps could become one if we had an ODB "get_data" function that automatically allocated the data buffer.

But the main source of bugs will be the last step - remember to free the buffer, always.

P.S.

We are not alone in pondering how to do this best. If you want to see it "done right",
read the fresh-off-the-presses book "Go Programming Language" by Alan Donovan and Brian Kernighan,
http://www.gopl.io/

Brian Kernighan is the "K" in K&R "C programming language", still around and kicking, now at Google.
Sadly the "R" passed away in 2011 - http://www.nytimes.com/2011/10/14/technology/dennis-ritchie-programming-trailblazer-dies-at-70.html

K.O.

> Both the /Script and /CustomScript trees in the ODB allow users to trigger a 
> script via Midas - which silently truncates command strings longer than 
> 256 characters.
> 
> I'd prefer that Midas place no limit on string length.  Failing that, it would be
> helpful to have character limits called out in the documentation 
> (https://midas.triumf.ca/MidasWiki/index.php//Script_ODB_tree#.3Cscript-name.3E_key_or_subtree,
> https://midas.triumf.ca/MidasWiki/index.php//Customscript_ODB_tree).
> 
> As far as I can tell, odb.c allows arbitrarily large strings in the ODB data.  
> (Although key *names* are restricted to 256 characters.)  I've submitted one 
> possible version of an arbitrary-length exec_script() as a pull request 
> (https://bitbucket.org/tmidas/midas/pull-requests/).
> 
> Am I misunderstanding any critical pieces?  Does Midas intentionally treat 
> strings in the ODB as limited to 256 characters?
  1158   28 Jan 2016 Amy RobertsSuggestionscript command limited to 256 characters; remove limit?
Using low-level memory allocation routines in higher-level programs like mhttpd makes me nervous.

We could use vector arrays to allow variable-sized allocation, and use the data() member function to access the char* needed for functions like strlcat,
db_get_data, and db_sprintf.

This conforms to the c++ standard, but doesn't require explicit freeing by the user - at least, not when you're allocating std::vector<char>.

Amy

> Thank you for reporting this problem:
> 
> a) ODB key *names* are restricted to 31 characters (32 bytes, last byte is a NUL), not 256 characters.
> b) ODB string length is unlimited (32-bit length field)
> c) ODB C API "db_get_value" & co require fixed length buffer and most users of this API provide a 256-byte fixed buffer for strings, some of them also do not 
> check the status code, resulting in silent truncation. (I think the ODB functions themselves report truncation to midas.log, so not completely silent).
> 
> We try to fix this where we must - but it is cumbersome with the current ODB API - as in your fix on has to:
> - get the ODB key, extract size
> - allocate buffer
> - call db_get_value() & co
> - use the data
> - remember to free the buffer on each and every return path
> 
> The first three steps could become one if we had an ODB "get_data" function that automatically allocated the data buffer.
> 
> But the main source of bugs will be the last step - remember to free the buffer, always.
> 
> P.S.
> 
> We are not alone in pondering how to do this best. If you want to see it "done right",
> read the fresh-off-the-presses book "Go Programming Language" by Alan Donovan and Brian Kernighan,
> http://www.gopl.io/
> 
> Brian Kernighan is the "K" in K&R "C programming language", still around and kicking, now at Google.
> Sadly the "R" passed away in 2011 - http://www.nytimes.com/2011/10/14/technology/dennis-ritchie-programming-trailblazer-dies-at-70.html
> 
> K.O.
> 
> > Both the /Script and /CustomScript trees in the ODB allow users to trigger a 
> > script via Midas - which silently truncates command strings longer than 
> > 256 characters.
> > 
> > I'd prefer that Midas place no limit on string length.  Failing that, it would be
> > helpful to have character limits called out in the documentation 
> > (https://midas.triumf.ca/MidasWiki/index.php//Script_ODB_tree#.3Cscript-name.3E_key_or_subtree,
> > https://midas.triumf.ca/MidasWiki/index.php//Customscript_ODB_tree).
> > 
> > As far as I can tell, odb.c allows arbitrarily large strings in the ODB data.  
> > (Although key *names* are restricted to 256 characters.)  I've submitted one 
> > possible version of an arbitrary-length exec_script() as a pull request 
> > (https://bitbucket.org/tmidas/midas/pull-requests/).
> > 
> > Am I misunderstanding any critical pieces?  Does Midas intentionally treat 
> > strings in the ODB as limited to 256 characters?
  1166   26 Feb 2016 Konstantin OlchanskiSuggestionscript command limited to 256 characters; remove limit?
> Using low-level memory allocation routines in higher-level programs like mhttpd makes me nervous.

It should not, people have used malloc() for decades now without much injury to themselves. (Thomas corrects me: some people had big injury to their pride, me included).

> We could use vector arrays to allow variable-sized allocation, and use the data() member function to access the char* needed for functions like strlcat,
> db_get_data, and db_sprintf.

I thought auto_ptr was the correct tool to allocate "I just need a few bytes for a few minutes" arrays, but there is some discrepancy
between delete and delete[] (with brackets) and auto_ptr p(new char[i]) is verboten (even though it compiles just fine).

I ended up writing a custom replacement for auto_ptr called auto_string - now in mhttpd.cxx available for use in other places like this.

Still I think a db_get_data() that returns allocated memory is the correct solution. But this memory still needs to be released and lacking auto_ptr it opens the door for memory leaks.

> This conforms to the c++ standard, but doesn't require explicit freeing by the user - at least, not when you're allocating std::vector<char>

I do not think std::vector<char> can be cast into "char*" and used as replacement of "char str[100]" or "char* str = malloc(i);"

In other new, the limit on the command length is now removed.

K.O.

> 
> Amy
> 
> > Thank you for reporting this problem:
> > 
> > a) ODB key *names* are restricted to 31 characters (32 bytes, last byte is a NUL), not 256 characters.
> > b) ODB string length is unlimited (32-bit length field)
> > c) ODB C API "db_get_value" & co require fixed length buffer and most users of this API provide a 256-byte fixed buffer for strings, some of them also do not 
> > check the status code, resulting in silent truncation. (I think the ODB functions themselves report truncation to midas.log, so not completely silent).
> > 
> > We try to fix this where we must - but it is cumbersome with the current ODB API - as in your fix on has to:
> > - get the ODB key, extract size
> > - allocate buffer
> > - call db_get_value() & co
> > - use the data
> > - remember to free the buffer on each and every return path
> > 
> > The first three steps could become one if we had an ODB "get_data" function that automatically allocated the data buffer.
> > 
> > But the main source of bugs will be the last step - remember to free the buffer, always.
> > 
> > P.S.
> > 
> > We are not alone in pondering how to do this best. If you want to see it "done right",
> > read the fresh-off-the-presses book "Go Programming Language" by Alan Donovan and Brian Kernighan,
> > http://www.gopl.io/
> > 
> > Brian Kernighan is the "K" in K&R "C programming language", still around and kicking, now at Google.
> > Sadly the "R" passed away in 2011 - http://www.nytimes.com/2011/10/14/technology/dennis-ritchie-programming-trailblazer-dies-at-70.html
> > 
> > K.O.
> > 
> > > Both the /Script and /CustomScript trees in the ODB allow users to trigger a 
> > > script via Midas - which silently truncates command strings longer than 
> > > 256 characters.
> > > 
> > > I'd prefer that Midas place no limit on string length.  Failing that, it would be
> > > helpful to have character limits called out in the documentation 
> > > (https://midas.triumf.ca/MidasWiki/index.php//Script_ODB_tree#.3Cscript-name.3E_key_or_subtree,
> > > https://midas.triumf.ca/MidasWiki/index.php//Customscript_ODB_tree).
> > > 
> > > As far as I can tell, odb.c allows arbitrarily large strings in the ODB data.  
> > > (Although key *names* are restricted to 256 characters.)  I've submitted one 
> > > possible version of an arbitrary-length exec_script() as a pull request 
> > > (https://bitbucket.org/tmidas/midas/pull-requests/).
> > > 
> > > Am I misunderstanding any critical pieces?  Does Midas intentionally treat 
> > > strings in the ODB as limited to 256 characters?
  2608   24 Sep 2023 Frederik WautersSuggestionscroll when browsing for a link
Another small user experience request:

When making a link in the odb (web interface) a nice browser window pop's up. There is however not scrolling possible in the window. As a result, you can not reach a odb key if it is nested to deeply. 

Trying to type out the Link target in the field only allows for 32 characters

context: we are setting up a bunch of Links in the History
  2609   26 Sep 2023 Stefan RittSuggestionscroll when browsing for a link
> When making a link in the odb (web interface) a nice browser window pop's up. There is however not scrolling possible in the window. As a result, you can not reach a odb key if it is nested to deeply. 
> 
> Trying to type out the Link target in the field only allows for 32 characters

Thanks for reporting the bug with the pop-up not being able to scroll, I fixed that and committed the change.

I do however not understand the issue with 32 characters. The link NAME should not be more than 32 chars (which applies to all ODB keys). 
But if I try I can write more than 32 chars in the link target.

Stefan
  337   05 Feb 2007 Fedor IgnatovBug Reportsegmentation violation of analyzer on a x86_64
Hello,

When I  connect to analyzer on a x86_64 processor(with Roody),  
a analyzer break with segmentation violation in the root_server_thread  function.
Same code are working fine on a 32bit processor.
As I found the problem are in exchanging of pointers between analyzer and client.
Before to send a pointer, it is saved a pointer in int (size=4, instead of 8) at
this place:
Index: src/mana.c
===================================================================
--- src/mana.c  (revision 3498)
+++ src/mana.c  (working copy)
@@ -5386,7 +5386,7 @@

             //write pointer
             message->Reset(kMESS_ANY);
-            int p = (POINTER_T) obj;
+            POINTER_T p = (POINTER_T) obj;
             *message << p;
             sock->Send(*message);


Sincerely Yours,
Fedor Ignatov 
  341   06 Feb 2007 Stefan RittBug Reportsegmentation violation of analyzer on a x86_64
> Hello,
> 
> When I  connect to analyzer on a x86_64 processor(with Roody),  
> a analyzer break with segmentation violation in the root_server_thread  function.
> Same code are working fine on a 32bit processor.
> As I found the problem are in exchanging of pointers between analyzer and client.
> Before to send a pointer, it is saved a pointer in int (size=4, instead of 8) at
> this place:
> Index: src/mana.c
> ===================================================================
> --- src/mana.c  (revision 3498)
> +++ src/mana.c  (working copy)
> @@ -5386,7 +5386,7 @@
> 
>              //write pointer
>              message->Reset(kMESS_ANY);
> -            int p = (POINTER_T) obj;
> +            POINTER_T p = (POINTER_T) obj;
>              *message << p;
>              sock->Send(*message);
> 
> 
> Sincerely Yours,
> Fedor Ignatov 

Do I understand you right? With your patch it works even on 64 bit, right? Or do you
mean there is still a segmentation violation? Anyhow I committed your patch since the
"int" is clearly incorrect.

- Stefan
  342   06 Feb 2007 Fedor IgnatovBug Reportsegmentation violation of analyzer on a x86_64
Yes right, Problem of a segmentation violation is solved with this patch. Now it works
fine on x86_64.

Fedor 

> Do I understand you right? With your patch it works even on 64 bit, right? Or do you
> mean there is still a segmentation violation? Anyhow I committed your patch since the
> "int" is clearly incorrect.
> 
> - Stefan
  345   17 Feb 2007 Konstantin OlchanskiBug Reportsegmentation violation of analyzer on a x86_64
> Yes right, Problem of a segmentation violation is solved with this patch. Now it works
> fine on x86_64.

Right. I confirm this. I have this exact same fix in my stand-alone copy of the midas
histogram server, and should commit it to MIDAS CVS as well.

K.O.
  2270   19 Aug 2021 Konstantin OlchanskiBug Reportselect() FD_SETSIZE overrun
I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is 
mysteriously misbehaving during run start and stop.

The problem turns out to be with the select() system call.

The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size 
FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun 
the FD_SET() array. Ouch.

I see that all uses of select() in midas have no protection against this.

(we should probably move away from select() to newer poll() or whatever it is)

Why does mlogger open so many file descriptors? The usual, scaling problems in the 
history. The old midas history does not reuse file descriptors, so opens the same 
3 history files (.hst, .idx, etc) for each history event. The new FILE history 
opens just one file per history event. But if the number of events is bigger than 
1024, we run into same trouble.

(BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 
on some other machines, see "limit" or "ulimit -a").

K.O.
  2271   20 Aug 2021 Stefan RittBug Reportselect() FD_SETSIZE overrun
> I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is 
> mysteriously misbehaving during run start and stop.
> 
> The problem turns out to be with the select() system call.
> 
> The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size 
> FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun 
> the FD_SET() array. Ouch.
> 
> I see that all uses of select() in midas have no protection against this.
> 
> (we should probably move away from select() to newer poll() or whatever it is)
> 
> Why does mlogger open so many file descriptors? The usual, scaling problems in the 
> history. The old midas history does not reuse file descriptors, so opens the same 
> 3 history files (.hst, .idx, etc) for each history event. The new FILE history 
> opens just one file per history event. But if the number of events is bigger than 
> 1024, we run into same trouble.
> 
> (BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 
> on some other machines, see "limit" or "ulimit -a").
> 
> K.O.

I cannot imagine that you have more than 1024 different events in ALPHA. That wouldn't 
fit on your status page. 

I have some other suspicion: The logger opens a history file on access, then closes it 
again after writing to it. In the old days we had a case where we had a return from the 
write function BEFORE the file has been closed. This is kind of a memory leak, but with 
file descriptors. After some time of course you run out of file descriptors and crash. 
Now that bug has been fixed many years ago, but it sounds to me like there is another 
"fd leak" somewhere. You should add some debugging in the history code to print the 
file descriptors when you open a file and when you leave that routine. The leak could 
however also be somewhere else, like writing to the message file, ODB dump, ...

The right thing of course would be to rewrite everything with std::ofstream which 
closes automatically the file when the object gets out of scope.

Stefan
  859   11 Feb 2013 Wes GohnForumsend_tcp error
I am getting a series of errors from MIDAS that I do not understand, so I hope
someone can help me figure this out.

I am attempting to run many frontends on one machine. I can run 8 with no
problem, but if I try to add a 9th I get errors relating to send_tcp. 

I have tried adjusting the max event sizes and buffer sizes, but it has not
resolved the problem. I also tried adjusting the data rates and the total data
volume going through each frontend, but there was no change. And as far as I can
tell I am not up against any hardware limits.

The errors are repeated continuously while a run is going. The three errors I
get are:

16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)

If you have any suggestions of how I can debug this, please let me know. Thanks!
  860   11 Feb 2013 Stefan RittForumsend_tcp error
> I am getting a series of errors from MIDAS that I do not understand, so I hope
> someone can help me figure this out.
> 
> I am attempting to run many frontends on one machine. I can run 8 with no
> problem, but if I try to add a 9th I get errors relating to send_tcp. 
> 
> I have tried adjusting the max event sizes and buffer sizes, but it has not
> resolved the problem. I also tried adjusting the data rates and the total data
> volume going through each frontend, but there was no change. And as far as I can
> tell I am not up against any hardware limits.
> 
> The errors are repeated continuously while a run is going. The three errors I
> get are:
> 
> 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
> 
> If you have any suggestions of how I can debug this, please let me know. Thanks!

Can you tell me

- why you need 9 frontends
- what kind of data your frontends produce
- how your event builder looks like and how you assemble the fragments
- what messages/errors you see when you run odbedit BEFORE the crash

/Stefan
  861   11 Feb 2013 Wes GohnForumsend_tcp error
> > I am getting a series of errors from MIDAS that I do not understand, so I hope
> > someone can help me figure this out.
> > 
> > I am attempting to run many frontends on one machine. I can run 8 with no
> > problem, but if I try to add a 9th I get errors relating to send_tcp. 
> > 
> > I have tried adjusting the max event sizes and buffer sizes, but it has not
> > resolved the problem. I also tried adjusting the data rates and the total data
> > volume going through each frontend, but there was no change. And as far as I can
> > tell I am not up against any hardware limits.
> > 
> > The errors are repeated continuously while a run is going. The three errors I
> > get are:
> > 
> > 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> > 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> > 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> > send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
> > 
> > If you have any suggestions of how I can debug this, please let me know. Thanks!
> 
> Can you tell me
> 
> - why you need 9 frontends
> - what kind of data your frontends produce
> - how your event builder looks like and how you assemble the fragments
> - what messages/errors you see when you run odbedit BEFORE the crash
> 
> /Stefan

Our experiment will need 24 frontends that will each run on its own machine. For now we
want to run 24 "fake" frontends on one machine for testing purposes. 9 is the limit
where it stops working properly. 

We have a pulser that is giving us periodic data at a constant rate. We have a master
frontend running on a different PC in interrupt mode that assembles the events, and then
N "FakeData" frontends running in polled mode on a single PC. 

We do have an event builder, but we get these errors whether the event builder is
running or not.

At the start of a run, I see the following messages:

[mtransition,INFO] Run #21 started
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
send(socket=9,size=16) returned -1, errno: 104 (Connection reset by peer)
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR]
send_tcp() failed
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to
master
Sat Feb 9 16:14:57 2013 [master,ERROR] [midas.c:10844:recv_tcp_server,ERROR] Cannot
allocate 268435512 bytes for network buffer
Sat Feb 9 16:14:57 2013 [master,ERROR] [midas.c:12893:rpc_server_receive,ERROR]
recv_tcp_server() returned -1, abort
Sat Feb 9 16:14:57 2013 [master,TALK] Program 'FakeData09' on host 'fe01' aborted

After this it recycles just the first three errors that I mentioned above.
  862   12 Feb 2013 Stefan RittForumsend_tcp error
Ok, now the picture is clearer. I have however no idea what the real problem is. The number of concurrent programs in midas is 64 as defined in midas.h (MAX_CLIENTS) so that should not be the problem. In our experiment we run 10 front-ends (but 
on 10 different machines) without problems. Other experiments used 27 front-ends.

The TCP error you see comes probably from the fact that the mserver side crashes or quits, then the socket gets broken. What you can try to debug this is to run mserver manually. Just remove mserver from inetd, and start it with "mserver -d" and 
watch what happens. Do you see any additional error messages. If the mserver segfaults, you should turn on core dumps and have a look there. Note that the mserver starts a child process on each incoming connection, so running mserver in gdb 
does not really help, since the child processes (which connect back to the front-ends) are not seen by gdb.

Have you tried to run the 9 front-ends on maybe two different PCs (5 and 4) to see if the problem is on the client side?


Best regards,
Stefan
ELOG V3.1.4-2e1708b5