  Midas DAQ System, Page 99 of 142
ID   Date   Author   Topic   Subject
  876   12 Apr 2013   Thorsten Lux   Forum   Persistent ipcrm error
[quote="Stefan Ritt"][quote="Thorsten Lux"]In addition now I cannot start
anymore the mlogger from the web interface but only manually. However, I can
stop it from the web interface.[/quote]

At least that one can be fixed easily. Each program has a certain command with
which one can start it. This has to be put into the ODB under
/Programs/<program>. In your case you probably need

/Programs/Logger/Start command = mlogger -D

to start the logger from the Web page. To debug your run stop problems, I would
recommend starting all programs in terminal windows and watching which one
crashes at the end of the run.

/Stefan[/quote]


Hi Stefan,

under /Programs/Logger/Start command I have
/home/next/MIDAS/midas/linux/bin/mlogger -D. This command does not work when I
press the "Start Logger" button on the mhttpd web page, but when I copy and paste
it into a terminal window, it does the job.

Well, thanks to you both for the fast response. I wrote Konstantin an email with
the results of the tests he suggested.

Ciao
  875   11 Apr 2013   Stefan Ritt   Forum   Persistent ipcrm error

Thorsten Lux wrote:
In addition, I now cannot start the mlogger from the web interface any more, only manually. However, I can stop it from the web interface.


At least that one can be fixed easily. Each program has a certain command with which one can start it. This has to be put into the ODB under /Programs/<program>. In your case you probably need

/Programs/Logger/Start command = mlogger -D

to start the logger from the Web page. To debug your run stop problems, I would recommend starting all programs in terminal windows and watching which one crashes at the end of the run.
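
For example, from odbedit (a minimal sketch - the experiment name in the prompt is just a placeholder, and the "Start command" key normally already exists under /Programs/Logger):

[local:testexpt:S]/>set "/Programs/Logger/Start command" "mlogger -D"
[local:testexpt:S]/>ls "/Programs/Logger"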

/Stefan
  874   11 Apr 2013   Konstantin Olchanski   Forum   Persistent ipcrm error
> [system.c:308:ss_shm_open,ERROR] Shared memory segment with key 0x4d008002 already exists, 
please remove it manually: ipcrm -M 0x4d008002
> [midas.c:1950:cm_connect_experiment1,ERROR] cannot open database
> Unexpected error #304

For the record, the SYSV shared memory with its keys and segments has always been brittle, and problems 
such as you describe are hard to debug.

Also SYSV shared memory suffers from key aliasing - shared memory segments created with different 
names can all map to the same key, collide, and nothing works. You may not see this if all the files are 
located on a local disk, but if the .SHM files are located on an NFS disk, it can happen (and did happen in 
T2K).

For this reason, since around August 2010, MIDAS also implements POSIX shared memory, and for new 
MIDAS installations POSIX shared memory is the default. (On MacOS, POSIX shared memory was always 
the default because MacOS has a very small maximum SYSV shared memory size.)

The type of shared memory is set by the contents of .SHM_TYPE.TXT and it is possible to switch between 
SYSV and POSIX shared memory at will. (Ask me).

MIDAS still uses SYSV semaphores because they have a built-in feature to automatically unlock the 
semaphore if the program that locked it dies for any reason. POSIX semaphores do not have this built-in 
feature and we would have to implement some kind of detection and recovery for the case when a 
semaphore is locked by a program that died (and will never unlock it).

K.O.

P.S. I will address the rest of Prof. Thorsten's question in a private email.

P.P.S. Please post elog messages in the "plain" format. NOT HTML or ELCODE.
  873   11 Apr 2013   Thorsten Lux   Forum   Persistent ipcrm error
Hello,

I have a problem with our DAQ, which is based on Midas. It worked quite well for about 3 years, but since I tried to restart data taking after a break of 2 months, I always get the following error message:

[system.c:308:ss_shm_open,ERROR] Shared memory segment with key 0x4d008002 already exists, please remove it manually: ipcrm -M 0x4d008002
[midas.c:1950:cm_connect_experiment1,ERROR] cannot open database
Unexpected error #304


Then I tried the following to fix the problem:

-) I first checked the shared memory segments with ipcs:
0x4d008002 3244040 next 666 1077248 1
0x4d00006e 3276809 next 666 116444 1


Sometimes there is an additional line which I also delete.

-) I deleted the shared memory segments with ipcrm -M 0x4d008002 / 0x4d00006e

-) I removed the .SYS*.SHM files:
-rw-r--r-- 1 next users 0 Mar 16 2010 MIDAS/online/.ALARM.SHM
-rw-r--r-- 1 next users 0 Mar 16 2010 MIDAS/online/.ELOG.SHM
-rw-r--r-- 1 next users 0 Mar 16 2010 MIDAS/online/.HISTORY.SHM
-rw-r--r-- 1 next users 0 Mar 16 2010 MIDAS/online/.MSG.SHM
-rw-r--r-- 1 next users 1089536 Apr 11 15:46 MIDAS/online/.ODB.SHM
-rw-r--r-- 1 next users 116444 Apr 11 15:43 MIDAS/online/.SYSMSG.SHM
-rw-r--r-- 1 next users 16793660 Apr 11 15:43 MIDAS/online/.SYSTEM.SHM


-) I reboot the PC

-) I start the midas daemon using a shell script with the following lines:
cd /home/next/CAEN/A2818Drv/
sudo sh a2818_load
mhttpd -p 8080 -D


-) Normally I can then start a run, but when I try to stop it I get the error message from above again.

In addition, from time to time I get the following error messages:
[mhttpd,INFO] Client 'unknown' on buffer 'SYSMSG' removed by cm_watchdog because client pid 3287 does not exist
[NEXT DAQ,INFO] Client 'unknown' on buffer 'SYSMSG' removed by bm_wait_for_free_space because client pid 3280 does not exist
[mtransition,INFO] Client 'mhttpd' (PID 3229) on buffer 'ODB' removed by cm_watchdog (idle 47.4s,TO 10s)


Since all this did not help, and although there was no update of the operating system, I decided to recompile the whole midas framework on this machine.
It compiled and I installed it, but the error persisted. In addition, I now cannot start the mlogger from the web interface any more, only manually. However, I can stop it from the web interface.

Do you have an idea what the problem could be? I am starting to get a bit desperate, also because I am only a user of the DAQ system, and the person who developed the system left some years ago.

As far as I can tell, I am using a midas version from 15.03.2010 (midas20100315.tar.gz). In principle there is only one frontend device, a CAEN V1740 digitizer, connected to Midas.

Thanks!
  872   05 Apr 2013   Konstantin Olchanski   Info   ODB JSON support
odbedit can now save ODB in JSON-formatted files. (JSON is a popular data encoding standard associated 
with Javascript). The intent is to eventually use the ODB JSON encoder in mhttpd to simplify passing of 
ODB data to custom web pages. In mhttpd I also intend to support the JSON-P variation of JSON (via the 
jQuery "callback=?" notation).

JSON encoding implementation follows specifications at:
http://json.org/
http://www.json-p.org/
http://api.jquery.com/jQuery.getJSON/  (seek to JSONP)

The result passes validation by:
http://jsonlint.com/

Added functions:
   INT EXPRT db_save_json(HNDLE hDB, HNDLE hKey, const char *file_name);
   INT EXPRT db_copy_json(HNDLE hDB, HNDLE hKey, char **buffer, int *buffer_size, int *buffer_end,
                          int save_keys, int follow_links);

For example of using this code, see odbedit.c and odb.c::db_save_json().

Example json file:

Notes:
1) hex numbers are quoted "0x1234" - JSON does not permit "hex numbers", but Javascript will 
automatically convert strings containing hex numbers into proper integers.
2) "double" is encoded with full 15 digit precision, "float" with full 7 digit precision. If floating point values 
are actually integers, they are encoded as integers (10.0 -> "10" if (value == (int)value)).
3) in this example I deleted all the "name/key" entries except for "stringvalue" and "sbyte2". I use the 
"/key" notation for ODB KEY data because the "/" character cannot appear inside valid ODB entry names. 
Normally, depending on the setting of "save_keys" argument, KEY data is present or absent for all entries.

ladd03:midas$ odbedit
[local:testexpt:S]/>cd /test
[local:testexpt:S]/test>save test.js
[local:testexpt:S]/test>exit
ladd03:midas$ more test.js
# MIDAS ODB JSON
# FILE test.js
# PATH /test
{
  "test" : {
    "intarr" : [ 15, 0, 0, 3, 0, 0, 0, 0, 0, 9 ],
    "dblvalue" : 2.2199999999999999e+01,
    "fltvalue" : 1.1100000e+01,
    "dwordvalue" : "0x0000007d",
    "wordvalue" : "0x0141",
    "boolvalue" : true,
    "stringvalue" : [ "aaa123bbb", "", "", "", "", "", "", "", "", "" ],
    "stringvalue/key" : {
      "type" : 12,
      "num_values" : 10,
      "item_size" : 1024,
      "last_written" : 1288592982
    },
    "byte1" : 10,
    "byte2" : 241,
    "char1" : "1",
    "char2" : "-",
    "sbyte1" : 10,
    "sbyte2" : -15,
    "sbyte2/key" : {
      "type" : 2,
      "last_written" : 1365101364
    }
  }
}

svn rev 5356
K.O.
  871   03 Apr 2013   Randolf Pohl   Info   Review of github and bitbucket
> > * "git bisect" for finding which commit introduced a (reproducible) bug.
> 
> I did not know this command, so I read about it. This IS WONDERFUL! I had once (actually with MSCB) the case that a bug was introduced in the last 100 
> revisions, but I did not know in which. So I checked out -1, -2, -3 revisions, then thought a bit, then tried -99, -98, then had the bright idea to try -50, then 
> slowly converged. Later I realised that I should have done a binary search, like -50, if ok try -25, if bad try -37, and so on to iteratively find the offending 
> commit. Finding that there is a command in git which does this automatically is great news.

even more so considering the nonlinear history (due to branching) in a regular git repo.
  870   03 Apr 2013   Stefan Ritt   Info   Review of github and bitbucket
> * "git bisect" for finding which commit introduced a (reproducible) bug.

I did not know this command, so I read about it. This IS WONDERFUL! I had once (actually with MSCB) the case that a bug was introduced in the last 100 
revisions, but I did not know in which. So I checked out -1, -2, -3 revisions, then thought a bit, then tried -99, -98, then had the bright idea to try -50, then 
slowly converged. Later I realised that I should have done a binary search, like -50, if ok try -25, if bad try -37, and so on to iteratively find the offending 
commit. Finding that there is a command in git which does this automatically is great news.
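
For the record, a rough sketch of how such a bisect session looks (revision names are just examples):

git bisect start
git bisect bad                  # the current revision shows the bug
git bisect good HEAD~100        # the last revision known to be good
# git now checks out a revision half-way; build and test it, then say
git bisect good                 # or "git bisect bad", and repeat
# until git prints the first bad commit; finish with
git bisect reset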

Stefan
  869   02 Apr 2013   Randolf Pohl   Info   Review of github and bitbucket
Hi Konstantin,

> > * No central repo. Have all the history with you on the train.
> > * Branching and merging, with stable branches and feature branches.
> >   Happy hacking while my students do analysis on a stable version.
> >   Or multiple development branches for several features.
> 
> This is the part that worries me the most. Without a "central" "authoritative" repository,
> in just a few quick days, everybody will have their own incompatible version of midas.

No! This is probably one of the biggest misunderstandings of the git workflow.

You can of course _define_ one central repo: This is the one that you and Stefan decide to be "the source" (as
Linus does for the kernel). It's like the central svn repo: Only Stefan and you can push to it, and everybody
else will pull from it. Why should I pull MIDAS from some obscure source, when your "public" repo is available?

Look at the Linux Kernel: Linus' version is authoritative, even though everybody and his best friend has his
own kernel repo.

So, the main workflow does not change a lot: You collect patches, commit them, and "push" them to the central
repo. All users "pull" from this central repo. This is very much what svn offers.

> 
> I guess I am okay with your private midas diverging from mainstream, but when *I* end up
> with 10 different incompatible versions just in *my* repository, can that be good?

See above: _You_ define what the central repo is.

But: I _bet_ you will very soon have 10 versions in your personal repo, because _you choose_ to do so. It's
just SO much easier. The non-linear history with many branches is a _feature_. I can't live without it any more:


Looking at my MIDAS analyzer:

I have a "public" repo in /pub/git/lamb.git. This is where I publish my analyzer versions. All my collaborators
pull from this.

Then I have my personal repo in ~/src/lamb. 
This is where I develop. When I think something is ready for the public, I merge this branch into the public repo. 

Whenever I start to work on a new feature, I create a branch in my _local_ repo (~/src/lamb).  I can fiddle and
play, not affecting anybody else, because it never sees the public repo.
OK, collaborator A finds a bug. I switch to my local copy of the public version, fix the bug, and push the fix
to the public repo. Then I go back to my (local) feature branch, merge the bug fix, and continue hacking.
Only when the feature is ready, I push it to the public repo.
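
In commands, that cycle looks roughly like this (branch and remote names are just examples, with
"public" pointing at /pub/git/lamb.git):

git checkout -b new-feature    # local topic branch, never pushed
# hack, hack, commit ...
git checkout master            # collaborator A reports a bug
# fix the bug, then
git commit -a -m "fix A's bug"
git push public master         # publish only the fix
git checkout new-feature
git merge master               # pull the bug fix into the feature branch and keep hacking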

Things get more interesting as you work on several features simultaneously. You have e.g. 3 topic branches:
(a) is nearly ready, and you want a bunch of people to test it.
    push branch "feature (a)" to the public repo and tell the people which branch to pull.
(b) is WIP, you hack on it without affecting (a).
(c) is bug fixes which may or may not affect (a) or (b).
And so on.

You will soon discover the beauty of several parallel branches.

Plus, git merges are SO simple that you never think about "how to merge".

> 
> >   And merging really works, including fixing up merge conflicts.
> 
> But somebody still has to do it. With a central repository, the problem takes care of
> itself - each developer has to do their own merging - with svn, you cannot commit
> to the head without merging the head into your code first. But with git, I can just throw
> my changes into some branch out there hoping that somebody else would do the merging.
> But guess what, there ain't anybody home but us chickens. We do not have a mad Finn here
> to enforce discipline and keep us in shape...

See above: You will have the exact same workflow in git, if you like.




> As an example, look at the HADOOP/HDFS code development, they have at least 3 "mainstream"
> branches going, none of them has all the features combined together and each branch has bugs with
> the fixes in a different branch. What a way to run a railroad.

I haven't looked at this. All I can say: Branches are one of the best features.

> 
> > * "git bisect" for finding which commit introduced a (reproducible) bug.
> > * "gitk --all"
> >
> > Go for git. :-)
> 
> Absolutely. For me, as soon as I can wrap my head around this business of "who does all the merging".

Easy: YOU do it.

Keep going as in svn: Collect patches, and send them out.

And then, try "git checkout -b my_first_branch", hack, hack, hack,
"git merge master".

Best,

Randolf


> 
> K.O.
  868   02 Apr 2013   Konstantin Olchanski   Info   Review of github and bitbucket
Hi, thanks for your positive feedback. I have been using git for small private projects for a few years now
and I like it. It is similar to the old SCCS days - good version control without having to set up servers,
accounts, doodads, etc.

> * No central repo. Have all the history with you on the train.
> * Branching and merging, with stable branches and feature branches.
>   Happy hacking while my students do analysis on a stable version.
>   Or multiple development branches for several features.

This is the part that worries me the most. Without a "central" "authoritative" repository,
in just a few quick days, everybody will have their own incompatible version of midas.

I guess I am okay with your private midas diverging from mainstream, but when *I* end up
with 10 different incompatible versions just in *my* repository, can that be good?

>   And merging really works, including fixing up merge conflicts.

But somebody still has to do it. With a central repository, the problem takes care of
itself - each developer has to do their own merging - with svn, you cannot commit
to the head without merging the head into your code first. But with git, I can just throw
my changes into some branch out there hoping that somebody else would do the merging.
But guess what, there ain't anybody home but us chickens. We do not have a mad Finn here
to enforce discipline and keep us in shape...

As an example, look at the HADOOP/HDFS code development, they have at least 3 "mainstream"
branches going, none of them has all the features combined together and each branch has bugs with
the fixes in a different branch. What a way to run a railroad.

> * "git bisect" for finding which commit introduced a (reproducible) bug.
> * "gitk --all"
>
> Go for git. :-)

Absolutely. For me, as soon as I can wrap my head around this business of "who does all the merging".

K.O.
  867   01 Apr 2013   Randolf Pohl   Info   Review of github and bitbucket
And my 2ct:

Go for git!

I've been using git since 2007 or so, after cvs and svn. Git has some killer features which I can't miss any more:

* No central repo. Have all the history with you on the train.
* Branching and merging, with stable branches and feature branches.
  Happy hacking while my students do analysis on a stable version.
  Or multiple development branches for several features.
  And merging really works, including fixing up merge conflicts.
* "git bisect" for finding which commit introduced a (reproducible) bug.
* "gitk --all"

I use git for everything: Software, tex, even (Ooffice) Word documents.

Go for git. :-)

Randolf
  866   08 Mar 2013   Konstantin Olchanski   Info   ODB /Experiment/MAX_EVENT_SIZE
Somebody pointed out an error in the MIDAS documentation regarding maximum event size 
supported by MIDAS and the MAX_EVENT_SIZE #define in midas.h.

Since MIDAS svn rev 4801 (August 2010), one can create events with size bigger than 
MAX_EVENT_SIZE in midas.h (without having to recompile MIDAS):

To do so, one must increase:
- the value of ODB /Experiment/MAX_EVENT_SIZE
- the size of the SYSTEM shared memory event buffer (and any buffers used by the event builder, 
etc)
- max_event_size & co in your frontend.
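
The ODB part can be set from odbedit, roughly like this (the value is only an example; the SYSTEM
buffer size and the frontend limits have to be raised to match, and the experiment name in the prompt
is a placeholder):

[local:testexpt:S]/>set "/Experiment/MAX_EVENT_SIZE" 10000000
[local:testexpt:S]/>ls "/Experiment/MAX_EVENT_SIZE"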

Actual limits on the bank size and event size are written up here:
https://ladd00.triumf.ca/elog/Midas/757

The bottom line is that the maximum event size is limited by the size of the SYSTEM buffer which is 
limited by the physical memory of your computer. No recompilation of MIDAS necessary.

K.O.
  865   19 Feb 2013   Wes Gohn   Forum   send_tcp error

Thank you for the help. As it turns out, the problem was due to the fact that we were compiling MIDAS on our 64 bit backend machine, but one of the frontend machines is 32 bit. The problem was resolved by compiling a 32 bit version of MIDAS in
addition to the 64 bit version.
  864   14 Feb 2013   Stefan Ritt   Info   Review of github and bitbucket
Let me add my five cents:

We have been using bitbucket at PSI for two months now, and are very happy with it.

Pros:

- We like the GIT flow model (http://nvie.com/posts/a-successful-git-branching-model/). You can at the same time do hot fixes, have a "distribution 
version", and keep a development branch, where you can try new things without compromising the distribution.
- Nice and fast Web interface, especially the "blame" is lightning fast compared to SVN/CVS
- GIT is non-centralized, so your local clone of a repository contains everything. If bitbucket is down/asks for money, you can continue with your local 
repository and clone it to some other hosting service, or host it yourself
- SourceTree (http://www.sourcetreeapp.com/) is a nice GUI for Mac lovers. 
- Easy user management
- Free for academic use

Con:

- Wiki is limited as KO wrote, so it should not be used as a "full" wiki to replace Plone for example, just to annotate your project
- SVN revision number is gone. This is on purpose since it does not make sense any more if you keep several parallel branches (merging becomes a 
nightmare), so one has to use either the (random) commit-ID or start tagging again.

So in conclusion, I would say that it's time to switch MIDAS to GIT. We'll probably do that in July when I will be at TRIUMF.

/Stefan
  863   13 Feb 2013   Konstantin Olchanski   Info   Review of github and bitbucket
I have done a review of github and bitbucket as candidates for hosting GIT repositories for collaborative 
DAQ-type projects. Here are my impressions.

1. GIT as a software management tool seems to be a reasonable choice for DAQ-type projects. "master" 
repositories can be hosted at places like github or self-hosted (in the simplest case, only 
http://host/~user web access is required to host a git repository), for each "daq project" aka "experiment" 
one would "clone" the master repository, perform any local modifications as required, with full local 
version control, and when desired feed the changes back to the master repository as direct commits (git 
push), as patches posted to github ("pull requests") or patches emailed to the maintainers (git format-
patch).
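
As an aside, the self-hosted case really is that simple. A rough sketch using git's "dumb" http
transport (paths and names are just examples):

git clone --bare midas ~/public_html/midas.git   # export a bare copy under the web root
cd ~/public_html/midas.git
git update-server-info                           # create the index files the http transport needs
mv hooks/post-update.sample hooks/post-update    # keep them updated on every "git push"
chmod +x hooks/post-update
# anybody can now do: git clone http://host/~user/midas.git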

2. Modern requirements for hosting a DAQ-type project include:
a) code repository (GIT, etc) with reasonably easy user access control (i.e. commit privileges should be 
assigned by the project administrators directly, regardless of who is on the payroll at which lab or who is 
a registered user of CERN or who is in some LDAP database managed by some IT department 
somewhere).
b) a wiki for documentation, with similar user access control requirements.
c) a mailing list, forum or bug tracking system for communication and "community building"
d) an ability to web host large static files (schematics, datasheets, firmware files, etc)
e) reasonable web-based tools for browsing the files, looking at diffs, "cvs annotate/git blame", etc.

3. Both github and bitbucket satisfy most of these requirements in similar ways:

a) GIT repositories:
aa) access using git, ssh and https with password protection. ssh keys can be uploaded to the server, 
permitting automatic commits from scripts and cron jobs.
bb) anonymous checkout possible (cannot be disabled)
cc) user management is simple: participants self-register and confirm their email address, then the project 
administrator gives them commit access to specific git repositories (and wikis).
dd) for the case of multiple project administrators, one creates "teams" of participants. In this 
configuration the repositories are owned by the "team" and all designated "team administrators" have 
equal administrative access to the project.

b) Wiki:
aa) both github and bitbucket provide rudimentary wikis, with wiki pages stored in secondary git 
repositories (*NOT* as a branch or subdirectory of the main repo).
bb) github supports "markdown" and "mediawiki" syntax
cc) bitbucket supports "markdown" and "creole" syntax (all documentation and examples use the "creole" 
syntax).
dd) there does not seem to be any way to set the "project standard" syntax - both wikis have the "new 
page" editor default to the "markdown" syntax.
ee) compared to mediawiki (wikipedia, triumf daq wiki) and even plone, both github and bitbucket wikis 
lack important features:
1) cannot edit individual sections of a page, only the whole page at once, bad if you have long pages.
2) cannot upload images (and other documents) directly through the web editor/interface. Both wikis 
require that you clone the wiki git repository, commit image and other files locally and push the wiki git 
repo into the server (hopefully without any collisions), only then you can use the images and documents 
in the wiki.
3) there is no "preview" function for images - in mediawiki I can have small size automatically generated 
"preview" images on the wiki page, when I click on them I get the full size image. (Even "elog" can do this!)
ff) to be extra helpful, the wiki git repository is invisible to the normal git repository graphical tools for 
looking at revisions, branches, diffs, etc. While github has a special web page listing all existing wiki 
pages, bitbucket does not have such a page, so you better write down the filenames on a piece of paper.

c) mailing list/forum/bug tracking:
aa) both github and bitbucket implement reasonable bug tracking systems (but in both systems I do not 
see any button to export the bug database - all data is stuck inside the hosting provider. Perhaps there is 
a "hidden button" somewhere).
bb) bitbucket sends quite reasonable email notifications
cc) github is silent, I do not see any email notifications at all about anything. Maybe github thinks I do not 
want to see notices about my own activities, good of it to make such decisions for me.

d) hosting of large files: both git and wiki functions can host arbitrary files (compared to mediawiki only 
accepting some file types, i.e. Quartus pof files are rejected).

e) web based tools: thumbs up to both! web interfaces are slick and responsive, easy to use.

Conclusions:

Both github and bitbucket provide similar full-featured git repository hosting, user management and bug 
tracking.

Both provide very rudimentary wiki systems. Compared to full featured wikis (i.e. mediawiki), this is like 
going back to SCCS for code management (from before RCS, before CVS, before SVN). Disappointing. A 
deal breaker if my vote counts.

K.O.
  862   12 Feb 2013   Stefan Ritt   Forum   send_tcp error
Ok, now the picture is clearer. I have, however, no idea what the real problem is. The number of concurrent programs in midas is 64, as defined in midas.h (MAX_CLIENTS), so that should not be the problem. In our experiment we run 10 front-ends (but 
on 10 different machines) without problems. Other experiments have used 27 front-ends.

The TCP error you see probably comes from the fact that the mserver side crashes or quits, and then the socket gets broken. What you can try, to debug this, is to run mserver manually. Just remove mserver from inetd, start it with "mserver -d", and 
watch what happens. Do you see any additional error messages? If the mserver segfaults, you should turn on core dumps and have a look there. Note that the mserver starts a child process on each incoming connection, so running mserver in gdb 
does not really help, since the child processes (which connect back to the front-ends) are not seen by gdb.
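
Roughly, such a debugging session could look like this (a sketch for a standard Linux box; adapt the
inetd part to your setup):

# comment out the mserver line in /etc/inetd.conf (or disable the xinetd service) and restart inetd
ulimit -c unlimited     # allow core dumps in this shell
mserver -d              # run mserver by hand and watch its output and the core files of its children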

Have you tried to run the 9 front-ends on maybe two different PCs (5 and 4) to see if the problem is on the client side?


Best regards,
Stefan
  861   11 Feb 2013   Wes Gohn   Forum   send_tcp error
> > I am getting a series of errors from MIDAS that I do not understand, so I hope
> > someone can help me figure this out.
> > 
> > I am attempting to run many frontends on one machine. I can run 8 with no
> > problem, but if I try to add a 9th I get errors relating to send_tcp. 
> > 
> > I have tried adjusting the max event sizes and buffer sizes, but it has not
> > resolved the problem. I also tried adjusting the data rates and the total data
> > volume going through each frontend, but there was no change. And as far as I can
> > tell I am not up against any hardware limits.
> > 
> > The errors are repeated continuously while a run is going. The three errors I
> > get are:
> > 
> > 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> > 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> > 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> > send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
> > 
> > If you have any suggestions of how I can debug this, please let me know. Thanks!
> 
> Can you tell me
> 
> - why you need 9 frontends
> - what kind of data your frontends produce
> - how your event builder looks like and how you assemble the fragments
> - what messages/errors you see when you run odbedit BEFORE the crash
> 
> /Stefan

Our experiment will need 24 frontends that will each run on its own machine. For now we
want to run 24 "fake" frontends on one machine for testing purposes. 9 is the limit
where it stops working properly. 

We have a pulser that is giving us periodic data at a constant rate. We have a master
frontend running on a different PC in interrupt mode that assembles the events, and then
N "FakeData" frontends running in polled mode on a single PC. 

We do have an event builder, but we get these errors whether the event builder is
running or not.

At the start of a run, I see the following messages:

[mtransition,INFO] Run #21 started
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
send(socket=9,size=16) returned -1, errno: 104 (Connection reset by peer)
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR]
send_tcp() failed
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to
master
Sat Feb 9 16:14:57 2013 [master,ERROR] [midas.c:10844:recv_tcp_server,ERROR] Cannot
allocate 268435512 bytes for network buffer
Sat Feb 9 16:14:57 2013 [master,ERROR] [midas.c:12893:rpc_server_receive,ERROR]
recv_tcp_server() returned -1, abort
Sat Feb 9 16:14:57 2013 [master,TALK] Program 'FakeData09' on host 'fe01' aborted

After this it recycles just the first three errors that I mentioned above.
  860   11 Feb 2013   Stefan Ritt   Forum   send_tcp error
> I am getting a series of errors from MIDAS that I do not understand, so I hope
> someone can help me figure this out.
> 
> I am attempting to run many frontends on one machine. I can run 8 with no
> problem, but if I try to add a 9th I get errors relating to send_tcp. 
> 
> I have tried adjusting the max event sizes and buffer sizes, but it has not
> resolved the problem. I also tried adjusting the data rates and the total data
> volume going through each frontend, but there was no change. And as far as I can
> tell I am not up against any hardware limits.
> 
> The errors are repeated continuously while a run is going. The three errors I
> get are:
> 
> 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
> 
> If you have any suggestions of how I can debug this, please let me know. Thanks!

Can you tell me

- why you need 9 frontends
- what kind of data your frontends produce
- how your event builder looks like and how you assemble the fragments
- what messages/errors you see when you run odbedit BEFORE the crash

/Stefan
  859   11 Feb 2013   Wes Gohn   Forum   send_tcp error
I am getting a series of errors from MIDAS that I do not understand, so I hope
someone can help me figure this out.

I am attempting to run many frontends on one machine. I can run 8 with no
problem, but if I try to add a 9th I get errors relating to send_tcp. 

I have tried adjusting the max event sizes and buffer sizes, but it has not
resolved the problem. I also tried adjusting the data rates and the total data
volume going through each frontend, but there was no change. And as far as I can
tell I am not up against any hardware limits.

The errors are repeated continuously while a run is going. The three errors I
get are:

16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)

If you have any suggestions of how I can debug this, please let me know. Thanks!
  858   06 Feb 2013   Stefan Ritt   Info   Compression benchmarks
I redid the tests from Konstantin for our MEG experiment at PSI. The event structure is different, so it
is interesting how the two different experiments compare. We have an event size of 2.4 MB and a trigger
rate of ~10 Hz, so we produce a raw data rate of 24 MB/sec. A typical run contains 2000 events, so has a 
size of 5 GB. Here are the results:


cat                 : time   7.8s, size   4960156030   4960156030, comp   0%, rate 639M/s 639M/s

gzip -1             : time 147.2s, size   4960156030   2468073901, comp  50%, rate  33M/s  16M/s

pbzip2 -p1          : time 679.6s, size   4960156030   1738127829, comp  65%, rate   7M/s   2M/s (1 CPU)
pbzip2 -p8          : time  96.1s, size   4960156030   1738127829, comp  65%, rate  51M/s  18M/s (8 CPU)


As one can see, our compression ratio is poorer (due to the quasi-random noise in our waveforms), but the
difference between gzip -1 and pbzip2 is larger (15% instead of 10% for DEAP). The single-CPU version of
pbzip2 cannot sustain our DAQ rate of 24 MB/sec, but the parallel version can. Actually we have a somewhat old
dual-core dual-CPU 2.5 GHz Xeon box, and make 8 hyper-threading CPUs out of the total 4 cores.
Interestingly, the compression rate scales by a factor of 7.3 for 8 virtual cores, so hyper-threading does its job.
So we take all our data with pbzip2 compression. The additional 15% as compared with gzip does 
not sound like much, but we produce 250 TB/year of raw data. So gzip gives us 132 TB/year and pbzip2 gives 
us 98 TB/year, and we save quite some disks.

Note that you can already run bzip2 (or any of the other methods) with the current logger, if you specify
an external compression program in the ODB using the pipe functionality:


[local:MEG:S]/>cd Logger/Channels/0/Settings/
[local:MEG:S]Settings>ls
Active                          y
Type                            Disk
Filename                        |pbzip2>/megdata/run%06d.mid.bz2
Format                          MIDAS
Compression                     0
ODB dump                        y
Log messages                    0
Buffer                          SYSTEM
Event ID                        -1
Trigger mask                    -1
Event limit                     0
Byte limit                      0
Subrun Byte limit               0
Tape capacity                   0
Subdir format                   
Current filename                /megdata/run197090.mid.bz2
  857   01 Feb 2013   Stefan Ritt   Forum   analyzer cannot connect to the statistics database
> The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
> one you specified in /etc/exptab).
> This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
> fresh start.
> 
> Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
> odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
> "rm .[A-Z]*.SHM" 

Thanks Randolf for helping out, I was not in the office this week.

In addition to deleting the *SHM files, it's sometimes necessary to delete the shared memory. You do this with the 
command line tools

ipcs -m
ipcrm -m <shmid>


/Stefan