ELOG Midas

Midas DAQ System, Page 17 of 161

3201 Entries

MIDAS packaged examples: compilation bug?

Shawn Bishop wrote:

Anyone have an idea what's going on here?

The Makefile contained the outdated target fal, which is a combined frontend/analyzer/logger. You don't need that, so I removed it from the Makefile. Now it should compile fine.

New multi-threaded midas slow control system

Multi-threaded slow control system

The Midas slow control system has been modified to support multi-threaded slow control front-ends. Each device gets its own thread in the front-end, which has several advantages:

- the communication with all devices runs in parallel and is therefore much faster
- slow devices can no longer block the front-end, so response times to run transitions etc. become much faster.
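Here is a minimal sketch of one thread per device, written with plain POSIX threads. It is not the MIDAS implementation. DEVICE_THREAD, slow_read() and poll_all_devices() are hypothetical names, and the 10 ms sleep only stands in for a slow device. Each device is polled in its own thread, so the total time is roughly that of the slowest device rather than the sum over all devices.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define N_DEVICES 3
#define N_READS   5

/* Hypothetical per-device state; each device is polled by its own thread. */
typedef struct {
   int id;             /* device index */
   int n_reads;        /* completed reads */
   double last_value;  /* last value read from the device */
} DEVICE_THREAD;

/* Simulated slow device access: waits 10 ms and returns a dummy value. */
static double slow_read(int id)
{
   struct timespec ts = {0, 10 * 1000 * 1000};
   nanosleep(&ts, NULL);
   return 100.0 * id;
}

/* Thread body: touches only its own DEVICE_THREAD, so no locking is needed. */
static void *device_loop(void *arg)
{
   DEVICE_THREAD *dev = arg;
   for (int i = 0; i < N_READS; i++) {
      dev->last_value = slow_read(dev->id);
      dev->n_reads++;
   }
   return NULL;
}

/* Starts one thread per device, then waits for all of them.
   Returns 0 on success, -1 if n is too large or a thread could not be created. */
int poll_all_devices(DEVICE_THREAD dev[], int n)
{
   pthread_t tid[N_DEVICES];
   int created = 0, rc = 0;

   if (n > N_DEVICES)
      return -1;
   for (; created < n; created++) {
      dev[created].id = created;
      dev[created].n_reads = 0;
      dev[created].last_value = 0.0;
      if (pthread_create(&tid[created], NULL, device_loop, &dev[created]) != 0) {
         rc = -1;
         break;
      }
   }
   for (int i = 0; i < created; i++)
      pthread_join(tid[i], NULL);   /* join also makes the results visible */
   return rc;
}
```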

This change requires some minor modifications to existing class and device drivers.

Dropping of CMD_xxx_ALL commands

The slow control commands CMD_SET_ALL, CMD_GET_ALL, CMD_SET_CURRENT_LIMIT_ALL, CMD_GET_CURRENT_LIMIT_ALL, etc. have been dropped. They were there to accommodate some slow devices, which sometimes work a bit faster if all channels are set or read at once. The inter-thread communication scheme implemented now only allows passing one channel at a time, so the "ALL" functions can no longer be supported. On the other hand, this is not much of an issue any more, since slow devices are now handled in parallel, which speeds things up considerably.

These commands have been removed from midas.h and from all device and class drivers that come with the midas distribution. If you have your own drivers, just delete the sections which use these commands.

Calling the device driver inside the class driver

Device drivers now have to be called differently from the class driver. The reason is that a multi-threaded front-end has only one central device driver dispatcher, which communicates with the individual device driver threads. The device drivers themselves do not need to be modified. All existing class drivers need modification if they are going to be run in a multi-threaded front-end. Old class drivers which are not used in a multi-threaded front-end do not need to be modified.

The following modifications are necessary:

Remove the following line:
```
#define DRIVER(_i) ...
```
Find all lines containing
```
DRIVER(i)(CMD_xxx, info->dd_info[i], ...)
```
and replace them with
```
device_driver(info->driver[i], CMD_xxx, ...)
```
Note that info->dd_info[i] is not passed any more. Instead, you pass info->driver[i]. Please note that the arguments passed after CMD_xxx are not checked by the compiler, since they form a variable argument list. Any error there will not produce a compiler warning, but will simply crash the front-end.
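A small stand-alone sketch shows why these argument errors go unnoticed. demo_driver() and the DEMO_CMD_* codes are hypothetical. They only mimic the style of a device driver and are not the MIDAS definitions. A float passed through "..." is promoted to double, so the callee must fetch it with va_arg(ap, double). A call such as demo_driver(DEMO_CMD_GET, 3, value), which passes a float where a float pointer is expected, compiles without any warning. It is undefined behavior and typically crashes at run time.

```c
#include <stdarg.h>

#define DEMO_CMD_SET    1   /* hypothetical command codes for this sketch */
#define DEMO_CMD_GET    2
#define DEMO_N_CHANNELS 8

static float demo_channel[DEMO_N_CHANNELS];

/* Hypothetical driver entry point: everything after cmd is a variable
   argument list, so the compiler cannot check the argument types. */
int demo_driver(int cmd, ...)
{
   va_list ap;
   int channel, status = 0;

   va_start(ap, cmd);
   switch (cmd) {
   case DEMO_CMD_SET:
      channel = va_arg(ap, int);
      if (channel >= 0 && channel < DEMO_N_CHANNELS) {
         /* a float argument arrives promoted to double */
         demo_channel[channel] = (float) va_arg(ap, double);
         status = 1;
      }
      break;
   case DEMO_CMD_GET:
      channel = va_arg(ap, int);
      if (channel >= 0 && channel < DEMO_N_CHANNELS) {
         float *pvalue = va_arg(ap, float *);
         *pvalue = demo_channel[channel];
         status = 1;
      }
      break;
   }
   va_end(ap);
   return status;   /* 0 for unknown commands or invalid channels */
}
```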

Find the line with

```
status = pequipment->driver[i].dd(CMD_INIT, hKey, &pequipment->driver[i].dd_info,
                                  pequipment->driver[i].channels,
                                  pequipment->driver[i].flags,
                                  pequipment->driver[i].bd);
```

and replace it with

```
status = device_driver(&pequipment->driver[i], CMD_INIT, hKey);
```

Find the line with

```
pequipment->driver[i].dd(CMD_EXIT, pequipment->driver[i].dd_info);
```

and replace it with

```
device_driver(&pequipment->driver[i], CMD_EXIT);
```

Find the following lines

```
hv_info->driver[i] = pequipment->driver[index].dd;
hv_info->dd_info[i] = pequipment->driver[index].dd_info;
hv_info->channel_offset[i] = offset;
hv_info->flags[i] = pequipment->driver[index].flags;
```

and replace them with

```
hv_info->driver[i] = &pequipment->driver[index];
hv_info->channel_offset[i] = offset;
```

The class drivers multi.c and generic.c can be used as a reference for these modifications.

Implementing CMD_STOP command

To support multithread-enabled device drivers, the class driver must implement the CMD_STOP command. This command stops all device threads before the actual devices get closed. The following code is necessary:

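```
INT cd_xxx(INT cmd, EQUIPMENT * pequipment)
{
   INT i, status = FE_SUCCESS;

   switch (cmd) {
   case CMD_INIT:
      ...

   case CMD_STOP:
      /* stop the thread of every multi-threaded device driver; drivers
         without DF_MULTITHREAD are skipped instead of ending the loop */
      for (i = 0; pequipment->driver[i].dd != NULL; i++)
         if (pequipment->driver[i].flags & DF_MULTITHREAD)
            status = device_driver(&pequipment->driver[i], CMD_STOP);
      break;

   case CMD_IDLE:
      ...
   }

   return status;
}
```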
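The CMD_STOP loop must visit every entry of the driver list and skip only the drivers without DF_MULTITHREAD. If the flag test were part of the loop condition, the first non-threaded driver would end the loop, and threaded drivers after it would never be stopped. The stand-alone sketch below shows this with hypothetical types: DEMO_DRIVER, and DEMO_DF_MULTITHREAD with a made-up bit value. It does not use the real EQUIPMENT and DEVICE_DRIVER structures.

```c
#include <stddef.h>

#define DEMO_DF_MULTITHREAD (1 << 4)   /* hypothetical bit value, not from midas.h */

/* Hypothetical stand-in for an entry of a device driver list. */
typedef struct {
   const char *name;
   int (*dd)(int cmd);   /* a NULL dd terminates the list */
   int flags;
} DEMO_DRIVER;

static int n_stopped;

/* Dummy driver entry point that only counts how often it was called. */
static int demo_dd(int cmd)
{
   (void) cmd;
   n_stopped++;
   return 1;
}

/* Walks the whole list and stops only the multi-threaded drivers.
   Returns the number of drivers that were stopped. */
int stop_threads(const DEMO_DRIVER driver[])
{
   n_stopped = 0;
   for (int i = 0; driver[i].dd != NULL; i++)
      if (driver[i].flags & DEMO_DF_MULTITHREAD)
         driver[i].dd(0);
   return n_stopped;
}
```

For a list of the form threaded, non-threaded, threaded, this stops two drivers. A loop that tested the flag in its condition would stop only the first.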

Enabling multi-thread support

To turn on multi-thread support for a device, the flag DF_MULTITHREAD must be set in the device driver list of the front-end user code, for example:

```
DEVICE_DRIVER multi_driver[] = {
   {"Input", nulldev, 2, null, DF_INPUT | DF_MULTITHREAD},
   {"Output", nulldev, 2, null, DF_OUTPUT | DF_MULTITHREAD},
   {""}
};
```

"double" values are truncated

> The mhttpd ODB displays and mhist truncate values of "float" and "double"
> floating point variables to 6 digits. In reality, "float" has 7 significant
> digits and "double" has 16. I recommend that db_sprintf() in odb.c be changed to
> read this:
> 
>       case TID_FLOAT:
>          sprintf(string, "%.7g", *(((float *) data) + index));
>          break;
>       case TID_DOUBLE:
>          sprintf(string, "%.16g", *(((double *) data) + index));
>          break;
> 
> K.O.

I had the following there:

```
      case TID_FLOAT:
         if (ss_isnan(*(((float *) data) + index)))
            sprintf(string, "NAN");
         else
            sprintf(string, "%g", *(((float *) data) + index));
         break;
      case TID_DOUBLE:
         if (ss_isnan(*(((double *) data) + index)))
            sprintf(string, "NAN");
         else
            sprintf(string, "%lg", *(((double *) data) + index));
         break;
```

so I assumed that "%g" takes care of the maximal resolution. But it does not. Without an explicit precision, "%g" prints only 6 significant digits. So I changed it as you proposed.
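The difference is easy to see in isolation. This sketch uses hypothetical helper names, format_float() and format_double(), and only the standard snprintf(). It is not the db_sprintf() code from odb.c. With plain "%g", pi prints as 3.14159. With "%.16g", the double prints as 3.141592653589793. With "%.7g", the float prints as 3.141593.

```c
#include <stdio.h>

/* Format a float with 7 significant digits, as proposed for TID_FLOAT. */
void format_float(char *buf, size_t size, float f)
{
   snprintf(buf, size, "%.7g", f);   /* f is promoted to double here */
}

/* Format a double with 16 significant digits, as proposed for TID_DOUBLE. */
void format_double(char *buf, size_t size, double d)
{
   snprintf(buf, size, "%.16g", d);
}
```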

Increase of maximum event size

Dear midas users,

The maximum event size in midas is currently limited to 512 kB (MAX_EVENT_SIZE in midas.h). This is mainly due to old (pre-2.2) Linux kernels, which had only a very limited shared memory pool. This limit has since increased considerably. I wonder whether we should increase the default maximum event size, and to what size.

A larger event size has drawbacks. The SYSTEM event buffer has to hold at least two events. When the last midas program is stopped or started, this buffer has to be written to or read from the .SYSTEM.SHM file, which slows down the start/stop of the program. But writing/reading a few MB is fast these days anyhow, so this might not be a big problem either. So how big do you think the default maximum event size should be?

- Stefan

Increase of maximum event size

Since nobody has complained so far, I increased MAX_EVENT_SIZE to 2 MB. If anybody has problems with this setting, please report them. Note that after updating to SVN revision 3327 it will be necessary to recompile all midas programs and to delete any old SYSTEM.SHM or .SYSTEM.SHM file. I added some code which should check for inconsistent SYSTEM.SHM sizes, but I'm not sure whether it works everywhere.

mhttpd elog corruption via double-edit

K.O. wrote:

Apparently the mhttpd elog will corrupt the elog files if two (or more?) elog entries are being edited at the same time. K.O.

That's strange. Since mhttpd is single-threaded, there should not be any multi-thread/process conflict, because the elog files cannot be written simultaneously from two different browser sessions. If entries are edited at the same time, they are submitted one after the other. Of course it is possible for two people to edit the same entry at once. In that case the second submission "wins" and overwrites the first one without notification. The standalone elog server has the option to lock entries ("use lock = 1") to prevent this, but this feature is not present in the mhttpd elog.

Increase of maximum event size

K.O. wrote:

Now, we have per-buffer tunable size (see message
https://ladd00.triumf.ca/elog/Midas/283) and in the long run, I would prefer the
compiled-in limit to go away: already all memory is allocated dynamically and
the MAX_EVENT_SIZE is only useful as kind of a sanity check against frontend
misconfiguration or against malformed events.

If MAX_EVENT_SIZE goes away, the maximum event size becomes limited by the
largest SysV shared memory segment permitted by Linux (via sysctl kernel.shmmax).

To go beyond the limit on SysV shared memory, one can use mmap()-based shared
memory: this is limited by available RAM+swap (and disk space for the
.SYSTEM.SHM file). Current MIDAS system.c has an experimental implementation of
mmap() shared memory, but AFAIK it has not been used in any production system, yet.

MAX_EVENT_SIZE is also used for the RPC layer, since the receiving buffer must hold at
least one event. It is true that this can and should be made dynamic. Concerning
the shared memory there is the problem that it cannot be increased when any program is
running and attached to the shared memory, so it can only be defined at startup of the
first program creating the shared memory.
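For a private buffer such as the RPC receive buffer, "dynamic" could simply mean growing the buffer whenever a larger event arrives. The sketch below assumes a hypothetical RECV_BUFFER type and a hypothetical recv_buffer_reserve() function. It is not MIDAS code. The same approach does not work for the shared memory buffers, since they cannot be resized while other programs are attached.

```c
#include <stdlib.h>

/* Hypothetical receive buffer that grows on demand instead of being
   sized by a compile-time maximum event size. */
typedef struct {
   char *data;
   size_t size;
} RECV_BUFFER;

/* Ensures buf can hold at least event_size bytes.
   Returns 0 on success, -1 if memory could not be allocated;
   in that case the old buffer is left intact. */
int recv_buffer_reserve(RECV_BUFFER *buf, size_t event_size)
{
   if (event_size <= buf->size)
      return 0;                         /* already large enough */
   char *p = realloc(buf->data, event_size);
   if (p == NULL)
      return -1;
   buf->data = p;
   buf->size = event_size;
   return 0;
}
```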

The sanity check in the frontend is done against max_event_size defined in frontend.c which can be smaller than MAX_EVENT_SIZE (some front-ends have limited memory).

So I agree that this issue may need revision, maybe something for my next visit ;-)
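K.O.'s message mentions mmap()-based shared memory as a way around the SysV segment size limit. The following is a minimal POSIX sketch of the general technique: a file mapped with MAP_SHARED, so that its contents are shared between processes and also kept on disk. map_shared_file() is a hypothetical helper. It is not the experimental implementation in system.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps `size` bytes of the file `path` (created if missing) as shared
   memory. Returns the mapping, or NULL on error. */
void *map_shared_file(const char *path, size_t size)
{
   int fd = open(path, O_RDWR | O_CREAT, 0644);
   if (fd < 0)
      return NULL;
   if (ftruncate(fd, (off_t) size) != 0) {   /* set the file to the segment size */
      close(fd);
      return NULL;
   }
   void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
   close(fd);                                 /* the mapping stays valid after close */
   return p == MAP_FAILED ? NULL : p;
}
```

Every process that maps the same file with MAP_SHARED sees the same bytes. The usable size is then bounded by address space and disk space rather than by kernel.shmmax.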

mhttpd elog corruption via double-edit

> I do not know how to "properly" fix this bug without changing the indexing
> scheme to something similar to what is used by elogd- message numbers instead of
> file indices. In the existing scheme, message editing also breaks URLs shown in
> the email notifications (they contain file indices that point to the wrong
> places after messages are moved around by editing) and "reply threading" links.

Well, the development of elogd with its message numbers was actually stimulated by
the problem you mentioned. After that all those problems went away. Another
incarnation of that problem is if you edit an mhttpd log file manually. Afterwards
the file offsets are different and the system gets corrupted. To fix this properly,
one would have to backport the el_xxx functions from elogd to mhttpd, or, even
simpler, remove the elog functionality in mhttpd and "force" everybody to use elogd
(after doing elconv to convert the files into the new format).
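A few lines show why editing breaks offset-based links. The sketch below is hypothetical: ENTRY, entry_offset() and find_by_id() do not exist in mhttpd or elogd. Entries are stored back to back, and a link records the byte offset of the third entry. After the first entry grows during an edit, that offset no longer points to the third entry. A lookup by message number still finds it.

```c
#include <string.h>

/* Hypothetical log entry: a stable message number plus its text. */
typedef struct {
   int id;
   char text[64];
} ENTRY;

/* Byte offset of entry k if all entries are stored back to back in one file. */
size_t entry_offset(const ENTRY e[], int k)
{
   size_t off = 0;
   for (int i = 0; i < k; i++)
      off += strlen(e[i].text);
   return off;
}

/* Index of the entry with message number id, or -1.
   Unlike an offset, this does not change when other entries are edited. */
int find_by_id(const ENTRY e[], int n, int id)
{
   for (int i = 0; i < n; i++)
      if (e[i].id == id)
         return i;
   return -1;
}
```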

"make install" error on MacOS 10.4.7, svn 3366

> While executing "make install" under MacOS 10.4.7, you may encounter errors about "dio". The
> problem is in the Makefile. I made some changes to it and attach the diff file here.

I committed your patch. Thank you.

Build error with mana.c while using CERNLIB, svn 3366

Committed, thanks.

Access to out_info from mana.c

I changed out_info into a global structure definition ANA_OUTPUT_INFO and put it into
midas.h, so it can be accessed easily from the user analyzer source code.

> Would it be relevant to transform out_info into a *non-static* variable of a type
> defined by a *named* struct?
> Currently,  programs that  try to access out_info cannot do it anymore; and they
> typically copy the struct definition from mana.c, which is not robust against future
> changes in mana.c.
> 
> If mana.c could be changed in the way described above, that would be great.
> Otherwise, is it safe to patch it myself for local use?  or is there a better way of
> accessing out_info from mana.c?
> 
> As always, any help would be much appreciated :)
> 
> EOL
> 
> > Hello,
> > 
> > Is it possible to access out_info (defined in mana.c) from another program?
> > 
> > In fact, out_info is now defined as an (anonymous) "static struct" in mana.c,
> > which it seems to me precludes any direct use in another program.  Is there an
> > indirect way of getting ahold of out_info?  or of the information it contains?
> > 
> > out_info used to be defined as a *non-static* struct, and the code I'm currently
> > modifying used to compile seamlessly: it now stops the compilation during
> > linking time, as out_info is now static and the program I have to compile
> > contains an "extern struct {} out_info".
> > 
> > Any help would be much appreciated!  I searched in vain in this forum for
> > details about out_info and I really need to access the information it contains!
> > 
> > EOL (a pure MIDAS novice)

Shared memory problems

> Hello,
> 
> Just did a fresh install of MIDAS from the SVN repository under CentOS and
> everything compiles fine, but when I go to run the frontend (using dio), I get
> the following error message:
> 
> Connect to experiment ...[odb.c:868:db_open_database] Different database format:
>  Shared memory is 14, program is 2
> [midas.c:1763:cm_connect_experiment1] cannot open database
> 
> 
> Any ideas on what the problem could be, or how to fix it?  

You have an old .ODB.SHM from a previous version in your directory (note the '.' in
front, so you need a 'ls -alg' to see it). Delete that file and try again.

Shared memory problems

That sounds like you are mixing versions: you have an old executable (maybe your mlogger) which
has been linked against the old midas version, but you create the ODB with the new
odbedit or frontend. The new version complains if it finds an ODB from a previous version
(the error you reported first), but an old program does not have that version check, so
it finds a different binary ODB structure and crashes.
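A version check like the one mentioned above works by storing a format number in the shared memory and comparing it when a program attaches. The sketch below uses a hypothetical SHM_HEADER and a made-up DEMO_DB_VERSION. The real ODB header layout and version numbers are different. A program built before such a check existed skips this test and reads the bytes with the wrong layout, which fits the crash described above.

```c
#include <stdio.h>

#define DEMO_DB_VERSION 3   /* hypothetical format version of this program */

/* Hypothetical header at the start of a shared memory segment. */
typedef struct {
   int version;   /* format version written by the program that created it */
} SHM_HEADER;

/* Returns 1 if the segment has the expected format, 0 if it was created
   by a program with a different binary layout and must be deleted. */
int check_db_format(const SHM_HEADER *hdr)
{
   if (hdr->version != DEMO_DB_VERSION) {
      fprintf(stderr, "Different database format: Shared memory is %d, program is %d\n",
              hdr->version, DEMO_DB_VERSION);
      return 0;
   }
   return 1;
}
```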

> Thanks for your help.  I tried again and it got me back to the initial problem I had.
>  The frontend will start, and the analyzer starts (complains about there not being a
> last.root, but other than that it's fine), and then when starting mlogger, I get:
> 
> [odb.c:860:db_validate_db] Warning: database corruption, first_free_key 0x0001A404
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> 
> 
> And it continues to shoot out error messages about invalid key handles until I kill
> it.  Then trying to start the frontend again fails until I remove the .ODB.SHM file. 
> Any other ideas?
> 
> > > Hello,
> > > 
> > > Just did a fresh install of MIDAS from the SVN repository under CentOS and
> > > everything compiles fine, but when I go to run the frontend (using dio), I get
> > > the following error message:
> > > 
> > > Connect to experiment ...[odb.c:868:db_open_database] Different database format:
> > >  Shared memory is 14, program is 2
> > > [midas.c:1763:cm_connect_experiment1] cannot open database
> > > 
> > > 
> > > Any ideas on what the problem could be, or how to fix it?  
> > 
> > You have an old .ODB.SHM from a previous version in your directory (note the '.' in
> > front, so you need a 'ls -alg' to see it). Delete that file and try again.

Denis Bilenko wrote:

1. Blocking calls to the midas API aren't usable when the client is connected through mserver. This is true at least for bm_receive_event, but it seems to be a more general problem: a midas application has to call cm_yield within 10 seconds (or whatever timeout is set) to remain alive.
That is not the case when RPC is not used.

The 10-second timeout you see comes from the RPC layer. If you call bm_receive_event and it blocks, the client will report an RPC timeout after 10 seconds. This has nothing to do with cm_yield(). Calling a blocking function via a server connection is not a good idea anyhow, since the process then cannot respond to anything else, such as run transitions. That's why I never used it, and why I never noticed this behaviour. I did, however, change it so that bm_receive_event, if called without the ASYNC flag, disables the RPC timeout for this call and restores it afterwards. This is now in midas.c revision 3502. You can easily try this with midas/examples/lowlevel/produce and consume.
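The change described above follows a common save/disable/restore pattern. The sketch below illustrates only that pattern. rpc_timeout_ms and demo_blocking_receive() are hypothetical stand-ins, not the actual RPC option calls used in midas.c revision 3502.

```c
/* Hypothetical client-side RPC timeout in ms; 0 means "wait forever". */
static int rpc_timeout_ms = 10000;

/* Records the timeout in effect while the blocking call runs. */
static int timeout_during_call = -1;

/* Stand-in for a blocking call such as waiting for the next event. */
static int demo_blocking_receive(void)
{
   timeout_during_call = rpc_timeout_ms;
   return 1;
}

/* Disables the timeout for the duration of the blocking call only,
   then restores whatever value was set before. */
int receive_without_timeout(void)
{
   int saved = rpc_timeout_ms;
   rpc_timeout_ms = 0;
   int status = demo_blocking_receive();
   rpc_timeout_ms = saved;
   return status;
}
```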

Denis Bilenko wrote:

2. On Windows, two processes on the same machine can send/receive events to each other only if they both use midas locally (through shared mem) or they both use midas via RPC (through mserver), but not if they use different ways.

I just tried again and it did work. I used produce/consume. If you just press <return> for the host name, these programs connect locally. I tried both producer local/consumer remote and the reverse, and both worked. I did, however, use consume with the callback functionality, and I did not try your Python programs. If you find that produce/consume works and your Python programs don't, adapt your Python programs to resemble produce/consume.

Denis Bilenko wrote:

3. Receiving/sending same events from the same process - was possible in 1.9.5-1, not so in the current version (revision 3501, mxml revision 45). Is this an intended behavior fix?

Yes. It was introduced in revision 3186 on July 28th, 2006. It fixed a problem where the buffer level was always shown as 100% full, even if no other clients were registered. By ignoring the own process, the buffer level now correctly shows the "contents" of a buffer from 0 to 100%. It also gave a small speed improvement. If you want to send events to your own process, you have to do it at the calling level. For example, when you call bm_send_event(), also call process_event() (or whatever your event receiving routine is called) manually. This is also much faster than going through the buffer.

Denis Bilenko wrote:

1 & 3 - thanks for the fix and the explanation. As for 2 - I've tried consume and produce
and still have a problem

Acknowledged. I could reproduce it with the information you supplied, thank you very much. Also the data rate is slower than what I expect. I will investigate and fix this, but it could take some time.

I tried again and could not reproduce the problem. Last time I was probably confused by some old mserver.exe executable I had lying around. I updated to the most recent version (3516) and ran "nmake -f makefile.nt" in C:\midas. I was also confused about the low rate last time. That was caused by an mserver.exe executable which was not compiled with optimization. For small event sizes (such as 10 bytes) there is a big difference between optimized and non-optimized code. This is what I got:

First Console wrote:

ID of event to produce: 1
Host to connect: localhost
Event size: 10
Level:   0.0 %, Rate: 0.46 MB/sec
flush
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.40 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush

and

Second Console wrote:

```
C:\midas\NT\bin>.\consume
ID of event to request: 1
Host to connect:
Get all events (0/1): 1
Receive via callback ([y]/n):
[consume.c:73:process_event] Serial number mismatch: Ser: 1169666, OldSer: 0, ID: 1, size: 10
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   2.4 %, Rate: 0.35 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.50 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.40 MB/sec, ser mismatches: 1
Received break. Aborting...
```

Actually, sending remotely and receiving locally is very common. Most experiments do exactly that: they have a remote frontend, while the logger and analyzer run locally. If that did not work, all these experiments would have a problem. So I can only encourage you to try again. Make sure to update and recompile the executables. Maybe delete any old *.SHM files. Maybe try on another PC or under Linux.

Large files under Windows XP

Hello,

We have problems analyzing large files under Windows XP. For small file sizes,
everything is ok. We have events of 2.8 MB each, and we can read ~30 events per
second. But if the file gets larger than about 600-800 MB, access
becomes very slow, about 1 event per second. This is not the case under Linux,
where it stays at 30 Hz (~90 MB/sec). 

Looking at the low-level file access, it is obvious that this has nothing to do
with midas. The problem can be reproduced with a simple program reading chunks
of 3 MB from a 1 GB file. The Windows XP file system is NTFS, default formatting.
Has anyone else observed a similar problem, or does anyone have some
suggestions? Unfortunately many people here want to analyze midas data under
Windows...
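A reproduction program of the kind described above could look like the following sketch. It uses only standard C stdio. read_in_chunks() and the 3 MB CHUNK_SIZE are illustrative choices. To see where the slowdown starts, timing code (for example printing the time per chunk) would have to be added.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (3 * 1024 * 1024)   /* 3 MB per read */

/* Reads the whole file in CHUNK_SIZE pieces.
   Returns the number of bytes read, or -1 on error. */
long long read_in_chunks(const char *path)
{
   FILE *f = fopen(path, "rb");
   if (f == NULL)
      return -1;

   char *buf = malloc(CHUNK_SIZE);
   if (buf == NULL) {
      fclose(f);
      return -1;
   }

   long long total = 0;
   size_t n;
   while ((n = fread(buf, 1, CHUNK_SIZE, f)) > 0)
      total += (long long) n;   /* a timer around fread() would show per-chunk speed */

   int err = ferror(f);
   free(buf);
   fclose(f);
   return err ? -1 : total;
}
```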

Stefan Ritt

wrong version in include/midas.h?

> The present .../include/midas.h contains
> [alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
> /home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"
> 
> All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
> trunk, and this may confuse people as to what version of midas they are using,
> and may complicate reporting of bugs.
> 
> Perhaps the trunk version should say something like "svn-22233344" (the svn
> revision number)? The present "1.9.5" is wrong...

Fully agree. I added an svn_revision string to midas.h, which is now reported
by "odbedit ver". Unfortunately this reflects only changes in midas.c. If one
changes odb.c for example, the svn revision in midas.c does not get modified by
the SVN system. In addition I changed the present version 1.9.5 to 2.0.0. I made
the tar and zip files. After some internal testing, it will be announced
officially in a few days.
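The limitation comes from how Subversion keyword expansion works. A keyword such as $Rev$ in a file is replaced by the last revision in which that file changed, and only if the svn:keywords property is set on the file. The sketch below parses such an expanded string. The value of svn_revision and the function parse_svn_revision() are hypothetical. They are not the exact string used in midas.h.

```c
#include <stdlib.h>
#include <string.h>

/* Example of an expanded keyword (hypothetical value). */
static const char svn_revision[] = "$Rev: 3502 $";

/* Returns the revision number from an expanded "$Rev: N $" string,
   or -1 if the keyword has not been expanded. */
long parse_svn_revision(const char *s)
{
   const char *p = strstr(s, "$Rev: ");
   if (p == NULL)
      return -1;
   return strtol(p + strlen("$Rev: "), NULL, 10);
}
```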

Problem solved by Re-define _syscall0(...)

Exaos Lee wrote:

Maybe it's not the perfect way, but it works. :-)

I changed it to:

```
#ifdef OS_UNIX
   /* requires <unistd.h> and <sys/syscall.h> */
   return syscall(SYS_gettid);
#endif                          /* OS_UNIX */
```

without any #define.
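Here is a self-contained version of that call, for checking it outside of MIDAS. demo_gettid() is a hypothetical wrapper. It assumes Linux, where SYS_gettid is defined in <sys/syscall.h>, and returns -1 on other systems. On Linux, the thread id of the main thread equals the process id.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the kernel thread id of the calling thread on Linux,
   or -1 where SYS_gettid is not available. */
long demo_gettid(void)
{
#if defined(__linux__) && defined(SYS_gettid)
   return syscall(SYS_gettid);
#else
   return -1;
#endif
}
```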

Does this work for you?

- Stefan

segmentation violation of analyzer on a x86_64

> Hello,
> 
> When I connect to the analyzer on an x86_64 processor (with Roody),
> the analyzer breaks with a segmentation violation in the root_server_thread function.
> The same code works fine on a 32-bit processor.
> As far as I found, the problem is in the exchange of pointers between analyzer and client.
> Before a pointer is sent, it is stored in an int (size 4 instead of 8) at
> this place:
> Index: src/mana.c
> ===================================================================
> --- src/mana.c  (revision 3498)
> +++ src/mana.c  (working copy)
> @@ -5386,7 +5386,7 @@
> 
>              //write pointer
>              message->Reset(kMESS_ANY);
> -            int p = (POINTER_T) obj;
> +            POINTER_T p = (POINTER_T) obj;
>              *message << p;
>              sock->Send(*message);
> 
> 
> Sincerely Yours,
> Fedor Ignatov 

Do I understand you right? With your patch it works even on 64 bit, right? Or do you
mean there is still a segmentation violation? Anyhow, I committed your patch, since the
"int" is clearly incorrect.
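On x86_64 Linux an int is 32 bits while a pointer is 64 bits, so storing a pointer in an int loses the upper half of the address. The standard integer type for this purpose is uintptr_t from <stdint.h>. The patch uses MIDAS's own POINTER_T for the same job. The helpers below are hypothetical and only show that the round trip through a wide enough integer preserves the pointer.

```c
#include <stdint.h>

/* Store a pointer in an integer type that is guaranteed to be wide enough. */
uintptr_t pointer_to_int(const void *p)
{
   return (uintptr_t) p;
}

/* Convert the integer back; this yields the original pointer value. */
void *int_to_pointer(uintptr_t v)
{
   return (void *) v;
}
```

Both ends of the socket must also agree on the width of the value that is streamed into the message.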

- Stefan

ELOG V3.1.6-083448f7