Entry  23 Feb 2007, Konstantin Olchanski, Info, RFC- history system improvements 
While running the ALPHA experiment at CERN, we stressed and broke the MIDAS history system. We 
generated about 0.5 GB of history data per day, and this killed the performance of the history plot 
system in mhttpd - we had to wait for *minutes* to look at any plots of any variables.

One way to address this problem could be by changing the way ALPHA slow controls data is collected.

Another way to address this problem could be by improving the midas history system by removing 
some of the existing limitations and inefficiencies, enabling it to handle the ever increasing data 
volumes we keep throwing at it.

I feel the second approach (improving midas) is more useful in general and it appears that big 
improvements can be made by small modifications of existing code. No rewrites of midas are required. 
Read on.

Issue 1: in the mlogger, history is recorded with fairly coarse granularity.

For an equipment, if any variable changes, *all* variables for that equipment are written into the history 
file.

Historically, this worked fairly well for experiments with low data rates (a few history changes per 
minute) and with variables equally distributed between different equipments. But even for a modest 
sized experiment like TRIUMF-E614-TWIST, recording many variables when only one has changed has 
been a visible inefficiency. Current experiments wish to record more history data more frequently, but 
even with the latest and greatest hardware, this inefficiency has become a performance killer in the 
case of ALPHA.

One could solve this problem by refactoring the data (one variable per equipment/one equipment per 
variable). I find this approach inelegant and contrary to the "midas way" (whatever that is).

An alternative would be to change the mlogger to record history with per-variable granularity. When 
one variable changes, only that variable is recorded. Preliminary examination of the existing code 
indicates that history writing in the mlogger is already structured in a way that makes it easy to 
implement, while the history reading code does not seem to need any changes at all.

Issue 2: all history data is recorded into a single file.

Again, this has worked well historically. In fact, until not so long ago, it was the only sane way to record 
history data because operating systems could not efficiently write data into multiple files at the same 
time. Insufficient data buffering, suboptimal storage allocation strategies - all leading to bad 
performance. Latest Linux kernels have largely resolved all such issues.

The present problem arises when recording large amounts of history data (say 100 variables) and then 
making a history plot of 1 variable. Because data for the one variable of interest is spread across the 
whole file, effectively, the whole file has to be read into memory, data for 1 variable collected and data 
for the other 99 variables skipped.

In this case, a speed up by a factor of 100 could be obtained by recording (say) one variable per history 
file. (Yes, the history code does use "lseek", but the seek granularity of modern disks is very coarse and 
in my tests, reading the whole file sequentially (streaming) is nearly as fast as seeking through it.)
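
For the curious, a sketch of the kind of benchmark behind this observation (the 64 kB record spacing and 
16-byte reads are made-up numbers, not actual history record sizes):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* usage: bench <file> [seek]
   "stream" reads the whole file; "seek" reads 16 bytes every 64 kB,
   the way a sparse history read would */
int main(int argc, char **argv)
{
   int    seek_mode = (argc > 2 && strcmp(argv[2], "seek") == 0);
   int    fd;
   char   buf[65536];
   long   total = 0;
   time_t t0 = time(NULL);

   if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
      return 1;

   if (seek_mode)
      while (read(fd, buf, 16) == 16) {     /* grab one "record" */
         total += 16;
         lseek(fd, 65536 - 16, SEEK_CUR);   /* skip ahead to the next one */
      }
   else {
      long n;
      while ((n = read(fd, buf, sizeof(buf))) > 0)
         total += n;
   }

   printf("%s: %ld bytes in %.0f s\n",
          seek_mode ? "seek" : "stream", total, difftime(time(NULL), t0));
   close(fd);
   return 0;
}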

One has to be very careful when looking at these numbers and running benchmarks. Modern computers 
with fast disks and large RAM perform very well no matter how history data is stored and organized. 
Performance problems surface only under load, when running the production system, when the 
disks are busy recording the main data stream and all RAM is consumed by user applications doing 
data analysis.

The obvious solution to this problem is to record each variable into a separate data file. This will 
require modifications to the history writing code in the mlogger and to the history reading code in 
mhttpd, mhist & co.

An extra challenge in this task is to minimize changes to the existing code and to keep compatibility 
with the existing data files - new code should be able to read existing data files.

I propose to organize data into subdirectories:
history/equipmentNNN/variableVVV/YYMMDD.hst
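
For illustration, a sketch of how the reading code could build the file name directly from what it 
already knows (the function itself is hypothetical):

#include <stdio.h>

/* build the per-variable history file name under the proposed scheme */
void hist_path(char *path, int size, const char *equipment,
               const char *variable, int year, int month, int day)
{
   snprintf(path, size, "history/%s/%s/%02d%02d%02d.hst",
            equipment, variable, year % 100, month, day);
}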

This scheme does two good things for the history plotting in mhttpd:

1) note that mhttpd always plots one variable at a time, and the variables are addressed by equipment 
(int) and variable name (string) (plus the array index). In the proposed scheme, the code would know 
exactly which history file to open to get the data, no scanning of directories or seeking inside the 
history file.

2) when setting up mhttpd history plots, the code can easily see what equipment and variables exist 
and *ever existed*. The present code only examines the latest history file and cannot see variables that 
have been deleted (or not yet written into the existing file). For example, one cannot see variables that 
existed in the 2005 history but were removed (or renamed) in 2006. (Yes, it can be done by an expert 
using mhist to examine the 2005 history files and odbedit to manually setup the history plots).

Over the next few weeks, I will proceed with implementing these two improvements: (1) the mlogger writes 
history with per-variable granularity; (2) the history file is split into one file per variable. If my initial 
assessment is correct and the changes indeed are small, contained, non-intrusive and compatible with 
existing history files, I will submit them for inclusion into mainline midas.

K.O.
    Reply  26 Feb 2007, Stefan Ritt, Info, RFC- history system improvements 
I agree to what you propose. I'm pretty sure you are right in getting a significant improvement in readout speed
of the history system. So far there was no big request for improving the history system, since the performance in
the experiments I was involved in was good. In MEG for example, we have ~20MB of history data per day, and all
plots even going back some months can be made in a couple of seconds. Have a look for example at

http://midas.psi.ch/megon00/HS/PCS/Pressures.gif?hscale=1843200&hoffset=-5068800

This plot stretches over two weeks and involves ~500 MB of history data, and is prepared in a couple of seconds.
The key question here is how big the disk cache of the OS is. The above plot does not read all 500 MB, but skips
many data points in order to obtain ~1000 data points (one per pixel) for the requested period. To find these
data points, it reads and scans the history index files (yymmdd.idx), which are only a few percent of the
yymmdd.hst data files. The index file contains only the time stamp, the event id and the location of the event in
the *.hst file. Scanning the index file is as efficient as scanning a history file with a single variable. Now
comes the access of the history file. For ~1000 data points, 1000 locations have to be read. This requires
reading in the FAT table for the history file and accessing the sector clusters containing the data. In the worst
case one has to read 1000 clusters. With a cluster size of 2kB this will be 2MB of data, something which can be
read very quickly. On the MEG system I observe that the first history plot takes about 5 seconds, while all
consecutive plots take about 1 second. This indicates that the FAT information is cached by the OS. This depends
of course as you indicated correctly on how much memory is available for disk caching, how many processes are
running etc. and will finally determine how fast your history access will be.

So if you implement your proposed new scheme, please consider the following:

- Scanning a single variable file is about the same as scanning the current index file. You save however the
access to the data file. If you plot several variables together, you have to access several "single variable
files", so your access time scales with the number of variables. In the current system, it's likely that
different variables from the same event are located in the same cluster. So you have to read the history file
once for each variable, but after the first variable the sectors of interest are very likely cached by the OS. So
I would estimate that the break-even point is about 2-3 variables. I mean if you read more than three variables,
your proposed method might get slower than the current one. This is of course not the case if there are very many
events in the history file. In that case the index file might be much bigger, since it gets a new entry if *any*
variable in an event changes. If all index files together are bigger than your disk cache, the system will become
slow (and I guess that's what you see). In MEG, the index file is about 1MB per day, so a few weeks fit easily
into the disk cache.

- In order not to get too much data, the history system needs fine tuning. Each slow control system class driver
has an "update threshold", which is used to determine if a variable has "changed". For some noisy channels, it
might be worth setting the threshold at 3 sigma of the noise level (RMS). This can reduce your history data
dramatically. For some equipment, you might even consider defining a minimum update period. This is done via
"/Equipment/<name>/Common/Log history". If that variable is set to 10, the time between two consecutive history
records is at least 10 seconds. For some temperatures, for example, it might make sense to set this even to one
minute or so, depending on how fast your temperatures change.
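
For example, with a hypothetical equipment named "Environment", the minimum update period can be set
from the command line:

odbedit -c 'set "/Equipment/Environment/Common/Log history" 10'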

- If you implement a per-variable history, you probably have to use the per-event hot link in the ODB. Otherwise
you would exceed the number of hot links MAX_OPEN_RECORDS, which is currently 256. If you then get a hot link
update, you have to check manually in log_history() in mlogger.c which variable(s) have changed.
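
A minimal sketch of that manual check, assuming a private copy of the last recorded event is kept
(the VAR_DESC layout and the writer function are hypothetical, not actual mlogger code):

#include <string.h>

typedef struct {
   const char *name;    /* variable name */
   int         offset;  /* byte offset inside the event record */
   int         size;    /* size of the variable in bytes */
} VAR_DESC;

extern void write_variable_history(const char *name,
                                   const void *data, int size); /* hypothetical */

/* called on each hot link update: record only what actually changed */
void check_changed(const VAR_DESC *vars, int nvars,
                   const char *data, char *prev)
{
   int i;
   for (i = 0; i < nvars; i++)
      if (memcmp(data + vars[i].offset, prev + vars[i].offset, vars[i].size)) {
         write_variable_history(vars[i].name,
                                data + vars[i].offset, vars[i].size);
         memcpy(prev + vars[i].offset, data + vars[i].offset, vars[i].size);
      }
}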

- Before you actually go and implement the full system, I would write some small test code to "simulate" the new
scheme. Write some dummy files with the full data you expect in the ALPHA experiment and see what the improvement
is under realistic conditions. Only if you see a big improvement is it worth implementing the full code. Test this
on various machines to get a better overview. Maybe it's worth testing different file systems and cluster sizes as
well.

- If there is an improvement, I'm more than happy to replace the current history code in midas. It might however
not be clean to have a heterogeneous history system, where some files are in the old format and some in the new.
It might be better to write a little conversion routine which converts the old format into the new one, even
omitting records where single variables did not change. This conversion could even be put into the standard
mlogger code and executed automatically if the logger starts and finds some old data files.

Even if the speed improvement is not so big, one will certainly win a lot on disk file size (like if only one
variable out of 100 changes). This will probably make it worth implementing anyhow.
    Reply  16 Mar 2007, Konstantin Olchanski, Info, RFC- history system improvements 
> Let's improve the midas history system...

After implementing 2 prototypes, one aspect of the new design is starting to firm up enough to write it down (I do so in a mock FAQ format).

Q. I ran an experiment at triumf, returned home and now I have a bunch of midas history files (*.hst) on my laptop. How do I export these history 
data to some useful format?
A. Run "mhdump *.hst | import_to_sql.perl" or "mh2ttree -o history.root *.hst" (export to mysql or ROOT TTree respectively). (TBW: 
import_to_sql.perl and mh2ttree)

Q. I have all these midas history files (*.hst), how do I look at them with mhttpd?
A. Follow these steps:
1) setup a blank experiment (no frontends, no analyzer, no mlogger), make sure you can run odbedit and mhttpd.
2) put (symlink) the history files into the history (data) directory
3) run "mhdump -t *.hst > tags.cmd"
4) run "odbedit -c @tags.cmd"
5) start mhttpd, go to the "history" page, setup history plots
6) look at history plots as usual

As always, all the cool stuff is happening behind the scenes:

- in step (3) and (4) we create ODB entries for all events and tags in the history files:
/history/tags/2 = "Trigger"   <--- declare event 2 "Trigger" (was equipment "Trigger" while we were taking data)
/history/tags/2:Rate = 1       <--- declare tag "Rate" as an array of one element
/history/tags/2:Scalers = 10 <--- declare tag "Scalers" as an array of 10 elements
... and so forth for each event and tag that ever existed in the history files.

When running a live experiment, the /history/tags entries are created by the mlogger.

- in step (5), the history plot setup page reads the names of history events and tags from /history/tags. The existing code for extracting the 
names of events and tags from the /equipment tree goes away. The variables part of history plots is saved the same way as now, i.e. 
"Trigger:Rate" and "Trigger:Scalers[3]" - existing plot definitions continue working as before.

- in step (6), to plot the variable named "Trigger:Scalers[3]", the mhttpd code again reads /history/tags to find out that "Trigger" corresponds to 
event id 2 and "Scalers" is a valid array (of size 10). This is enough to call hs_read() with the correct arguments to read the existing .hst files - the 
existing code will even regenerate the .idx and .def history files.

How do existing experiments migrate to the new code? It is all automatic, no user actions needed. For writing history files, there are no changes. 
For reading history files, the "new mhttpd" expects to find /history/tags, which will be created automatically by the "new mlogger".

I am presently cleaning up the implementation of this idea in mhttpd and in the mlogger (only those 2 files are affected - 2 functions in mhttpd.c 
and 1 function in mlogger.c) and after some testing it will be ready for committing to midas svn.

The next step would be changes in mlogger.c for recording the history for each variable separately (each variable gets its own event id). I have 
this implemented, but interaction with mhttpd is still in flux and I may want to run the new code at CERN for a few months before I deem it stable 
enough for general use.

K.O.
Entry  06 Mar 2007, Konstantin Olchanski, Info, commited mhttpd fixes & improvements 
I committed the mhttpd fixes and improvements to the history code accumulated while running the ALPHA 
experiment at CERN:

- fix crashes and infinite loops while generating history plots (also seen in TWIST)
- permit more than 10 variables per history plot
- let users set their own colours for variables on history plot
- (finally) add gui elements for setting minimum and maximum values on a plot
- implement a special "history" mode. In this mode, the master mhttpd does all the work, except for 
generating history plots, which is done in a separate mhttpd running in history mode, possibly on a 
different computer (via the ODB variable "/history/url").

I also have improvements to the mhttpd elog code (better formatting of email) and to the "export history 
plot as CSV" function, which I will not be committing: for elog, we switched to the standalone elogd; and 
CSV export is still very broken, even with my fixes.

The committed fixes have been in use at CERN since last summer, but I could have introduced errors 
during the merge & commit. I am now using this new code, so any new errors should surface and get 
squashed quickly.

K.O.
Entry  27 Feb 2007, Piotr Zolnierczuk, Forum, event builder scalability 
Hi there:
I have a question if there's anybody out there running MIDAS with an event builder
that assembles events from more than just a few front ends (say on the order of
0x10 or more)?
Any experiences with scalability?

Cheers
 Piotr
    Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Hi there:
> I have a question if there's anybody out there running MIDAS with an event builder
> that assembles events from more than just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?

At the MEG experiment at PSI we run with 5 front-ends (later 8), each running at
about 10 MB/sec. This gives an overall rate of 50MB/sec without any problem. The
CPU load on the backend (2.6 GHz dual Xeon) is 30% for the event builder and 26%
for the logger. The DANCE experiment at Los Alamos runs 17 front-ends if I'm not
mistaken (John?).
    Reply  27 Feb 2007, John M O'Donnell, Forum, event builder scalability 
At Los Alamos, we have 15+1 frontends - the 15 between them read about 2 or 3
TB/hour and reduce it to 1 to 5 GB/hour which is then sent to the mevb on a 17th
computer.  The 16th frontend handles deadtime issues and scalers (small data rate).

frontends are 1GHz pentium 3, and backend is 2.8GHz dual CPU with hyperthreading.
Interconnect is 100Mb ethernet from frontends to switch, and 1Gb ethernet from
switch to backend.

Our bottleneck is (a) the compactPCI backplane reading data from waveform digitizers
to the frontend CPUs and (b) CPU power on the frontend CPUs to analyze the waveforms.

John
       Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Our bottleneck is (a) the compactPCI backplane reading data from waveform digitizers
> to the frontend CPUs and (b) CPU power on the frontend CPUs to analyze the waveforms.

I forgot to mention that our front-ends at MEG are 2.8 GHz dual Xeons with Hyperthreading.
This gives four "virtual" CPU cores, which are really necessary for waveform calibration and
analysis. It makes use of the new multi-threading feature in the midas front-end. I actually run
7 threads (one VME readout, 4 calibration threads, one encoding thread and the
main thread sending data to the backend). This speeds up data taking by a factor of four
compared to a single thread. So if one plans for waveform analysis in the frontend to
reduce the data, I would recommend a box with dual quad cores.
    Reply  02 Mar 2007, Kevin Lynch, Forum, event builder scalability 
> Hi there:
> I have a question if there's anybody out there running MIDAS with an event builder
> that assembles events from more than just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?
> 
> Cheers
>  Piotr

Mulan (which you hopefully remember with great fondness :-) is currently running
around ten frontends, six of which produce data at any rate.  If I'm remembering
correctly, the event builder handles about 30-40MB/s.  You could probably ping Tim
Gorringe or his current postdoc Volodya Tishenko (tishenko@pa.uky.edu) if you want
more details.  Volodya solved a significant number of throughput related
bottlenecks in the year leading up to our 2006 run.
       Reply  03 Mar 2007, Piotr Zolnierczuk, Forum, event builder scalability 
Hi all,
thank you for all responses. 

It seems that there's no problem running MIDAS with event builder assembling
data from ~10 front-ends. How about ~100? One possible solution is to have a
multi-tiered architecture. 

The reason I am asking is that we are in the process of designing an Ethernet
based DAQ system with front-ends running on embedded computers (Linux/ARM
CPU/Xilinix FPGA) and MIDAS is one of my options as a DAQ framework.
I am open for advice/suggestions.

Thanks again
  Piotr
          Reply  03 Mar 2007, Stefan Ritt, Forum, event builder scalability 
> It seems that there's no problem running MIDAS with event builder assembling
> data from ~10 front-ends. How about ~100? One possible solution is to have a
> multi-tiered architecture. 
> 
> The reason I am asking is that we are in the process of designing an Ethernet
> based DAQ system with front-ends running on embedded computers (Linux/ARM
> CPU/Xilinix FPGA) and MIDAS is one of my options as a DAQ framework.
> I am open for advice/suggestions.

The event builder is a standalone application not part of the "midas core". It
receives data from N producers and combines the fragments into events based on
their serial number as a dedicated process. If it would become a bottleneck, it
can simply be redesigned and optimized. I currently have good experience with
multi-threaded applications running on multi-core CPUs. Implementing your
multi-tiered architecture as a multi-threaded event builder, where each of ten
threads receives data from ten front-ends, combines them and passes them to the
"collector thread" would make sense to me. Between the threads you can pass data
with many GB/sec, as compared to an ethernet-based architecture. I currently
implemented the rb_xxx functions inside midas.c which lets you pass data between
threads on a zero-copy basis.
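
A sketch of that thread structure using the rb_xxx functions (the signatures and the DB_SUCCESS
return code are given to the best of my recollection of midas.h; the two helper functions are
hypothetical):

#include "midas.h"

extern int read_fragment(void *dst);            /* hypothetical: fill wp, return size */
extern void forward_event(void *src, int size); /* hypothetical: pass to event builder */

static int rbh;   /* ring buffer handle shared by the two threads */

void setup(void)
{
   rb_create(10 * 1024 * 1024, 512 * 1024, &rbh);  /* example sizes */
}

/* one of the receiver threads: get a write pointer, fill it, commit */
void receiver_thread(void)
{
   void *wp;
   for (;;)
      if (rb_get_wp(rbh, &wp, 100) == DB_SUCCESS)  /* wait up to 100 ms */
         rb_increment_wp(rbh, read_fragment(wp));
}

/* the collector thread: drain events, zero copy */
void collector_thread(void)
{
   void *rp;
   for (;;)
      if (rb_get_rp(rbh, &rp, 100) == DB_SUCCESS) {
         int size = ((EVENT_HEADER *) rp)->data_size + sizeof(EVENT_HEADER);
         forward_event(rp, size);
         rb_increment_rp(rbh, size);
      }
}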

Inside the core functions of midas there are no limitations whatsoever. All
counters etc. are 32-bit, so you can run 2^32 data consumers etc. You will first
hit the OS process limit. What I'm more concerned about is your network bandwidth. If
you run 100 front-ends each with more than 1MB/sec, you would hit the 1GBit limit
of your network card. If you put more network interfaces, you will hit the disk
I/O limit which is around 100-200MB/sec even on larger RAID1 disk arrays (unless
you do data compression during event building). 

Another limit I see is the run transition. On each start/stop of a run, the
process which wants to start/stop the run has to contact all producers via a TCP
connection. Opening 100 TCP connections will take maybe 10-30 seconds, which is not
very convenient. A multi-threaded approach will help, but this is not (yet)
implemented, maybe you would have to do it yourself.

Another approach would be that you put the event building "in front of midas". All
your front-ends run a specific protocol outside of midas. They send their data to
a collecting process which acts as a single front-end to midas. So in the midas
framework you see only a single front-end, which gets its data not from hardware,
but from 100 other nodes. This way you can optimize the protocol between your
front-end nodes and the collector process for your application. Run transitions
can be done through multicast UDP messages for example, which will even work with
1000 front-ends. But you have to implement that yourself.
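
For illustration, such a multicast transition message could be sent like this (group address, port and
message format are all made up):

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* send a "start run" message to all front-end nodes in one datagram */
int send_run_transition(int run_number)
{
   struct sockaddr_in addr;
   char msg[32];
   int  s = socket(AF_INET, SOCK_DGRAM, 0);
   int  n = snprintf(msg, sizeof(msg), "START %d", run_number);

   memset(&addr, 0, sizeof(addr));
   addr.sin_family = AF_INET;
   addr.sin_addr.s_addr = inet_addr("239.0.0.1");  /* example multicast group */
   addr.sin_port = htons(5000);                    /* example port */

   sendto(s, msg, n, 0, (struct sockaddr *) &addr, sizeof(addr));
   close(s);
   return 0;
}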

I would start with the first approach: Taking the out-of-the box midas, see how
far I get. If you have access to a normal linux cluster, you can simply run ten
dummy front-ends on each of ten nodes, thus simulating 100 front-ends and see how
far you get. If the event builder is the bottleneck, do an optimization or
redesign. If the run transitions become your bottleneck, switch to method two. In
both ways you can utilize the downstream part of midas, like the logger, the
history system, etc. so you would still gain a lot compared to a design from scratch.

Best regards,

  Stefan
Entry  26 Feb 2007, Stefan Ritt, Info, Fragmented polled events 
Fragmented polled events have been implemented in SVN revision 3625.
Fragmentation is a method of breaking down large (>MB) events into smaller
pieces, sending them through the shared memory buffers, and reassembling them at the
output. In the past this was only possible for periodic events (such as large
histograms read out once every few seconds), but now this is also possible for
polled events.
Entry  26 Feb 2007, Stefan Ritt, Info, Usage of event channel for improved throughput 
Starting from SVN revision 3642, sending events from the front-end has been revised.

For a long time there has been a special TCP socket established between any front-end and the mserver which can be used to bypass the midas RPC layer completely and purely send events. There was a #define USE_EVENT_CHANNEL but to my knowledge nobody used it.

While optimizing data throughput for the MEG experiment, I revisited this mechanism and got it finally working. Here are some benchmark tests made with the produce program on two dual-CPU machines running on Gigabit Ethernet:

Using normal RPC socket:

event size    speed [MB/sec] CPU usage front-end  CPU usage server
==================================================================
    40          3            22                   100                
  1000         44            25                   100
100000        101            14                    50

Using new event socket:

event size    speed [MB/sec] CPU usage front-end  CPU usage server
==================================================================
    40         12            100                   34                
  1000         99            58                    59
100000        101            14                    43

As can be seen, the CPU load on the server drops significantly for smaller events since the processing time per event is reduced. If the transfer was limited by the server, the throughput goes up significantly. For large events the bottleneck on the server side is the memcpy of events, so no big improvement is visible. The saved CPU time however can be used to analyze more events for example.

The event socket is now enabled by default in the front-end by setting
rpc_mode = 1

in mfe.c and should be checked carefully in various experiments. There is a small chance that events get stuck in the buffer cache on the server side at the end of the run, in which case they would show up as the first events of the next run. I know that this problem happened in some experiment before, but that must have been unrelated to the rpc_mode. So please check again and report any problem with the new rpc_mode.
Entry  23 Feb 2007, Konstantin Olchanski, Info, RFC- support for writing to removable hard disk storage 
At triumf, we are developing a system to use removable hard drives to store data collected by midas 
daq stations. The basic idea is to replace storage on 300 GB DLT tapes with storage on removable 
esata, usb2 or firewire 750 GB hard drives.

To minimize culture shock, we stay as close as possible to the "tape" paradigm. Two removable disks 
are used in tandem. Data is written to the first removable disk until it is full. Then midas automatically 
switches to the second disk and asks the operator to replace the full disk with a blank disk. Similar to 
handling tapes, the operator takes the full disk and stores it on the shelf (offline); takes a blank disk 
and connects it to the computer. To read data from one of the disks, the operator takes the disk from 
the shelf and connects it to the daq computer or to some other computer equipped with a compatible 
removable storage bay. The full data disks are mounted read-only to prevent accidental data 
modifications.

Two pieces of software are needed to implement this system:

1) midas support for switching to alternate output disks as they become full. Data could be written to 
the removable disk directly by the mlogger (no extra data copy on local disks) or by the lazylogger 
(mlogger writes the data to the local disk, then the lazylogger copies it to the removable disk). Writing 
directly to the removable disk is more efficient as it avoids the one extra data copy operation by the 
lazylogger.

2) a user interface utility for mounting and dismounting removable disks. Handling of removable disks 
cannot be fully automatic: before unplugging a removable disk, the user has to inform the system; after 
connecting a removable disk, the user has to tell the system to mount it read-only (for existing data), 
read-write (to add more data) or to initialize a blank disk (fdisk+mkfs). (Also, some SATA interfaces do 
not implement automatic hot-plug: they have to be manually told "please look for new disks").

We are presently evaluating various internal SATA hot-plug enclosures. We evaluated external eSATA 
and USB2 enclosures and decided not to use them: while the performance is adequate, presence of 
extra bulky components (eSATA and USB cables, non-standardized power bricks) and the extra cost of 
eSATA and USB hard drive enclosures makes them unattractive.

I am open to suggestions and comments. I am most interested in hearing which data path (mlogger or 
the lazylogger) would be most useful for other users.

K.O.
    Reply  23 Feb 2007, John M O'Donnell, Info, RFC- support for writing to removable hard disk storage 
We stopped using tapes at Los Alamos a while ago.  The model we use is:

write data with mlogger to a local RAID system.  This is NFS mounted read only on the analysis machines, and
becomes the working copy for most tasks.  Copy data to external hard drives.  We have been using USB.  The USB
system is sometimes a little flaky (linux 2.4.21-7), so we have a computer dedicated to this task.  The USB driver
can be reloaded, or if the user is not so knowledgeable, the computer can be rebooted.  Users on this computer
have sudo privs, so they can format hard drives.  The disks are inserted into boxes while in use, and stored on
a shelf for data archival, so we don't have a lot of enclosures.

I use the automounter to mount and unmount the drives.  With a 10 second timeout, the user needs only to wait a
few seconds before unplugging the disk.  (cat /proc/mounts allows them to check if they want.) dmesg allows
them to find the drive letter.  This works for any device which appears later as a SCSI disk.  The automounter
manages /mnt/usb for vfat formatted devices, and /mnt/usbl for ext3 formatted devices (preferred for data
archiving).

autofs config files are:

/etc/auto.usb

# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# Details may be found in the autofs(5) manpage
 
*       -fstype=auto,nosuid,nodev,umask=0000,noatime    :/dev/&

/etc/auto.usbl

# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# Details may be found in the autofs(5) manpage
 
*       -fstype=auto,nosuid,nodev       :/dev/&

/etc/auto.master contains

/mnt/usb                /etc/auto.usb  --timeout=10
/mnt/usbl               /etc/auto.usbl --timeout=10
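
With these maps in place, usage looks like this (the device name is hypothetical):

dmesg | tail             # the new disk shows up as, say, sdb1
ls /mnt/usbl/sdb1        # first access triggers the automount of /dev/sdb1 (ext3)
cat /proc/mounts         # confirm what is mounted; it unmounts ~10 s after last use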


John.

> At triumf, we are developing a system to use removable hard drives to store data collected by midas 
> daq stations. The basic idea is to replace storage on 300 GB DLT tapes with storage on removable 
> esata, usb2 or firewire 750 GB hard drives.
> 
> To minimize culture shock, we stay as close as possible to the "tape" paradigm. Two removable disks 
> are used in tandem. Data is written to the first removable disk until it is full. Then midas automatically 
> switches to the second disk and asks the operator to replace the full disk with a blank disk. Similar to 
> handling tapes, the operator takes the full disk and stores it on the shelf (offline); takes a blank disk 
> and connects it to the computer. To read data from one of the disks, the operator takes the disk from 
> the shelf and connects it to the daq computer or to some other computer equipped with a compatible 
> removable storage bay. The full data disks are mounted read-only to prevent accidental data 
> modifications.
> 
> Two pieces of software are needed to implement this system:
> 
> 1) midas support for switching to alternate output disks as they become full. Data could be written to 
> the removable disk directly by the mlogger (no extra data copy on local disks) or by the lazylogger 
> (mlogger writes the data to the local disk, then the lazylogger copies it to the removable disk). Writing 
> directly to the removable disk is more efficient as it avoids the one extra data copy operation by the 
> lazylogger.
> 
> 2) a user interface utility for mounting and dismounting removable disks. Handling of removable disks 
> cannot be fully automatic: before unplugging a removable disk, the user has to inform the system; after 
> connecting a removable disk, the user has to tell the system to mount it read-only (for existing data), 
> read-write (to add more data) or to initialize a blank disk (fdisk+mkfs). (Also, some SATA interfaces do 
> not implement automatic hot-plug: they have to be manually told "please look for new disks").
> 
> We are presently evaluating various internal SATA hot-plug enclosures. We evaluated external eSATA 
> and USB2 enclosures and decided not to use them: while the performance is adequate, presence of 
> extra bulky components (eSATA and USB cables, non-standardized power bricks) and the extra cost of 
> eSATA and USB hard drive enclosures makes them unattractive.
> 
> I am open to suggestions and comments. I am most interested in hearing which data path (mlogger or 
> the lazylogger) would be most useful for other users.
> 
> K.O.
    Reply  26 Feb 2007, Stefan Ritt, Info, RFC- support for writing to removable hard disk storage 
In the MEG experiment, we simply installed 100TB of RAID disks and don't need to change anything ;-)

But seriously, you are right that such a system might be beneficial. I propose to extend the current logger code to switch disks. In the current tr_start() function in mlogger, the code checks for "subdir_format" to create separate subdirectories like once per week. One could extend this code in the following way:

- Add an array of strings and name it "Path", such as

/dev/sda1/datadir/
/dev/sdb1/datadir/

- On each stop of the run, check if the current disk has enough space for one more run. Take either the "Byte limit" of that channel, or the actual size of the last run and multiply it by two or so. If the disk is "almost full", switch to the next array element in "Path". Append the file name, such as "/dev/sda1/datadir/run1234.mid" and put this into "Current filename" as a feedback for the user. Now write to the new disk/file.

- Add as string like "Execute on switch", which gets called after you switched to the next disk. This shell script can then handle the un-mounting of the full disk, notify the user etc. This is similar to the "/Programs/Execute on start run" in the ODB, but it gets only called if you switch the disk.
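
A minimal sketch of that "almost full" check at run stop, using statvfs() (the factor of two and the
placement follow the description above; this is not actual mlogger code):

#include <sys/statvfs.h>

/* at each run stop: is there room on the current path for ~two more runs? */
static int disk_almost_full(const char *path, double last_run_bytes)
{
   struct statvfs st;
   if (statvfs(path, &st) != 0)
      return 1;   /* treat errors as "full" to be safe */
   return (double) st.f_bavail * st.f_frsize < 2.0 * last_run_bytes;
}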
Entry  05 Feb 2007, Fedor Ignatov, Bug Report, segmentation violation of analyzer on a x86_64 
Hello,

When I connect to the analyzer on an x86_64 processor (with Roody),
the analyzer breaks with a segmentation violation in the root_server_thread function.
The same code works fine on a 32-bit processor.
I found that the problem is in the exchange of pointers between the analyzer and the client.
Before a pointer is sent, it is saved in an int (size 4 instead of 8) at
this place:
Index: src/mana.c
===================================================================
--- src/mana.c  (revision 3498)
+++ src/mana.c  (working copy)
@@ -5386,7 +5386,7 @@

             //write pointer
             message->Reset(kMESS_ANY);
-            int p = (POINTER_T) obj;
+            POINTER_T p = (POINTER_T) obj;
             *message << p;
             sock->Send(*message);


Sincerely Yours,
Fedor Ignatov 
    Reply  06 Feb 2007, Stefan Ritt, Bug Report, segmentation violation of analyzer on a x86_64 
> Hello,
> 
> When I connect to the analyzer on an x86_64 processor (with Roody),
> the analyzer breaks with a segmentation violation in the root_server_thread function.
> The same code works fine on a 32-bit processor.
> I found that the problem is in the exchange of pointers between the analyzer and the client.
> Before a pointer is sent, it is saved in an int (size 4 instead of 8) at
> this place:
> Index: src/mana.c
> ===================================================================
> --- src/mana.c  (revision 3498)
> +++ src/mana.c  (working copy)
> @@ -5386,7 +5386,7 @@
> 
>              //write pointer
>              message->Reset(kMESS_ANY);
> -            int p = (POINTER_T) obj;
> +            POINTER_T p = (POINTER_T) obj;
>              *message << p;
>              sock->Send(*message);
> 
> 
> Sincerely Yours,
> Fedor Ignatov 

Do I understand you right? With your patch it works even on 64 bit, right? Or do you
mean there is still a segmentation violation? Anyhow I committed your patch since the
"int" is clearly incorrect.

- Stefan
       Reply  06 Feb 2007, Fedor Ignatov, Bug Report, segmentation violation of analyzer on a x86_64 
Yes, right. The segmentation violation problem is solved with this patch. Now it works
fine on x86_64.

Fedor 

> Do I understand you right? With your patch it works even on 64 bit, right? Or do you
> mean there is still a segmentation violation? Anyhow I committed your patch since the
> "int" is clearly incorrect.
> 
> - Stefan
          Reply  17 Feb 2007, Konstantin Olchanski, Bug Report, segmentation violation of analyzer on a x86_64 
> Yes, right. The segmentation violation problem is solved with this patch. Now it works
> fine on x86_64.

Right. I confirm this. I have this exact same fix in my stand-alone copy of the midas
histogram server, and should commit it to MIDAS CVS as well.

K.O.
Entry  28 Jul 2006, Shawn Bishop, Bug Report, Latest FC5 Compilation attempt 
Perhaps some progress? The problem compiling on FC5 now seems to be in odb.c for revision 3189. Compilation output as follows: --Shawn

[midas@daruma ~/midas]$ make
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -m32 -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/odb.o src/odb.c
src/odb.c: In function ‘db_open_database’:
src/odb.c:805: warning: dereferencing type-punned pointer will break strict-aliasing rules
src/odb.c: In function ‘db_lock_database’:
src/odb.c:1350: warning: dereferencing type-punned pointer will break strict-aliasing rules
cc: Internal error: Segmentation fault (program cc1)
Please submit a full bug report.
See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.
make: *** [linux/lib/odb.o] Error 1
    Reply  05 Aug 2006, Ryu Sawada, Bug Report, Latest FC5 Compilation attempt 
Which version of the compiler do you use?

This is probably a bug in GCC. Please refer to the following page:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27616

It seems they are trying to fix it, but unfortunately it happens also with the latest snapshot of GCC 4.2.

This does not happen when you compile without optimization options.

I hope the following command will work:
cc -c -g -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -m32 -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/odb.o src/odb.c


Shawn Bishop wrote:
Perhaps some progress? The problem compiling on FC5 now seems to be in odb.c for revision 3189. Compilation output as follows: --Shawn

[midas@daruma ~/midas]$ make
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -m32 -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/odb.o src/odb.c
src/odb.c: In function ‘db_open_database’:
src/odb.c:805: warning: dereferencing type-punned pointer will break strict-aliasing rules
src/odb.c: In function ‘db_lock_database’:
src/odb.c:1350: warning: dereferencing type-punned pointer will break strict-aliasing rules
cc: Internal error: Segmentation fault (program cc1)
Please submit a full bug report.
See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.
make: *** [linux/lib/odb.o] Error 1
       Reply  08 Sep 2006, Ryu Sawada, Bug Report, Latest FC5 Compilation attempt 
The GCC developers fixed this problem in the development version of GCC 4.2.

This problem will not be present in the GCC 4.2 release.
          Reply  15 Feb 2007, Ryu Sawada, Info, Latest FC5 Compilation attempt 
On February 13, 2007, gcc 4.1.2 was released.
I checked this version, and it compiles midas successfully.

GCC 3                    - OK
GCC 4.0                  - OK
GCC 4.1.0 and 4.1.1      - Bad
GCC 4.1.2                - OK
GCC 4.2                  - This is not released. Development version of GCC 4.2 is OK
Entry  11 Feb 2007, Konstantin Olchanski, Info, svn and "make indent" trashed my svn checkout tree... 
Fuming, fuming, fuming.

The combination of "make indent" and "svn update" completely trashed my work copy of midas. Half of 
the files now show as status "M", half as status "C" ("in conflict"), even those I never edited myself (e.g. 
mscb firmware files).

I think what happened is that once I ran "make indent", the indent program did things to the source 
files (changed indentation, added spaces in "foo(a,b,c); --> foo(a, b, c);" etc.), so now svn thinks that I 
edited the files and they are in conflict with later modifications.

I suggest that nobody ever ever ever should use "make indent", and if they do, they had better 
commit their "changes" made by indent very quickly, before their midas tree is trashed by the next "svn 
update".

And if they commit the changes made by "make indent", beware that "make indent" is not idempotent, 
running it multiple times, it keeps changing files (keeps moving some dox comments around).

Also beware of entering a tug-of-war with Stefan - at least on my machines, my "make indent" seems 
to produce different output from his.

Still fuming, even after some venting...
K.O.
Entry  05 Feb 2007, Konstantin Olchanski, Bug Report, wrong version in include/midas.h? 
The present .../include/midas.h contains
[alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
/home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"

All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
trunk, and this may confuse people as to what version of midas they are using,
and may complicate reporting of bugs.

Perhaps the trunk version should say something like "svn-22233344" (the svn
revision number)? The present "1.9.5" is wrong...

K.O.
    Reply  06 Feb 2007, Stefan Ritt, Bug Report, wrong version in include/midas.h? 
> The present .../include/midas.h contains
> [alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
> /home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"
> 
> All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
> trunk, and this may confuse people as to what version of midas they are using,
> and may complicate reporting of bugs.
> 
> Perhaps the trunk version should say something like "svn-22233344" (the svn
> revision number)? The present "1.9.5" is wrong...

Fully agree. I added a svn_revision string into midas.h, which gets reported now
by "odbedit ver". Unfortunately this reflects only changes in midas.c. If one
changes odb.c for example, the svn revision in midas.c does not get modified by
the SVN system. In addition I changed the present version 1.9.5 to 2.0.0. I made
the tar and zip files. After some internal testing, it will be announced
officially in a few days.
Entry  02 Feb 2007, Exaos Lee, Bug Report, Compiling failed with SVN3562 under Ubuntu 6.10 
The error log is as the following:
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -DHAVE_ROOT -pthread -I/opt/root/current/include -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/system.o src/system.c
src/system.c:958: error: expected declaration specifiers or ‘...’ before ‘gettid’
src/system.c:958: warning: data definition has no type or storage class
src/system.c:958: warning: type defaults to ‘int’ in declaration of ‘_syscall0’
src/system.c: In function ‘ss_gettid’:
src/system.c:1005: warning: implicit declaration of function ‘gettid’
src/system.c: In function ‘ss_suspend_init_ipc’:
src/system.c:2948: warning: pointer targets in passing argument 3 of ‘getsockname’ differ in signedness
src/system.c: In function ‘ss_suspend’:
src/system.c:3414: warning: pointer targets in passing argument 6 of ‘recvfrom’ differ in signedness
src/system.c:3441: warning: pointer targets in passing argument 6 of ‘recvfrom’ differ in signedness
make: *** [linux/lib/system.o] Error 1

The error might be here:
void ss_force_single_thread()
{
   _single_thread = TRUE;
}

#if defined(OS_DARWIN)
// blank
#elif defined(OS_LINUX)
_syscall0(pid_t,gettid);
#endif

INT ss_gettid(void)

I have no idea about the usage of _syscall0(...).
    Reply  02 Feb 2007, Exaos Lee, Bug Report, Compiling failed with SVN3562 under Ubuntu 6.10 err.log
I tried to solve the problem by adding a ";". It was wrong. In fact, the macro "_syscall0(..)" doesn't need the ";".
I searched and found that somebody said "the overall _syscall$magicnumber will disappear". I don't mind whether "_syscall" disappears or not. I just want to compile the code and do my job. I deleted the additional ";" and recompiled. The error output is as the attachment [elog:335/1].
       Reply  02 Feb 2007, Exaos Lee, Bug Fix, Problem solved by Re-define _syscall0(...) 
OK, I searched and found that my kernel doesn't support "_syscall0" any more. So I patched system.c as follows (from line 954):

#if defined(OS_DARWIN)
// blank
#elif defined(OS_LINUX)

#include <sys/syscall.h>
#include <unistd.h>
#undef _syscall0
#define _syscall0(type, name) \
  type name(void) \
  {\
    return syscall(__NR_##name); \
  }

_syscall0(pid_t,gettid)
#endif


My kernel version:
exaos@memes midas>$ uname -a
Linux memes 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux

Maybe it's not the perfect way, but it works. :-)
          Reply  06 Feb 2007, Stefan Ritt, Bug Fix, Problem solved by Re-define _syscall0(...) 

Exaos Lee wrote:
Maybe it's not the perfect way, but it works. :-)


I changed it to:
#ifdef OS_UNIX

   return syscall(SYS_gettid);

#endif                          /* OS_UNIX */

without any #define.

Does this work for you?

- Stefan
Entry  30 Jan 2007, Stefan Ritt, Bug Report, Large files under Windows XP 
Hello,

We have problems analyzing large files under Windows XP. For small file sizes,
everything is ok. We have events of 2.8 MB each, and we can read ~30 events per
second. But if the file gets larger than typically 600-800 MB, then access
becomes very slow, about 1 event per second. This is not the case under Linux,
where it stays at 30 Hz (~90 MB/sec). 

Looking at the low level file access, it is obvious that this has nothing to do
with midas, this problem can be reproduced with a simple program reading chunks
of 3MB from a 1GB file. The Windows XP file system is NTFS, default formatting.
Does anyone else have observed a similar problem or maybe even have some
suggestions? Unfortunately many people here want to analyze midas data under
Windows...
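
A minimal sketch of the kind of test program described (not the exact code used):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* read a file in 3 MB chunks and print the average rate */
int main(int argc, char **argv)
{
   const size_t chunk = 3 * 1024 * 1024;
   char  *buf = malloc(chunk);
   FILE  *f = argc > 1 ? fopen(argv[1], "rb") : NULL;
   size_t n, total = 0;
   time_t t0 = time(NULL);
   double sec;

   if (!f || !buf)
      return 1;
   while ((n = fread(buf, 1, chunk, f)) > 0)
      total += n;
   sec = difftime(time(NULL), t0);
   if (sec < 1)
      sec = 1;   /* avoid divide-by-zero for small files */
   printf("%.0f MB in %.0f s = %.1f MB/s\n",
          (double) total / 1e6, sec, (double) total / 1e6 / sec);
   fclose(f);
   free(buf);
   return 0;
}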

Stefan Ritt
Entry  26 Jan 2007, Carl Metelko, Forum, Front end electronics broadcast data over ethernet, can midas read this in 
Hi,
   the system I'm building will have data read into the frontend nodes
via ethernet (optic). Is this possible?
Entry  21 Jan 2007, Denis Bilenko, Bug Report, buffer bugs 
Hello,

We've been using midas and have stumbled upon some inconsistent behaviour:
1. Blocking calls to the midas api aren't usable when a client is connected through
the mserver. This is true at least for bm_receive_event, but it seems to be a more
general problem - a midas application has to call cm_yield within 10 seconds (or
whatever timeout is set) to remain alive.
That is not the case when RPC is not used.

2. On Windows, two processes on the same machine can send/receive events to
each other only if they both use midas locally (through shared mem) or they
both use midas via RPC (through mserver), but not if each uses a different way.

3. Receiving/sending the same events from the same process - this was possible in
1.9.5-1, not so in the current version (revision 3501, mxml revision 45). Is this an intended behavior fix?

To explain how to reproduce the bugs, I will use 2 helper programs, evprint.py and
evsend.py - for receiving and sending events respectively. You don't need
them, just something to send and receive events. (These are part of pymidas, which will be
released to the public any time soon, but is quite usable already.)

They both accept
* a --path option in "host/experiment" format (for the cm_connect_experiment call)
* a --log option, which commands them to trace all midas calls to the terminal

evprint.py has two ways of receiving events:
1) via looping over bm_receive_event
2) via providing a callback to bm_request_event and looping over the cm_yield(400) call
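
For reference, pattern (2) looks roughly like this in plain C (a sketch; the API signatures are given
to the best of my recollection of midas.h):

#include <stdio.h>
#include "midas.h"

void process_event(HNDLE hBuf, HNDLE request_id,
                   EVENT_HEADER *pheader, void *pevent)
{
   printf("id=%d serial=%u size=%u\n", pheader->event_id,
          pheader->serial_number, pheader->data_size);
}

int main(void)
{
   HNDLE hBuf;
   INT   request_id, status;

   cm_connect_experiment("", "online", "consumer", NULL);
   bm_open_buffer("SYSTEM", EVENT_BUFFER_SIZE, &hBuf);
   bm_request_event(hBuf, EVENTID_ALL, TRIGGER_ALL, GET_ALL,
                    &request_id, process_event);
   do {
      status = cm_yield(400);   /* dispatches events to the callback */
   } while (status != RPC_SHUTDOWN && status != SS_ABORT);

   cm_disconnect_experiment();
   return 0;
}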

Example of use:
first-console$ python evprint.py receive
second-console$ python evsend.py 123
[first console]
id=2007 mask=2007 serial=2007 time=1169372833 len=3 '123'
So,

1. Blocking calls to midas api aren't usable when client is connected through
mserver.
$ python evprint.py --log --path 127.0.0.1/online receive"
cm_connect_experiment('127.0.0.1', 'online', 'evprint.py', None)
bm_open_buffer('SYSTEM', 1048576, &c_long(2)) -> BM_CREATED
bm_request_event(2, -1, -1, 2, &c_long(0), None)
... wait for a couple of seconds ...
[midas.c:9348:rpc_call] rpc timeout, routine = "bm_receive_event"
[system.c:3570:send_tcp] send(socket=0,size=8) returned -1, errno: 88 (Socket 
operation on non-socket)
[midas.c:9326:rpc_call] send_tcp() failed

bm_receive_event(2, ...) -> RPC_TIMEOUT

bm_remove_event_request(2, 0) -> BM_INVALID_HANDLE
bm_close_buffer(2) -> BM_INVALID_HANDLE
cm_disconnect_experiment()

2. Missing events on windows
a) Both use midas locally - works
   1: python evprint.py receive
   2: python evsend.py 123
   1: id=2007 mask=2007 serial=2007 time=1169372833 len=3 '123'
b) Both use midas via RPC - works
   1: python evprint.py --path 127.0.0.1/ dispatch
   2: python evsend.py --path 127.0.0.1/ 123
   1: id=2007 mask=2007 serial=2007 time=1169373366 len=3 '123'
c) Receiver uses midas locally, sender uses mserver - doesn't work on windows
   1: python evprint.py dispatch
   2: python evsend.py --path 127.0.0.1/ 123
   1: (nothing printed)
d) The other way around - doesn't work on windows
   1: python evprint.py --path 127.0.0.1/ dispatch
   2: python evsend.py 123
   1: (nothing printed)
No such problem on linux.

3. Receiving/sending same events from the same process.
To reproduce this, just request events, send one and then try to receive
it – via cm_yield. I care about this, because I have a test in pymidas which
relies on this behavior.

hope this will help.
    Reply  22 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 

Denis Bilenko wrote:
1. Blocking calls to the midas api aren't usable when a client is connected through the mserver. This is true at least for bm_receive_event, but it seems to be a more general problem - a midas application has to call cm_yield within 10 seconds (or whatever timeout is set) to remain alive.
That is not the case when RPC is not used.


The 10 seconds timeout you see comes from the RPC layer. If you call bm_receive_event and it blocks, then the client will consider an RPC timeout after 10 seconds. It has nothing to do with cm_yield(). Calling a blocking function via a server connection is not a good idea anyhow, since this process then cannot respond to anything else, like run transitions. That's why I never used it and that's why I have not realized that behaviour. I did change it however such that bm_receive_event, if called without the ASYNC flag, disables the RPC timeout for this call and restores it afterwards. This is now in midas.c revision 3502. You can try this with midas/examples/lowlevel/produce and consume easily.


Denis Bilenko wrote:
2. On Windows, two processes on the same machine can send/receive events to each other only if they both use midas locally (through shared mem) or they both use midas via RPC (through mserver), but not if they use different ways.


I just tried again and it did work. I used produce/consume. If you enter just <return> for the host name, these programs connect locally. So I tried both producer locally, consumer remote, and vice versa, and both worked. I did however use consume with the callback functionality, and I did not try your Python programs. If you find out that produce/consume does work and your Python programs don't, then adapt your Python programs to resemble produce/consume.


Denis Bilenko wrote:
3. Receiving/sending same events from the same process - was possible in 1.9.5-1, not so in the current version (revision 3501, mxml revision 45). Is this an intended behavior fix?


Yes. It was introduced in revision 3186 on July 28th, 2006. It fixed a problem where the buffer level was always shown as 100% full, even if there were no other clients registered. By ignoring the process's own requests, the buffer level now correctly shows the "contents" of a buffer from 0..100%. It also gave a small speed improvement. If you want to send events to your own process, you have to do it from the calling level: if you call bm_send_event(), you manually call process_event (or however your event receiving routine is called). This is also much faster than going through the buffer.
       Reply  23 Jan 2007, Denis Bilenko, Bug Report, buffer bugs 
1 & 3 - thanks for the fix and the explanation. As for 2 - I've tried consume and produce
and still have a problem:

Config: GET_ALL, event id = 1, event size = 10, Receive via callback,
OS = Windows XP SP2
I restart mserver manually from command-line every time (not using system service).
I start produce first, then I start consume.
In two of the four cases, starting 'consume' causes 'produce' to exit immediately.
Guess which two :-)

both local or both remote - works (i.e. non-zero rates in both consoles)
produce local, consume via rpc and vice versa - 'produce' exits with error

1. produce via rpc, consume locally

first console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>produce.exe
ID of event to produce: 1
Host to connect: 127.0.0.1
Event size: 10
Level:   0.0 %, Rate: 0.64 MB/sec
flush
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.61 MB/sec
Level:   0.0 %, Rate: 0.62 MB/sec
Level:   0.0 %, Rate: 0.62 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
flush
Level:   0.0 %, Rate: 0.62 MB/sec

## Now I've started consume in the other console ##

[system.c:3570:send_tcp] send(socket=1900,size=8136) returned -1, errno: 0 (No error)
send_tcp() returned -1
[midas.c:9669:rpc_send_event] send_tcp() failed
rpc_send_event returned error 503, event_size 10
second console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>consume.exe
ID of event to request: 1
Host to connect:
Get all events (0/1): 1
Receive via callback ([y]/n):
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Received break. Aborting...
mserver's output:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin\mserver.exe started interactively
[midas.c:2315:bm_validate_client_index] Invalid client index 0 in buffer 'SYSTEM'.
Client name 'Power Consumer', pid 1964 should be 3216
2. produce locally, consume via rpc
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>produce.exe
ID of event to produce: 1
Host to connect:
Event size: 10
Client 'Producer' (PID 2584) on 'ODB' removed by cm_watchdog (idle 144.1s,TO 10s)
Level:   0.0 %, Rate: 3.20 MB/sec
flush
Level:   0.0 %, Rate: 3.20 MB/sec
Level:   0.0 %, Rate: 3.11 MB/sec
Level:   0.0 %, Rate: 3.13 MB/sec
Level:   0.0 %, Rate: 3.06 MB/sec
Level:   0.0 %, Rate: 3.20 MB/sec
Level:   0.0 %, Rate: 2.96 MB/sec
Level:   0.0 %, Rate: 3.11 MB/sec
Level:   0.0 %, Rate: 3.18 MB/sec
Level:   0.0 %, Rate: 3.13 MB/sec
Level:   0.0 %, Rate: 3.17 MB/sec
flush
Level:   0.0 %, Rate: 3.19 MB/sec
Level:   0.0 %, Rate: 3.08 MB/sec
Level:   0.0 %, Rate: 3.06 MB/sec

## Now I've started consume ##

[midas.c:2315:bm_validate_client_index] Invalid client index 0 in buffer 'SYSTEM'. Client name '', pid 0 should be 760
Second console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>consume.exe
ID of event to request: 1
Host to connect: 127.0.0.1
Get all events (0/1): 1
Receive via callback ([y]/n):
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
Received break. Aborting...
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 0
mserver hasn't said anything.

3. Both remote (just for comparison)
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>produce.exe
ID of event to produce: 1
Host to connect: 127.0.0.1
Event size: 10
Level:   0.0 %, Rate: 0.65 MB/sec
flush
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.60 MB/sec
Level:   0.0 %, Rate: 0.64 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.61 MB/sec
Level:   0.0 %, Rate: 0.63 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.67 MB/sec
flush
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.65 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:   0.0 %, Rate: 0.66 MB/sec
Level:  66.8 %, Rate: 0.66 MB/sec
flush
Level:   0.0 %, Rate: 0.00 MB/sec
Level:  66.8 %, Rate: 0.31 MB/sec
Level:  57.2 %, Rate: 0.15 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Level:  57.3 %, Rate: 0.15 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Level:  57.3 %, Rate: 0.14 MB/sec
Received break. Aborting...
Received 2nd break. Hard abort.
[midas.c:1581:] cm_disconnect_experiment not called at end of program
Second console:
D:\denis\cmd\midas\current\06jan21-export\midas\NT\bin>consume.exe
ID of event to request: 1
Host to connect: 127.0.0.1
Get all events (0/1): 1
Receive via callback ([y]/n):
[consume.c:73:process_event] Serial number mismatch: Ser: 1397076, OldSer: 0, ID: 1, size: 10
Level:  37.1 %, Rate: 0.00 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.15 MB/sec, ser mismatches: 1
Level:  95.4 %, Rate: 0.08 MB/sec, ser mismatches: 1
Level:  66.8 %, Rate: 0.14 MB/sec, ser mismatches: 1
Level:  66.8 %, Rate: 0.12 MB/sec, ser mismatches: 1
Level:  76.3 %, Rate: 0.12 MB/sec, ser mismatches: 1
Level:  95.4 %, Rate: 0.11 MB/sec, ser mismatches: 1
Level:  57.3 %, Rate: 0.15 MB/sec, ser mismatches: 1
Level:  66.8 %, Rate: 0.11 MB/sec, ser mismatches: 1
Level:  85.9 %, Rate: 0.11 MB/sec, ser mismatches: 1
Level:  95.5 %, Rate: 0.12 MB/sec, ser mismatches: 1
Level:  57.4 %, Rate: 0.15 MB/sec, ser mismatches: 1
Level:   9.7 %, Rate: 0.15 MB/sec, ser mismatches: 1
[Producer] [midas.c:1581:] cm_disconnect_experiment not called at end of program
Level:   0.0 %, Rate: 0.03 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 1
Received break. Aborting...
          Reply  23 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 

Denis Bilenko wrote:
1 & 3 - thanks for the fix and the explanation, as for 2 - I've tried consume and produce
and still has a problem


Acknowledged. I could reproduce it with the information you supplied, thank you very much. Also, the data rate is slower than I would expect. I will investigate and fix this, but it could take some time.
          Reply  24 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 
I tried again and could not reproduce the problem. Last time I was probably confused by some old mserver.exe executable I had lying around. I updated to the most recent version (3516) and ran nmake -f makefile.nt in C:\midas. Last time I was also confused by the low rate, but that was caused by an mserver.exe executable that was not compiled with optimization; for small event sizes (such as 10 bytes) there is a big difference between optimized and non-optimized code. So I got:


First Console wrote:
ID of event to produce: 1
Host to connect: localhost
Event size: 10
Level:   0.0 %, Rate: 0.46 MB/sec
flush
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.40 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush


and


Second Console wrote:
C:\midas\NT\bin>.\consume
ID of event to request: 1
Host to connect:
Get all events (0/1): 1
Receive via callback ([y]/n):
[consume.c:73:process_event] Serial number mismatch: Ser: 1169666, OldSer: 0, ID
: 1, size: 10
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   2.4 %, Rate: 0.35 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.50 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.40 MB/sec, ser mismatches: 1
Received break. Aborting...


Actually, sending remotely and receiving locally is very common. Most experiments use that setup: they have a remote frontend, while the logger and analyzer run locally. If that did not work, all these experiments would have a problem. So I can only encourage you to try again; make sure to update and recompile the executables. Maybe delete any old *.SHM files. Maybe try on another PC or under Linux.
Entry  11 Jan 2007, Steve Hardy, Forum, Shared memory problems 
Hello,

Just did a fresh install of MIDAS from the SVN repository under CentOS and
everything compiles fine, but when I go to run the frontend (using dio), I get
the following error message:

Connect to experiment ...[odb.c:868:db_open_database] Different database format:
 Shared memory is 14, program is 2
[midas.c:1763:cm_connect_experiment1] cannot open database


Any ideas on what the problem could be, or how to fix it?  


~Steve
    Reply  11 Jan 2007, Stefan Ritt, Forum, Shared memory problems 
> Hello,
> 
> Just did a fresh install of MIDAS from the SVN repository under CentOS and
> everything compiles fine, but when I go to run the frontend (using dio), I get
> the following error message:
> 
> Connect to experiment ...[odb.c:868:db_open_database] Different database format:
>  Shared memory is 14, program is 2
> [midas.c:1763:cm_connect_experiment1] cannot open database
> 
> 
> Any ideas on what the problem could be, or how to fix it?  

You have an old .ODB.SHM from a previous version in your directory (note the '.' in
front, so you need 'ls -alg' to see it). Delete that file and try again.
       Reply  11 Jan 2007, Steve Hardy, Forum, Shared memory problems 
Thanks for your help.  I tried again and it got me back to the initial problem I had.
The frontend starts, and the analyzer starts (it complains about there not being a
last.root, but is otherwise fine), but then when starting mlogger, I get:

[odb.c:860:db_validate_db] Warning: database corruption, first_free_key 0x0001A4
04
[odb.c:3666:db_get_key] invalid key handle
[midas.c:1970:cm_check_client] cannot delete client info
[odb.c:3666:db_get_key] invalid key handle
[midas.c:1970:cm_check_client] cannot delete client info
[odb.c:3666:db_get_key] invalid key handle


And it continues to shoot out error messages about invalid key handles until I kill
it.  Then trying to start the frontend again fails until I remove the .ODB.SHM file. 
Any other ideas?

> > Hello,
> > 
> > Just did a fresh install of MIDAS from the SVN repository under CentOS and
> > everything compiles fine, but when I go to run the frontend (using dio), I get
> > the following error message:
> > 
> > Connect to experiment ...[odb.c:868:db_open_database] Different database format:
> >  Shared memory is 14, program is 2
> > [midas.c:1763:cm_connect_experiment1] cannot open database
> > 
> > 
> > Any ideas on what the problem could be, or how to fix it?  
> 
> You have an old .ODB.SHM from a previous version in your directory (note the '.' in
> front, so you need 'ls -alg' to see it). Delete that file and try again.
          Reply  11 Jan 2007, Stefan Ritt, Forum, Shared memory problems 
That sounds like you are mixing versions: you have an old executable (maybe your
mlogger) that was linked against the old midas version, but you create the ODB with
the new odbedit or frontend. The new version complains if it finds an ODB from a
previous version (the error you reported first), but an old program does not have
that version check, so it reads a different binary ODB structure and crashes.
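
For illustration, the version guard that produces the first error follows roughly this pattern (a sketch - the field name, the status code, and the cm_msg() signature of that era are assumptions, not the actual odb.c code):

   /* sketch of the check in db_open_database */
   if (pheader->version != DATABASE_VERSION) {
      cm_msg(MERROR, __FILE__, __LINE__, "db_open_database",
             "Different database format: Shared memory is %d, program is %d",
             pheader->version, DATABASE_VERSION);
      return DB_VERSION_MISMATCH;   /* hypothetical status code */
   }

An old executable lacks this guard, so it maps the shared memory and interprets the new binary layout as if it were the old one - hence corruption-style errors instead of a clean version message.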

> Thanks for your help.  I tried again and it got me back to the initial problem I had.
>  The frontend will start, and the analyzer starts (complains about there not being a
> last.root, but other than that it's fine), and then when starting mlogger, I get:
> 
> [odb.c:860:db_validate_db] Warning: database corruption, first_free_key 0x0001A4
> 04
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> 
> 
> And it continues to shoot out error messages about invalid key handles until I kill
> it.  Then trying to start the frontend again fails until I remove the .ODB.SHM file. 
> Any other ideas?
> 
> > > Hello,
> > > 
> > > Just did a fresh install of MIDAS from the SVN repository under CentOS and
> > > everything compiles fine, but when I go to run the frontend (using dio), I get
> > > the following error message:
> > > 
> > > Connect to experiment ...[odb.c:868:db_open_database] Different database format:
> > >  Shared memory is 14, program is 2
> > > [midas.c:1763:cm_connect_experiment1] cannot open database
> > > 
> > > 
> > > Any ideas on what the problem could be, or how to fix it?  
> > 
> > You have an old .ODB.SHM from a previous version in your directory (note the '.' in
> > front, so you need 'ls -alg' to see it). Delete that file and try again.
Entry  27 Dec 2006, Eric-Olivier LE BIGOT, Forum, Access to out_info from mana.c 
Hello,

Is it possible to access out_info (defined in mana.c) from another program?

In fact, out_info is now defined as an (anonymous) "static struct" in mana.c,
which it seems to me precludes any direct use in another program.  Is there an
indirect way of getting hold of out_info, or of the information it contains?

out_info used to be defined as a *non-static* struct, and the code I'm currently
modifying used to compile seamlessly: it now fails at link time, since out_info
is now static while the program I have to compile contains an
"extern struct {} out_info".

Any help would be much appreciated!  I searched in vain in this forum for
details about out_info and I really need to access the information it contains!

EOL (a pure MIDAS novice)
    Reply  05 Jan 2007, Eric-Olivier LE BIGOT, Suggestion, Access to out_info from mana.c 
Would it make sense to turn out_info into a *non-static* variable of a type
defined by a *named* struct?
Currently, programs that try to access out_info cannot do so anymore, and they
typically copy the struct definition from mana.c, which is not robust against
future changes in mana.c.

If mana.c could be changed in the way described above, that would be great.
Otherwise, is it safe to patch it myself for local use? Or is there a better way
of accessing out_info from mana.c?

As always, any help would be much appreciated :)

EOL

> Hello,
> 
> Is it possible to access out_info (defined in mana.c) from another program?
> 
> In fact, out_info is now defined as an (anonymous) "static struct" in mana.c,
> which it seems to me precludes any direct use in another program.  Is there an
> indirect way of getting hold of out_info, or of the information it contains?
> 
> out_info used to be defined as a *non-static* struct, and the code I'm currently
> modifying used to compile seamlessly: it now fails at link time, since out_info
> is now static while the program I have to compile contains an
> "extern struct {} out_info".
> 
> Any help would be much appreciated!  I searched in vain in this forum for
> details about out_info and I really need to access the information it contains!
> 
> EOL (a pure MIDAS novice)
       Reply  08 Jan 2007, Stefan Ritt, Suggestion, Access to out_info from mana.c 
I changed out_info into a global structure definition ANA_OUTPUT_INFO and put it into
midas.h, so it can be accessed easily from the user analyzer source code.
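
For reference, the pattern now looks roughly like this (the field names below are placeholders - see midas.h for the real ANA_OUTPUT_INFO definition):

   /* in midas.h: the struct type now has a name, so user code can refer
      to the same type without copying the definition from mana.c */
   typedef struct {
      char filename[256];      /* placeholder field */
      BOOL events_to_odb;      /* placeholder field */
   } ANA_OUTPUT_INFO;

   /* in mana.c: the variable is no longer anonymous/private */
   ANA_OUTPUT_INFO out_info;

   /* in the user analyzer: a plain extern declaration now links cleanly */
   extern ANA_OUTPUT_INFO out_info;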

> Would it make sense to turn out_info into a *non-static* variable of a type
> defined by a *named* struct?
> Currently, programs that try to access out_info cannot do so anymore, and they
> typically copy the struct definition from mana.c, which is not robust against
> future changes in mana.c.
> 
> If mana.c could be changed in the way described above, that would be great.
> Otherwise, is it safe to patch it myself for local use? Or is there a better way
> of accessing out_info from mana.c?
> 
> As always, any help would be much appreciated :)
> 
> EOL
> 
> > Hello,
> > 
> > Is it possible to access out_info (defined in mana.c) from another program?
> > 
> > In fact, out_info is now defined as an (anonymous) "static struct" in mana.c,
> > which it seems to me precludes any direct use in another program.  Is there an
> > indirect way of getting hold of out_info, or of the information it contains?
> > 
> > out_info used to be defined as a *non-static* struct, and the code I'm currently
> > modifying used to compile seamlessly: it now fails at link time, since out_info
> > is now static while the program I have to compile contains an
> > "extern struct {} out_info".
> > 
> > Any help would be much appreciated!  I searched in vain in this forum for
> > details about out_info and I really need to access the information it contains!
> > 
> > EOL (a pure MIDAS novice)
Entry  26 Oct 2006, Hans Fynbo, Forum, Setup of Ortec ADC AD413A in MIDAS 
We are new to MIDAS and are trying to set up a simple system with one Ortec CAMAC
ADC AD413A and the Hytec 1331 controller. Has anyone used this module in MIDAS? We
would be grateful for the corresponding frontend.c etc.

It would be very useful to have examples of files used by various experiments
available somewhere, in addition to the example files provided in the installation.

Best regards,
Hans 
Entry  16 Oct 2006, Exaos Lee, Bug Fix, Build error with mana.c while using CERNLIB, svn 3366 
If you use CERNLIB to build hmana.o, you may encounter the following error:
src/mana.c: In function ‘write_event_hbook’:
src/mana.c:2881: error: invalid assignment
or something like this:
src/mana.c: In function ‘write_event_hbook’:
src/mana.c:2881: warning: target of assignment not really an lvalue; this will be a hard error in the future
So I checked mana.c and found these lines:
2880            /* shift data pointer to next item */
2881            (char *) pdata += key.item_size * key.num_values;
should be changed to
2880            /* shift data pointer to next item */
2881            pdata += key.item_size * key.num_values * sizeof(char);
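
As an aside, sizeof(char) is 1 by definition, and incrementing pdata directly only works where the compiler permits arithmetic on void pointers (a GCC extension). Assuming pdata is declared void *, a more portable form of the same byte-wise shift would be:

   /* shift data pointer to next item: cast to char * for byte-wise
      arithmetic, then assign back (assumes pdata is a void *) */
   pdata = (char *) pdata + key.item_size * key.num_values;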
    Reply  16 Oct 2006, Stefan Ritt, Bug Fix, Build error with mana.c while using CERNLIB, svn 3366 
Committed, thanks.
Entry  23 Sep 2006, Konstantin Olchanski, Bug Report, mhttpd elog corruption via double-edit 
Apparently the mhttpd elog will corrupt the elog files if two (or more?) elog entries are being edited at the 
same time. K.O.
    Reply  24 Sep 2006, Stefan Ritt, Bug Report, mhttpd elog corruption via double-edit 

K.O. wrote:
Apparently the mhttpd elog will corrupt the elog files if two (or more?) elog entries are being edited at the same time. K.O.


That's strange. Since mhttpd is single-threaded, there should not be any multi-thread/process conflict there: the elog files cannot be written simultaneously from two different browser sessions. If entries are edited at the same time, they are submitted one after the other. Of course it is possible to edit the same entry, in which case the second submission "wins", overwriting the first one without notification. Within the standalone elog server there is an option to lock entries ("use lock = 1") to prevent this, but this feature is not present in the mhttpd elog.
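
For completeness, in a standalone elogd setup that option would go into the logbook section of elogd.cfg, along these lines (a sketch - the logbook name is hypothetical):

   [demo]
   Use lock = 1    ; lock an entry while it is being edited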
       Reply  27 Sep 2006, Konstantin Olchanski, Bug Report, mhttpd elog corruption via double-edit 
[quote="Stefan Ritt"][Quote="K.O.]Aparently the mhttpd elog will corrupt the
elog files if two (or more\?) elog entries are being edited at the same time.
K.O.[/quote]

The corruption is very simple. The mhttpd elog indexes elog entries by elog file and
offset inside the file, e.g. in "http://ladd00:8088/EL/060927.318", "060927"
corresponds to the log file "060927.log" and "318" is the offset inside the file
where the message is located.

During "edit", the code "remembers" the offset of the original message and in
el_submit() blindly writes the edited message into the file at the remembered
offset.

If another message was edited before the edit of the first message is submitted,
the remembered offset becomes invalid (messages have shifted inside the file)
and el_submit() writes the edited text into the wrong place in the file,
corrupting it.

I have now added a check for this and we crash instead of corrupting the elog
file (midas.c rev 3340).
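
The check follows this idea (a sketch with hypothetical names, not the actual midas.c code): before rewriting in place, re-read what is stored at the remembered offset and verify that it is still the message being edited.

   #include <assert.h>
   #include <string.h>

   void el_read_tag(const char *file, int offset, char tag[80]); /* hypothetical */

   void check_offset_still_valid(const char *file, int offset,
                                 const char *edited_tag)
   {
      char tag_on_disk[80];

      el_read_tag(file, offset, tag_on_disk);
      if (strcmp(tag_on_disk, edited_tag) != 0) {
         /* another edit has shifted the messages: the offset is stale */
         assert(!"stale elog offset - crash rather than corrupt the file");
      }
   }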

I do not know how to "properly" fix this bug without changing the indexing
scheme to something similar to what is used by elogd - message numbers instead of
file indices. In the existing scheme, message editing also breaks URLs shown in
the email notifications (they contain file indices that point to the wrong
places after messages are moved around by editing) and "reply threading" links.

Here is how I reproduce this bug:

1) start with an empty elog
2) create two messages
3) "edit" the second message, but do not submit it yet.
4) "edit" the first message, change the text to make sure the message size
becomes different; submit this change.
5) submit the "edit" of the second message. !!BOOM!!

K.O.
          Reply  28 Sep 2006, Stefan Ritt, Bug Report, mhttpd elog corruption via double-edit 
> I do not know how to "properly" fix this bug without changing the indexing
> scheme to something similar to what is used by elogd - message numbers instead of
> file indices. In the existing scheme, message editing also breaks URLs shown in
> the email notifications (they contain file indices that point to the wrong
> places after messages are moved around by editing) and "reply threading" links.

Well, the development of elogd with its message numbers was actually stimulated by
the problem you mentioned. After that, all those problems went away. Another
incarnation of the problem appears if you edit an mhttpd log file manually: afterwards
the file offsets are different and the system gets corrupted. To fix this properly,
one would have to backport the el_xxx functions from elogd to mhttpd or, even
simpler, remove the elog functionality from mhttpd and "force" everybody to use elogd
(after running elconv to convert the files into the new format).