21 Nov 2017, Konstantin Olchanski, Info, MIDAS support on el5?
|
It has been reported that the current midas release candidate does not build on el5 linux (SL/RHEL/CentOS-5).
According to Red Hat, el5 is end-of-life, last SL 5 (SL5.11) was done in 2014, so this linux is very old. Also as it happens, I do not have access to any
el5 machines to check if midas builds or runs (but this can be fixed).
https://www.scientificlinux.org/downloads/sl-versions/sl5/
https://access.redhat.com/support/policy/updates/errata
On the midas web page (https://midas.triumf.ca) we do not explicitly state which versions of which linux we definitely support. Most other open-
source projects only support current major linux distributions, hardly anybody supports end-of-life linuxes such as el5. Some projects do not even
support recent linuxes still widely in use (ROOT6 does not build on stock el6 and there is no KDE5 for el7).
So back to midas. Support for different operating systems comes down to:
1) C/C++ language support. We still use el6 (GCC 4.4.7), so use of C++11 language features should be avoided
2) operating system features support:
a) sysv semaphores (sysv shared memory no longer used, cannot be used on macos)
aa) (macos also is missing parts of the sysv semaphore api, such as "wait for lock, with timeout", we are using an ugly work-around)
b) posix shared memory with mprotect() & co
c) posix mutexes, including recursive-type mutexes (this seems to be the problem on el5)
d) bsd networking (need to migrate from select() to poll() and from gethostbyname() to getaddrinfo() & co (for IPv6 support))
Not all of these operating system functions are required for all of midas. Running mhttpd and mlogger requires
pretty much everything. Running just a frontend connected to midas through the mserver requires the least features,
just the networking is enough, I think.
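As a rough illustration of the select() to poll() migration mentioned in item (d), a minimal sketch could look like the following. The helper name wait_readable() is hypothetical and is not part of midas; it only shows the shape of a poll()-based wait with a timeout.

```c
#include <poll.h>

/* Hypothetical helper, not midas API: wait until fd becomes readable.
 * Returns 1 if readable (or peer closed), 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
   struct pollfd pfd;
   pfd.fd = fd;
   pfd.events = POLLIN;
   pfd.revents = 0;

   int rc = poll(&pfd, 1, timeout_ms);
   if (rc < 0)
      return -1;   /* poll() itself failed, errno says why */
   if (rc == 0)
      return 0;    /* timed out, nothing to read */

   /* POLLHUP also counts: a read() will return EOF without blocking */
   return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

Unlike select(), poll() takes an array of descriptors instead of an fd_set, so it is not limited by FD_SETSIZE.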
Obviously we cannot support midas in perpetuity on all versions of all operating systems. Once I do not have
access to a machine, I cannot even check that midas builds and that it runs the basic functions.
Instead, we could provide a "feature reduced" build of midas (makefile target) that includes "just enough" of midas
to (say) run a frontend, maybe even odbedit. We already have some provisions for this, but no obvious documented
way of actually doing it.
So back to el5.
How important is it to support very old operating systems?
How many people still use el5?
How about old versions of Ubuntu? Macos?
If you use anything older than el6, can you speak up,
(and if possible say why you cannot migrate to an up-to-date linux).
K.O. |
10 Nov 2017, Frederik Wauters, Bug Report, bug in init of hv class driver
|
bug in init
-----------
I used the lv.c class driver, combined with a custom device driver, to control
our Keithley2611B source meter. This is to set negative voltages on Si detectors.
In the 'init' routine, the class driver sets the hv:
hv_info->demand_mirror[i] = MIN(hv_info->demand[i], hv_info->voltage_limit[i]);
This fails for negative voltage, as it sets the (negative) voltage limit, instead
of the demand voltage. A simple 'fabs' solves this.
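One possible reading of the 'fabs' fix, as a minimal sketch: compare magnitudes and keep the sign of the demand. The helper clamp_demand() is hypothetical and is not the class driver code; it assumes that demand and limit are both stored as negative numbers.

```c
#include <math.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Hypothetical helper, not the class driver code: limit the magnitude of
 * the demand to the magnitude of the limit, keeping the sign of the demand. */
float clamp_demand(float demand, float voltage_limit)
{
   float magnitude = MIN(fabsf(demand), fabsf(voltage_limit));
   return (demand < 0.0f) ? -magnitude : magnitude;
}
```

With demand = -50 and limit = -100, the plain MIN() picks -100 (the limit), while clamp_demand() keeps -50.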
suggestion for 'idle'
---------------------
I let the device do the ramping, not the driver. This also means I have to reset
the state of the device (current limit) after ramping. The easiest way to do
this is to use CMD_IDLE of the device driver. This is currently not done in the
hv.c class driver. |
17 Nov 2017, Konstantin Olchanski, Bug Report, bug in init of hv class driver
|
Hi, Frederik, this is my personal opinion on the slow controls hv classes. I have
used them a couple of times and I found them full of little buglets like this,
plus some incomplete functions, plus some missing features, plus it is all
written in C trying to do object oriented programming. On balance my opinion
is that it is less work to write a high voltage control program in C++ from scratch
using the regular midas frontend infrastructure compared to having to understand
the hv class driver, write the missing bits, fix the little buglets, debug
the crashes in the C string handling, and what not. (For example I had to debug
mysterious failures to pass float and double values through the C stdarg interface,
there are more fun things to do out there).
K.O. |
21 Nov 2017, Stefan Ritt, Bug Report, bug in init of hv class driver
|
> bug in init
> -----------
>
> I used the lv.c class driver, combined with a custom device driver, to control
> our Keithley2611B source meter. This to set negative voltage on Si detectors.
>
> In the 'init' routine, the class driver sets the hv:
>
> hv_info->demand_mirror[i] = MIN(hv_info->demand[i], hv_info->voltage_limit[i]);
>
> This fails for negative voltage, as it sets the (negative) voltage limit, instead
> of the demand voltage. A simple 'fabs' solves this.
>
> suggestion for 'idle'
> ---------------------
>
> I let the device do the ramping, not the driver. This also means I have to reset
> the state of the device (current limit) after ramping. The easiest way to do
> this is to use CMD_IDLE of the device driver. This is currently not done in the
> hv.c class driver.
I can't find the line you quote in the class driver. Why don't you make a git pull request
and I will approve it.
The original idea behind the hv driver is that all voltages in the ODB and the class driver are
positive. If you have a negative power supply, then the voltage is inverted at the device
driver level. That's why you have MIN and MAX in the class driver.
Stefan |
21 Nov 2017, Konstantin Olchanski, Bug Report, bug in init of hv class driver
|
>
> The original idea behind the hv driver is that all voltages in the ODB and the class driver are
> positive. If you have a negative power supply, then the voltage is inverted at the device
> driver level. That's why you have MIN and MAX in the class driver.
>
This rings a bell. I used the hv class driver to write a frontend for the L1440 mainframe (negative voltage),
on ODB it will be positive values, when writing to the device I had to add a minus sign,
and when reading back they came back negative and I had to add an fabs() in the comparison
between readback and demand.
Persons with bipolar power supplies need not apply.
K.O. |
15 Nov 2017, Andreas Knecht, Suggestion, Feature request: Separate ODB flag to show programs on "Programs page"
|
Currently one has to set the required flag in the ODB (e.g., /Programs/Logger/Required) to "y" for the program
to appear on the "Programs page" and being able to start and stop the program easily.
However, if one wants to run with the "Prevent start on required progs" in /Experiment enabled, all the
programs in the "Programs page" need to be running and one cannot have one of them stopped while still
taking a run.
It would be nice to separate these two functionalities: Have a flag that makes the program appear on the
"Programs page" and have a flag that controls the "Prevent start on required progs" functionality. |
17 Nov 2017, Konstantin Olchanski, Suggestion, Feature request: Separate ODB flag to show programs on "Programs page"
|
> Currently one has to set the required flag in the ODB (e.g., /Programs/Logger/Required) to "y" for the program
> to appear on the "Programs page" and being able to start and stop the program easily.
>
> However, if one wants to run with the "Prevent start on required progs" in /Experiment enabled, all the
> programs in the "Programs page" need to be running and one cannot have one of them stopped while still
> taking a run.
>
> It would be nice to separate these two functionalities: Have a flag that makes the program appear on the
> "Programs page" and have a flag that controls the "Prevent start on required progs" functionality.
I agree. All the programs should be always visible on the "programs" page, there should be /Programs/xxx/hidden to
hide them, and /Programs/xxx/required should be used for "Prevent start on required progs".
K.O. |
21 Nov 2017, Stefan Ritt, Suggestion, Feature request: Separate ODB flag to show programs on "Programs page"
|
> > Currently one has to set the required flag in the ODB (e.g., /Programs/Logger/Required) to "y" for the program
> > to appear on the "Programs page" and being able to start and stop the program easily.
> >
> > However, if one wants to run with the "Prevent start on required progs" in /Experiment enabled, all the
> > programs in the "Programs page" need to be running and one cannot have one of them stopped while still
> > taking a run.
> >
> > It would be nice to separate these two functionalities: Have a flag that makes the program appear on the
> > "Programs page" and have a flag that controls the "Prevent start on required progs" functionality.
>
> I agree. All the programs should be always visible on the "programs" page, there should be /Programs/xxx/hidden to
> hide them, and /Programs/xxx/required should be used for "Prevent start on required progs".
Konstantin, since you wrote the current "Programs" page, can you add that feature to the display (well, when you have time). I guess we
don't even have to change the subdirectory structure (which might lead to incompatibilities), but just show a program if the "Start command"
is non-null. If there is no start command, it does not make sense to start that program, so it can be hidden.
Stefan |
02 Nov 2017, Konstantin Olchanski, Bug Fix, Fixed mlogger memory corruption, updated mxml
|
In the agdaq system I see memory corruption in the mlogger. There were at least two bugs: one
memory allocation error in mxml and one incorrect memset() in mlogger.cxx. The mxml bug is fixed
in the mxml repository, mlogger.cxx bug is fixed in the midas-2017-10 branch.
I suggest that everyone update mxml to the latest version (without waiting for the new midas release):
https://bitbucket.org/tmidas/mxml/commits/branch/master
K.O. |
13 Oct 2017, Konstantin Olchanski, Info, odb multithread support repaired
|
Multithreaded access to odb was implemented back in 2013-2014, but recently a bug surfaced:
there was a race condition in the odb locking code against cm_watchdog(). Somehow this only
affected the mserver for the DRAGON experiment at TRIUMF. This is now fixed on the branch
feature/midas-2017-10. (This branch collects all the code that needs additional testing before
merging into develop and becoming the next release of midas.)
K.O. |
11 Oct 2017, Konstantin Olchanski, Info, added support for ucLinux
|
Support for building for ucLinux was added to MIDAS. I use the emcraft toolchain and userland on
some kind of embedded ARM CPU that does not have an MMU. See the Makefile for details. The
main difference of ucLinux is lack of fork(), which cannot be done without an MMU. Not everything
works, but at the least I can run a frontend and connect to an experiment on a remote host
computer (mserver connection). K.O. |
27 Jul 2017, Wes Gohn, Suggestion, Increasing Max Number of Frontends
|
Below are the steps we used to increase the maximum number of frontends that we could run.
In midas.h
#define MAX_CLIENTS 64
changed to
#define MAX_CLIENTS 128
In msystem.h:
#define MAX_RPC_CONNECTION 64
changed to
#define MAX_RPC_CONNECTION 128
In odb.c:
assert(sizeof(BUFFER_HEADER) == 16444);
GUESS: 256*64+60 = 16444, so change 64 to 128
changed to:
assert(sizeof(BUFFER_HEADER) == 32828); //256*128+60
DATABASE_HEADER = 64 + 64*DATABASE_CLIENT = 64 + 64*8256 = 528448
changed to:
DATABASE_HEADER = 64 + 128*DATABASE_CLIENT = 64 + 128*8256 = 1056832. |
10 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends
|
The sizeof checks were originally invented by KO to check for binary compatibility between processes attached to the same ODB and event buffers. So if a
compiler generates different structure sizes due to different padding, one would see that immediately. I wonder however if the absolute numbers make sense
here. We could replace the 16444 by
NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))
which makes this value automatically scale when one changes MAX_CLIENTS.
People of course have to be aware that if one changes MAX_CLIENTS, then all programs connected to the same ODB or event buffer need to be re-compiled
and the ODB needs to be re-created from an ASCII file, but at least this would avoid tedious manual calculations.
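To see how the formula above scales, here is a small sketch that evaluates it. The values NAME_LENGTH = 32 and MAX_EVENT_REQUESTS = 10 are assumptions, chosen because they reproduce the 256 bytes per client and the 60 bytes of fixed header implied by 16444; please check them against your own midas.h. The fixed-width types int32_t/uint32_t stand in for INT/DWORD, and the DATABASE_HEADER figures are simply the ones quoted earlier in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values (not taken from this thread), see note above */
#define NAME_LENGTH        32
#define MAX_EVENT_REQUESTS 10

/* Expected sizeof(BUFFER_HEADER) according to the formula above,
 * with int32_t/uint32_t standing in for INT/DWORD. */
size_t expected_buffer_header_size(size_t max_clients)
{
   size_t per_client = NAME_LENGTH + 13 * sizeof(int32_t) + sizeof(float)
                       + 2 * sizeof(uint32_t)
                       + MAX_EVENT_REQUESTS * 4 * sizeof(int32_t);
   return NAME_LENGTH + 7 * sizeof(int32_t) + max_clients * per_client;
}

/* Expected size of DATABASE_HEADER, using the 64 + N*8256 figures
 * quoted earlier in this thread. */
size_t expected_database_header_size(size_t max_clients)
{
   return 64 + max_clients * 8256;
}
```

With these assumptions the functions give 16444 and 528448 for 64 clients, and 32828 and 1056832 for 128 clients, matching the hand-computed numbers.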
Any opinion?
Stefan |
12 Aug 2017, Konstantin Olchanski, Suggestion, Increasing Max Number of Frontends
|
The checks for byte sizes of critical data structures have been added to ensure (enforce) binary compatibility
of midas with itself on different platforms (32-bit and 64-bit intel, on PPC, on ARM, etc).
This has worked well in the past and helped avoid problems and subtle bugs in the transition
from 32-bit to 64-bit machines a few years ago. Of course now 32-bit machines are back
as ARM CPUs and FPGA synthetic CPUs.
Replacing the checks with "computed" values will defeat this purpose because the values may be computed
differently on different machines.
Specifically as proposed by Stefan, sizeof(int) can change depending on the target machine and depending
on the compiler settings.
Of course this needs to be balanced against flexibility to adjust important settings like MAX_CLIENTS and MAX_EVENT_REQUESTS.
I would say the present system is just fine. You can change MAX_CLIENTS, rebuild MIDAS and it will not run (assert failure) giving
you an indication that you are doing something non-trivial that will cause problems if you do it without thinking about it.
For example, one may think nothing of changing midas.h and recompiling MIDAS. But having to change odb.c
may ring the little bell to tell you that you *also* have to rebuild *all* of your frontends. Even one unrebuilt frontend
will corrupt all shared memory and crash everything.
I guess one other way to look at this is as a balance between something a few people do rarely against
a function that protects everybody all the time.
That said, I think the checks should be reworked: instead of an assert failure they should give an error message
and tell the user exactly what number to adjust in the size test. Also some checks are obsolete, there is no longer
need to check the size of many ODB structures (equipment, etc). Once we are done with the db_get_record() rework,
only checks for data structures in shared memory shall remain.
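A reworked check along these lines might look like the sketch below. The function check_struct_size() is hypothetical and does not exist in midas; it only illustrates replacing a bare assert with a message that names the structure and the number to adjust.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical replacement for assert(sizeof(X) == N): report the
 * mismatch and return 0 instead of aborting. Returns 1 if sizes match. */
int check_struct_size(const char *name, size_t actual, size_t expected)
{
   if (actual == expected)
      return 1;

   fprintf(stderr,
           "sizeof(%s) is %lu but the size check expects %lu. "
           "If you changed MAX_CLIENTS on purpose, set the expected value "
           "to %lu and rebuild ALL programs attached to this ODB and "
           "these event buffers.\n",
           name, (unsigned long) actual, (unsigned long) expected,
           (unsigned long) actual);
   return 0;
}
```

A caller would use it as check_struct_size("BUFFER_HEADER", sizeof(BUFFER_HEADER), 16444) and refuse to start if it returns 0.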
As the bottom line, to change MAX_CLIENTS, you already have to edit midas.h, asking you to also edit odb.c does
not add much to the burden.
P.S. We are thinking how to make all these values dynamically changeable, but basically it requires rolling out
a new binary-incompatible version of MIDAS with added bugs. Maybe some day.
K.O. |
13 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends
|
I agree that the binary compatibility checks are crucial. But I kind of find it strange if one gets an assert failure somewhere if one tries to change MAX_CLIENTS. It is then not
straightforward to relate both things and understand the consequences. That's why I put a comment next to the definition of MAX_CLIENTS saying:
/* note that if you change any of the following items, the ODB and the event shared memory buffers
become binary incopatible and one has to recompile ALL programs which are locally connected to the
ODB and to event buffers */
I think this is more descriptive than just a failing assert.
If you look carefully in my proposal below, you will see that I rather used
sizeof(INT)
and not
sizeof(int)
since as KO stated correctly sizeof(int) can change between different architectures. The derived type INT (all uppercase) has been carefully designed to have 32 bits on all architectures. So
it will NOT change between them. If it does change, then we have a principal problem and many more things will break down. We should therefore have something like
if (sizeof(INT) != 4) then severe_error_and_stop_all_programs()
Now given that sizeof(INT) is everywhere the same, we can use it in the test
sizeof(BUFFER_HEADER) == NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))
which then basically tests the structure byte alignment and padding. The comment above should warn users not to change MAX_CLIENTS without thinking.
Another strategy would be to put sizeof(BUFFER_HEADER) as the first two bytes of the structure itself. We can then dynamically test the size on each bm_open_buffer(), and if the local size
differs from the one saved in the buffer header, the program refuses to start, so we know exactly which program has to be recompiled. The downside of this would be that the
header structure has to be changed and we break binary compatibility with all existing programs. But maybe we should do this step once and be safe in the future.
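The second strategy can be sketched as follows. DEMO_HEADER and both functions are hypothetical stand-ins, not the real BUFFER_HEADER or bm_open_buffer(); the sketch also assumes a 32-bit size field rather than two bytes, simply to leave headroom.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a shared memory header whose first field
 * records the size of the structure as seen by its creator. */
typedef struct {
   uint32_t header_size;
   char     name[32];
   int32_t  num_clients;
} DEMO_HEADER;

/* Called by the program that creates the shared memory segment */
void demo_header_init(DEMO_HEADER *h, const char *name)
{
   memset(h, 0, sizeof(*h));
   h->header_size = (uint32_t) sizeof(DEMO_HEADER);
   strncpy(h->name, name, sizeof(h->name) - 1);
}

/* Called by every program that attaches: returns 1 if the size stored
 * in shared memory matches the locally compiled structure size. */
int demo_header_compatible(const void *shared)
{
   uint32_t stored;
   memcpy(&stored, shared, sizeof(stored));
   return stored == (uint32_t) sizeof(DEMO_HEADER);
}
```

A program compiled with a different MAX_CLIENTS would see a mismatch here and could refuse to attach before touching anything else in the buffer.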
Stefan |
13 Aug 2017, Konstantin Olchanski, Suggestion, Increasing Max Number of Frontends
|
> if (sizeof(INT) != 4) then severe_error_and_stop_all_programs()
Quick reply.
Today, for fixed size data types one should use uint32_t & co, see
stdint.h
https://en.wikipedia.org/wiki/C_data_types#stdint.h
https://en.wikipedia.org/wiki/C99 (scroll down and click to open "implementation -> compiler support")
The other popular convention is "u32" used by the Linux kernel, you will see it in the linux kernel drivers.
If I remember right, WORD and DWORD grow legs from the 16-bit Motorola 68xxx processors,
VxWorks and the VME bus. At some point the data buses were 16 bits wide and that was the WORD.
(I do not think UNIX ever used the WORD/DWORD names, e.g. MacOS has int32_t and u_int32_t).
K.O. |
13 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends
|
The type INT has been defined in 1989 when I for the first time sent data between a 16-bit MS-DOS computer and a 32-bit VAX computer (good old
days!). At that time, uint32_t was not available at all. So much for the historical background.
I agree that switching from INT to int32_t is getting closer to standards and might help new people better understand things. However, this means
touching all midas files and changing about 5000 (!) locations:
BYTE -> uint8_t
WORD -> uint16_t
DWORD -> uint32_t
INT -> int32_t
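Purely as an illustration of this mapping (not a proposal made in this thread), the old names could be expressed as aliases of the stdint.h types, with the sizes checked both at compile time and at run time. The function fixed_width_types_ok() is a hypothetical name.

```c
#include <stdint.h>

/* Illustration only: the midas type names as aliases of stdint.h types */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint32_t DWORD;
typedef int32_t  INT;

/* Compile-time checks (C11): the build fails if a size is wrong */
_Static_assert(sizeof(BYTE) == 1, "BYTE must be 1 byte");
_Static_assert(sizeof(WORD) == 2, "WORD must be 2 bytes");
_Static_assert(sizeof(DWORD) == 4, "DWORD must be 4 bytes");
_Static_assert(sizeof(INT) == 4, "INT must be 4 bytes");

/* Run-time variant of the same test: returns 1 if all sizes are as
 * expected, 0 otherwise. */
int fixed_width_types_ok(void)
{
   return sizeof(BYTE) == 1 && sizeof(WORD) == 2
          && sizeof(DWORD) == 4 && sizeof(INT) == 4;
}
```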
Next we have the midas data types TID_xxx?
The nice thing now is that for example WORD and TID_WORD belong together and this is obvious. For uint16_t and TID_WORD it is not so obvious
any more, so I guess we should rename TID_WORD to TID_UINT16_T. The same for
TID_BYTE -> TID_UINT8_T
TID_SBYTE -> TID_INT8_T
TID_WORD -> TID_UINT16_T
TID_DWORD -> TID_UINT32_T
TID_INT -> TID_INT32_T
But if we change TID_XXX, the ASCII representations of the ODB break compatibility! Right now we have for example
[/Experiment]
midas http port = INT : 8080
which will become
[/Experiment]
midas http port = INT32_T : 8080
so one cannot load old ODB files any more!
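To make the compatibility problem concrete, here is a sketch of a reader for such lines. It is not the midas ODB loader: parse_odb_line() and tid_from_name() are hypothetical, and the small name table is an assumption in which only INT -> 7 comes from this thread. A reader that knows only one of the two spellings rejects files written with the other, whereas a table that lists both accepts old and new files.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for one "key = TYPE : value" line.
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_odb_line(const char *line, char key[64], char type[32], char value[64])
{
   if (sscanf(line, " %63[^=]= %31s : %63[^\n]", key, type, value) != 3)
      return 0;

   /* strip the blanks between the key and the '=' sign */
   size_t n = strlen(key);
   while (n > 0 && key[n - 1] == ' ')
      key[--n] = '\0';
   return 1;
}

/* Hypothetical lookup that accepts both the old and the renamed spelling.
 * Returns the type number, or -1 for an unknown type name. */
int tid_from_name(const char *type)
{
   static const struct { const char *name; int tid; } table[] = {
      { "INT",     7 },   /* old spelling, number taken from this thread */
      { "INT32_T", 7 },   /* renamed spelling, same type number */
   };
   for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
      if (strcmp(type, table[i].name) == 0)
         return table[i].tid;
   return -1;
}
```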
With JSON encoding it's better because only the type number is stored, not the string. So INT -> 7 could stay, although in my opinion encoding the
type in an integer number is not good for readability. Nobody knows what "7" means as a type. You always have to do a look-up in midas.c and count
array indices manually.
I'm not sure how many experiments use the ASCII ODB format in one way or the other in some custom scripts. Changing the format
might have severe side effects for some experiments, so before we undertake this endeavor I would like to get some feedback here on the forum
from people from other experiments and see what they think.
Stefan |
04 Aug 2017, Konstantin Olchanski, Info, Notes on installing midas from scratch
|
Notes on installing midas from scratch. The instructions on midaswiki will be synced with this later.
cd ~/packages
git clone ...
cd midas
make
cd ~
mkdir ~/online
cd ~/online
~/git/midas/darwin/bin/odbinit --env
source env.sh
~/git/midas/darwin/bin/odbinit --exptab
~/git/midas/darwin/bin/odbinit
ls -la
send:online olchansk$ ls -la
total 2376
drwxr-xr-x 15 olchansk staff 510 Aug 4 15:34 .
drwxr-xr-x+ 244 olchansk staff 8296 Aug 4 15:33 ..
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ALARM.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .ELOG.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .HISTORY.SHM
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .MSG.SHM
-rw-r--r-- 1 olchansk staff 1183808 Aug 4 15:34 .ODB.SHM
-rw-r--r-- 1 olchansk staff 8 Aug 4 15:34 .ODB_SIZE.TXT
-rw-r--r-- 1 olchansk staff 15 Aug 4 15:34 .SHM_HOST.TXT
-rw-r--r-- 1 olchansk staff 12 Aug 4 15:34 .SHM_TYPE.TXT
-rw-r--r-- 1 olchansk staff 0 Aug 4 15:34 .SYSMSG.SHM
-rw-r--r-- 1 olchansk staff 341 Aug 4 15:33 env.csh
-rw-r--r-- 1 olchansk staff 322 Aug 4 15:33 env.sh
-rw-r--r-- 1 olchansk staff 40 Aug 4 15:34 exptab
-rw-r--r-- 1 olchansk staff 287 Aug 4 15:34 midas.log
send:online olchansk$
odbedit ### works
mhttpd ### bombs, requires SSL certificate https://bitbucket.org/tmidas/midas/issues/57/initial-mhttpd-should-bind-to-localhost
odbedit ### cd /experiment, set "http redirect to https" to no, set "midas https port" to 0
mhttpd ### runs now
connect to http://localhost:8080 ### status page works
restart mhttpd as mhttpd -D
mlogger -D
fetest ### runs, prints time and data
start a run from web page ### works
### fetest generates crazy data rate https://bitbucket.org/tmidas/midas/issues/58/fetest-crazy-data-rate
### go to history, define plot for SLOW/SLOW, see sine wave ### works
### history is written to expt dir, no good, go to "history"
### data files written to expt dir, no good, go to "data"
### midas.log written to data dir, no good (want expt dir)
### elog written to expt dir, go to "elog"
### logger channel config is wrong - gzip compression and crc32c should be enabled by default
### history config is wrong - FILE per-variable history should be enabled by default
K.O.
|
07 Aug 2017, Stefan Ritt, Info, Notes on installing midas from scratch
|
Thanks for documenting this in detail. A few suggestions:
- is it really necessary to call odbinit three times? Maybe two or even three functions can be merged. Like you call odbinit, it checks if the environment is
there, and creates it automatically if not. Same with the exptab.
- can we make "http redirect to https = n" and "midas https port = 0" the default? Of course this has to go with binding to localhost only.
- does it make sense to define default directories for history, data files and midas.log? Maybe we could come up with a "default scheme" which can then later
be adjusted if needed.
- will you take care of the wrong logger channel config and history config?
Best regards,
Stefan
|
04 May 2017, Thomas Lindner, Forum, MIDAS Workshop - July 26
|
Dear MIDAS users,
We would like to announce another MIDAS workshop at TRIUMF on July 26, 2017.
This will be a follow-on to the successful workshop two years ago. This
workshop will again be during one of Stefan Ritt's visits to TRIUMF.
The goal of the workshop would be to have a general discussion on the state of
MIDAS. We would have presentations from MIDAS developers on new MIDAS features
that are being implemented, with a particular focus on improvements to MIDAS web
functionality and analyzers. But equally important would be to hear the
experiences of MIDAS users. What aspects of MIDAS work well? Which aspects need
improving? What are the major trends in scientific computing that we should
adapt to? We always appreciate feedback and suggestions from the MIDAS
community (even when we have trouble finding time to make the changes!)
We will naturally broadcast the workshop on the web, but it would also be great
if anyone was interested in coming to TRIUMF in person to participate.
Thomas, on behalf of MIDAS developers |
11 Jul 2017, Thomas Lindner, Forum, MIDAS Workshop - July 26
|
Dear MIDAS users,
We have an approximately final agenda for the MIDAS workshop in two weeks. The
workshop will be on July 26, from 1-6PM (Vancouver time). The detailed agenda is
posted here:
https://indico.triumf.ca/conferenceDisplay.py?confId=2342
Next week I will provide details on how to remotely connect to the workshop.
Cheers,
Thomas
PS: as a reminder, the timetable and slides from the last MIDAS workshop can be
found here:
https://indico.psi.ch/conferenceTimeTable.py?confId=3793#20150715 |
19 Jul 2017, Thomas Lindner, Forum, MIDAS Workshop - July 26
|
Dear MIDAS colleagues,
We will use Zoom for people making remote connections to the MIDAS workshop next week. The connection details
are shown below. You will need to install a Zoom application, which should happen automatically when clicking on the
first link below. It seemed to work pretty easily for me.
Cheers,
Thomas
_________________________________________
Hi there,
Thomas Lindner is inviting you to a scheduled Zoom meeting.
Topic: MIDAS workshop
Time: Jul 26, 2017 12:30 PM Pacific Time (US and Canada)
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/749477537?pwd=-TSKYSiS0_k
Password: midas
Or iPhone one-tap (US Toll): +16465588656,,749477537# or +14086380968,,749477537#
Or Telephone:
Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
Meeting ID: 749 477 537
International numbers available: https://zoom.us/zoomconference?m=0Bug-COhDHYndpVqRLnNST9H-uXrauWk |
25 Jul 2017, Thomas Lindner, Forum, MIDAS Workshop - July 26
|
Hi Folks,
I just realized I never provided the location for the meeting (for those at TRIUMF). It will be in the ISAC-II conference room.
Cheers,
Thomas |
25 Jul 2017, Stefan Ritt, Info, Current git repository "develop" branch broken
|
Dear all,
we are currently undergoing major modifications in the way mhttpd is working. I realized that
we are now at a state where mhttpd is currently broken, and it will take a few weeks in order to
get everything converted to the new scheme we plan to use. Therefore I moved the git branch
"master" to the last known stable version of midas. So for any practical purpose, please do
NOT update your "develop" branch until further notice. To get the last stable version, you can
do a
$ git checkout master
which moves you right before we started to make major modifications. Once we are finished,
we will announce this here in the forum.
Best regards,
Stefan |
13 Jul 2017, Konstantin Olchanski, Info, implemented: json-rpc batch requests
|
The mhttpd json-rpc interface now implements batch requests per
http://www.jsonrpc.org/specification#batch
In a nutshell, instead of a single request, one can send a json array of requests and receive a json
array of replies.
As a variance from the spec, the midas implementation executes the requests strictly in-order and
the array of replies corresponds exactly to the array of requests (the spec expects the user to use the
"id" field to match replies to requests; in midas json-rpc, the 1st reply is always to the 1st request,
the 2nd reply is to the 2nd request and so forth).
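A minimal sketch of what such a batch looks like on the wire, assuming plain JSON-RPC 2.0 request objects without parameters. The helper names (append_request, build_batch) and the method names used with them are placeholders for illustration, not part of the mhttpd API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one JSON-RPC 2.0 request object (without params) to the string in buf.
   Returns 0 on success, -1 if the object does not fit. */
int append_request(char *buf, size_t bufsize, const char *method, int id)
{
    size_t used = strlen(buf);
    int n = snprintf(buf + used, bufsize - used,
                     "{\"jsonrpc\":\"2.0\",\"method\":\"%s\",\"id\":%d}", method, id);
    return (n < 0 || (size_t)n >= bufsize - used) ? -1 : 0;
}

/* Build a batch: a JSON array holding one request per method, in the given order.
   Request i gets id i+1, so with in-order replies reply[i] answers methods[i]. */
int build_batch(char *buf, size_t bufsize, const char *const methods[], int count)
{
    assert(buf != NULL && methods != NULL);
    if (bufsize < 3)
        return -1;
    strcpy(buf, "[");
    for (int i = 0; i < count; i++) {
        if (i > 0) {
            if (strlen(buf) + 2 > bufsize)   /* room for "," and the NUL */
                return -1;
            strcat(buf, ",");
        }
        if (append_request(buf, bufsize, methods[i], i + 1) != 0)
            return -1;
    }
    if (strlen(buf) + 2 > bufsize)           /* room for "]" and the NUL */
        return -1;
    strcat(buf, "]");
    return 0;
}
```

Calling build_batch() with the placeholder methods "method_a" and "method_b" yields a two-element JSON array. With the in-order behaviour described above, the first element of the reply array answers "method_a" and the second answers "method_b", so the "id" values are only a cross-check.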
To see this in action, look at resources/example.html and resources/transition.html
K.O. |
19 Jun 2017, Thomas Lindner, Bug Report, mhttpd ODB editor changes string length, breaks
|
I guess this might be related to the changes in the last elog conversation; but
I'll break it out as a separate problem.
The new mhttpd ODB editor seems to resize all strings (not just strings that are
greater than 256 characters). So, when I change some string with the mhttpd ODB
editor to 'ffffff', then I find that the string size is now ~7 characters.
This might be fine in general; but it seems to cause a problem when dealing with
alarms. In particular, I find that if I try to set (through mhttpd) the
"execute command" for an alarm class or the "condition" for an alarm, then I get
into lots of trouble. For instance, I changed the "execute command" for my
alarm class through mhttpd; when associated alarms were triggered, I got errors
21:58:12 [feSourceEpics,ERROR] [odb.c:9133:db_get_record,ERROR] struct size
mismatch for "/Alarms/Classes/Alarm" (expected size: 348, size in ODB: 100)
21:58:12 [feSourceEpics,ERROR] [alarm.c:379:al_trigger_class,ERROR] Cannot get
alarm class record
This makes sense, since ALARM_CLASS has a fixed size
typedef struct {
BOOL write_system_message;
...
char execute_command[256];
...
char display_fgcolor[32];
} ALARM_CLASS;
so problems will clearly occur when I change the size and try to grab it:
ALARM_CLASS ac;
status = db_get_record1(hDB, hkeyclass, &ac, &size, 0, strcomb(alarm_class_str));
I guess that similar problems also occur if you edit the string for ALARM or
PROGRAM_INFO instances. These problems do not occur when I change my strings
with odbedit, which doesn't resize strings below 256.
I'm not sure what the proper solution is. A temporary solution is that the
mhttpd ODB editor shouldn't resize strings if the new size is less than 256
characters; in that case the size should be left as 256 characters.
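The size mismatch can be illustrated with a toy record, as a sketch only. TOY_CLASS, stored_size() and set_fixed_string() are hypothetical names; the real ALARM_CLASS has more fields than shown here, and this is not how the ODB or mhttpd actually stores or repairs the data:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIXED_LEN 256  /* size of execute_command in the struct quoted above */

/* Toy record with the same shape as the quoted struct: fixed-size char arrays. */
typedef struct {
    int  write_system_message;
    char execute_command[FIXED_LEN];
    char display_fgcolor[32];
} TOY_CLASS;

/* Size the same data would have if each string were stored with only the
   bytes it needs (text plus NUL) instead of the full fixed array. */
size_t stored_size(const char *command, const char *color)
{
    return sizeof(int) + strlen(command) + 1 + strlen(color) + 1;
}

/* Copy src into a fixed-size field and zero-fill the remainder, so the
   field keeps its full length. Returns -1 if src does not fit. */
int set_fixed_string(char *field, size_t field_size, const char *src)
{
    assert(field != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= field_size)
        return -1;
    memset(field, 0, field_size);
    memcpy(field, src, len);
    return 0;
}
```

A reader that expects sizeof(TOY_CLASS) bytes sees a much smaller record once a string has been shrunk to its text length, which is the kind of "struct size mismatch" reported in the log. Writing through something like set_fixed_string() keeps the field at its full 256 bytes.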
This test was done with MIDAS git repository as of today:
commit 45a90dc329554f528485da121501daf6ecde100d |
21 Jun 2017, Thomas Lindner, Bug Report, mhttpd ODB editor changes string length, breaks
|
To follow up; with some help from Konstantin and Stefan, we realized that this
particular problem should already be fixed. While I was using the most recent version
of MIDAS, I hadn't rebuilt the EPICS frontend programs when I was doing this test. Once
I did that the error no longer occurred. This is because the most recent version of
MIDAS includes a check that will resize these particular string variables before using
them (technically, this is included in db_get_record1()); this resizing only happens for
these few strings that must have a fixed size.
We are still having a separate discussion about whether this treatment of string lengths
that need to have a fixed size can be further improved. Will update once discussion
converges. |
20 Jun 2017, Richard Longland, Forum, High Rate
|
|
07 Jun 2017, Alberto Remoto, Forum, Increase MAX_EVENT_SIZE
|
Hello,
I am using a CAEN v1720 to digitise signal coming from 5 PMTs and I need to extend the read-
out window to 1ms.
Given the sampling frequency of 250 MHz, each event would consist of about 4.78 MB.
According to the documentation I found at:
https://midas.triumf.ca/MidasWiki/index.php/Event_Buffer
- I modified the value of ODB /Experiment/MAX_EVENT_SIZE to 8 MB (I overestimated it in case
I will readout all 8 channels of the v1720)
- I modified the ODB key /Experiment/Buffer Sizes/SYSTEM to 512 MB (which allows the buffer to hold
about 100 events)
The max_event_size in the frontend source code is set to 32 MB while the event_buffer size is
200 times the max_event_size. So I did not modify those values.
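The sizing arithmetic above can be checked with a small sketch. The post does not say how the samples are packed; 4 bytes per sample is an assumption chosen here only because it roughly reproduces the quoted event size, and both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one event: samples per channel times channels times bytes per sample. */
uint64_t event_bytes(uint64_t sample_rate_hz, uint64_t window_us,
                     unsigned channels, unsigned bytes_per_sample)
{
    uint64_t samples = sample_rate_hz * window_us / 1000000u;
    return samples * channels * bytes_per_sample;
}

/* Number of whole events of the given size that fit into a buffer. */
uint64_t events_in_buffer(uint64_t buffer_bytes, uint64_t bytes_per_event)
{
    assert(bytes_per_event > 0);
    return buffer_bytes / bytes_per_event;
}
```

Under these assumptions, 250 MHz over a 1 ms window is 250,000 samples per channel, and 5 channels give 5,000,000 bytes (about 4.77 MiB) per event. A 512 MiB buffer then holds 107 such events, or 64 events at the 8 MiB maximum size.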
When I start a new run, MIDAS crashes and the ODB gets corrupted:
$ odbedit
[ODBEdit,ERROR] [odb.c:1134:db_open_database,ERROR] Different database format: Shared
memory is 262148000, program is 3
[ODBEdit,ERROR] [midas.c:2157:cm_connect_experiment1,ERROR] cannot open database
Unexpected error #326
Do you have any idea of what might be the problem?
The same thing happens if I reduce the buffer size to 128 MB.
The computer running MIDAS has 2 Quad CPU @ 2.83GHz and 4 GB RAM.
Thank you in advance for any help!
Alberto |
13 Apr 2017, Andreas Suter, Bug Report, stop form odbedit broken
|
when I try to stop a run from odbedit I get a core dump.
[ODBEdit1,INFO] Run #31 stopped odbedit: src/system.c:1223: ss_shm_flush:
Assertion `size == mmap_size[handle]' failed. Aborted (core dumped)
midas commit 53af92a5d0...
-----
I checked what happens if I try to stop a run via the mhttpd web-page: this
works! So what is different?
-----
I filed an issue (#47) on bitbucket as well.
What is the preferred channel to report potential bugs (elog / bitbucket issues)? |
13 Apr 2017, Andreas Suter, Bug Report, stop form odbedit broken
|
> when I try to stop a run from odbedit I get a core dump.
>
> [ODBEdit1,INFO] Run #31 stopped odbedit: src/system.c:1223: ss_shm_flush:
> Assertion `size == mmap_size[handle]' failed. Aborted (core dumped)
>
> midas commit 53af92a5d0...
>
> -----
>
> I checked what happens if I try to stop a run via the mhttpd web-page: this
> works! So what is different?
>
> -----
>
> I placed a issue (# 47) on bitbucket as well.
>
> What is the preferred channel to report potential bugs (elog / bitbucket issues)?
I think I found the problem. Some ODB String values which are **automatically**
generated:
CSS File = STRING : [1024] mhttpd.css
Sqlite dir = STRING : [1024]
History dir = STRING : [1024]
Sound = STRING : [1000] alarm.mp3
are exceeding the MAX_STRING_LENGTH 256 (defined in msystem.h)
It looks as if this screws up quite a bit of the system! When I delete .ODB.SHM and
afterwards try to reload the ODB from a dump I previously made with odbedit, the
following happens:
1) I get the error message that some strings are too long (exceeding
MAX_STRING_LENGTH). Unfortunately the underlying routine doesn't tell which ODB
variables this is.
2) After this reload, essentially nothing is working anymore. Any client I tried to
start just crashed.
Since it seems that the string length of MAX_STRING_LENGTH is very crucial I would
suggest that db_create_record (or whatever routine is dealing with it) checks for
STRING variables and ensures that they cannot exceed MAX_STRING_LENGTH.
When I shortened the above variables in my dump to MAX_STRING_LENGTH and regenerated the
ODB, the 'stop' problem in odbedit was gone as well. |
15 Apr 2017, Konstantin Olchanski, Bug Report, stop form odbedit broken
|
> > when I try to stop a run from odbedit I get a core dump.
> >
> > [ODBEdit1,INFO] Run #31 stopped odbedit: src/system.c:1223: ss_shm_flush:
> > Assertion `size == mmap_size[handle]' failed. Aborted (core dumped)
> >
I am quite puzzled by this situation. We have seen the above error before, tried to track it down, failed. I was
always thinking this is some kind of strange size mismatch between odb size and shared memory size and
shared memory save file odb.shm size.
Now with your information, it looks like it is memory corruption.
I always thought there is no length limit to odb strings, except for the odb api problem where you have to
know the maximum string length for db_get_value() & co otherwise long strings will be corrupted. Today
nobody uses fixed size buffers, either db_get_value() allocates the string of correct size (replacing buffer
overflow errors with memory leak errors), or returns std::string.
I shall check on the use of MAX_STRING_LENGTH at least in odb itself...
The default value 256 seems to be too small for today's use. (if you want to store json data, web page
fragments, etc).
K.O. |
15 Apr 2017, Konstantin Olchanski, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken
|
>
> I shall check on the use of MAX_STRING_LENGTH at least in odb itself...
>
Ok, I looked at the use of MAX_STRING_LENGTH in ODB (odb.c):
a) it is not used in any critical places for the database itself, so it is not a limit on maximum length of TID_STRING data. good.
b) it is used in the code for saving/loading odb from .odb files (old format), not sure how it works against overlong strings, but probably
truncates/corrupts/crashes.
c) it is used in the code for saving odb to odb.xml files. Overlong strings are truncated (I added a message about it).
d) code for loading/saving to json files handles overlong strings okay.
e) odbedit "ls" truncates overlong strings, mhttpd has some oddities against overlong strings.
f) db_sprintf() truncates string text to MAX_STRING_LENGTH to avoid output buffer overflow (should use db_snprintf() instead).
Conclusion: overlong strings should be okay, but do not use the old .odb and .xml save files. (mlogger saves odb to output .mid file in xml
format, we should switch it to use json format).
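Silent truncation, as in item f), can be made visible by checking the return value of a bounded formatter. This is a sketch using the standard snprintf(); format_bounded() is a hypothetical helper and not a MIDAS function:

```c
#include <assert.h>
#include <stdio.h>

/* Copy text into dst without overflowing it.
   Returns 0 if the whole text fit, 1 if it was truncated, -1 on error. */
int format_bounded(char *dst, size_t dst_size, const char *text)
{
    assert(dst != NULL && dst_size > 0 && text != NULL);
    int n = snprintf(dst, dst_size, "%s", text);
    if (n < 0)
        return -1;
    /* snprintf returns the length the full output would have had */
    return (size_t)n >= dst_size ? 1 : 0;
}
```

A caller that gets 1 back knows the output was cut and can report which value was affected instead of saving a shortened string unnoticed.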
> > CSS File = STRING : [1024] mhttpd.css
> > Sqlite dir = STRING : [1024]
> > History dir = STRING : [1024]
> > Sound = STRING : [1000] alarm.mp3
> > are exceeding the MAX_STRING_LENGTH 256 (defined in msystem.h)
So these should not cause any corruption or problem unless actual content length exceeds 255 bytes,
even then they are okay if odb is only saved and loaded into json files.
> > 1) I get the error message that some strings are too long (exceeding
> > MAX_STRING_LENGTH). Unfortunately the underlying routine doesn't tell which ODB
> > variables this is.
this is in db_check_record(), where it compares odb content with user-supplied data descriptions (there are no system-supplied
data descriptions with strings longer than MAX_STRING_LENGTH).
so I think what happened is you created a data structure with overlong strings, passed it to db_paste() or something,
db_check_record() complained about it, and db_paste() corrupted memory.
> >
> > 2) After this reload, essentially nothing is working anymore. Any client I tried to start just crashed.
> >
Somebody corrupted some shared memory, most likely db_paste() corrupted the odb shared memory.
K.O. |
15 Apr 2017, Konstantin Olchanski, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken
|
> >
> > I shall check on the use of MAX_STRING_LENGTH at least in odb itself...
> >
>
> Ok, I looked at the use of MAX_STRING_LENGTH in ODB (odb.c):
>
Fixed a small buglet, now saving and reloading odb in the old ".odb" format will silently truncate all overlong strings to 256 bytes. (I think it always did that).
K.O. |
19 Apr 2017, Stefan Ritt, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken
|
> Fixed a small buglet, now saving and reloading odb in the old ".odb" format will silently truncate all overlong strings to 256 bytes. (I think it always did that).
Not sure that we want that. There might be cases where people want to store long strings. I would remove the truncation completely when saving .odb or .xml files, and fix the load routines to
deal with overlong strings.
Stefan |
22 Apr 2017, Konstantin Olchanski, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken
|
> > Fixed a small buglet, now saving and reloading odb in the old ".odb" format will silently truncate all overlong strings to 256 bytes. (I think it always did that).
>
> Not sure that we want that. There might be cases where people want to store long strings. I would remove the truncation completely when saving .odb or .xml files, and fix the load routines to
> deal with overlong strings.
>
Since I just looked at the code for reading/writing .odb format, I see that it uses a fixed size buffer for reading lines from a file
(currently 2*MAX_STRING_LENGTH). I am not in the mood to rewrite and retest all that code. Never looked at the xml reader,
probably has same problem (xml writer truncates long strings via truncation in db_sprintf()).
Since we already have the json odb reader/writer that handles unlimited string length correctly (also handles unicode and
unusual odb names), perhaps we should make json the default and be done with it.
K.O. |
06 Jun 2017, Konstantin Olchanski, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken
|
> ... the xml reader, probably has same problem
> ... xml writer truncates long strings via truncation in db_sprintf()
Removed truncation of overlong strings in the xml writer and confirmed that xml reader handles them correctly (always loaded overlong strings correctly).
Both JSON and XML odb dumps now handle strings of unlimited size correctly.
K.O. |
19 Apr 2017, Stefan Ritt, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken
|
ODB name lengths (the name of a key) are limited to 256 characters, the length of strings in the ODB should NOT be limited. At some point we wanted to have complete web pages inside the ODB,
which for sure are longer than 256 characters. While this was the idea, I see now that db_paste & co. is hopelessly broken. To fix it, everything should be changed to std::string which is in my opinion
the only 'clean' solution. That would also remove the cumbersome strlcpy and strlcat.
But looking at odb.c, replacing everything with std::string would probably take a brave programmer a couple of weeks. Not sure if we should dive into that adventure right now. The quick fix would be:
a) The strings "CSS File", "Sqlite dir" etc. reported below get reduced to 256 characters (MAX_STRING_LENGTH). The value of 256 characters came from the file system limitation in linux (many
years ago), where a full path of a file could not exceed 256 characters. Not sure if this limit is still valid today, but having all file names in the ODB limited to 256 characters is maybe not a bad idea
anyhow (who wants to type in file names with more than 256 characters ???).
b) Change the max string length in db_paste to 1024 to cover the few exceptions above.
If we go with a), KO has to change his ODB file names, in case of b) I can do the change.
So what is your opinion?
Best regards,
Stefan |
22 Apr 2017, Konstantin Olchanski, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken
|
> ODB name lengths (the name of a key) are limited to 256 characters, the length of strings in the ODB should NOT be limited.
Right, I was not ever aware of such limitation until I just now looked at the .odb and .xml writing code. Definitely string length
is truncated to MAX_STRING_LENGTH on writing, chokes or truncates on reading.
The new json reader/writer handles overlength strings correctly. I would say we should deprecate the old formats and go forward
with json. Most current software can work with json data much easier than xml or custom .odb.
> I see now that db_paste & co. is hopelessly broken. To fix it, everything should be changed to std::string which is in my opinion
> the only 'clean' solution. That would also remove the cumbersome strlcpy and strlcat.
Yes, that's the code for reading .odb format.
>
> But looking at odb.c, replacing everything with std::string would probably take a brave programmer a couple of weeks. Not sure if we should dive into that adventure right now.
>
I agree. Too much of an adventure.
A simpler solution could be to add variants of db_get_data() and db_get_value() that allocate a data buffer of the correct size (the user has to remember to free it).
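The allocating-getter idea can be sketched with a two-call pattern: ask for the size first, then fetch into a buffer of exactly that size. Everything here is a hypothetical stand-in (toy_get, toy_get_alloc, stored_value); it does not use or describe the real ODB API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stored value of arbitrary length. */
static const char stored_value[] = "a stored string that may be longer than the caller expects";

/* Toy getter in the fixed-buffer style: copies at most *size bytes into buf,
   then sets *size to the full size of the value (including the NUL).
   Returns 0 if everything fit, 1 if the copy was truncated. */
int toy_get(char *buf, size_t *size)
{
    assert(size != NULL);
    size_t full = sizeof(stored_value);
    size_t n = *size < full ? *size : full;
    if (n > 0)
        memcpy(buf, stored_value, n);
    *size = full;
    return n < full ? 1 : 0;
}

/* Allocating wrapper: query the size, then fetch into a buffer of exactly
   that size. The caller owns the result and must free() it. */
char *toy_get_alloc(void)
{
    size_t size = 0;
    toy_get(NULL, &size);            /* size query, copies nothing */
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    if (toy_get(buf, &size) != 0) {  /* value grew between the two calls */
        free(buf);
        return NULL;
    }
    return buf;
}
```

The trade-off is the one mentioned earlier in the thread: the caller can no longer overflow a fixed buffer, but forgetting the free() turns into a memory leak.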
> a) The strings "CSS File", "Sqlite dir" etc. reported below get reduced to 256 characters (MAX_STRING_LENGTH).
We should fix the inconsistency, my vote is it should be either MAX_STRING_LENGTH or PATH_MAX (from limits.h).
K.O. |
02 May 2017, Konstantin Olchanski, Bug Report, mhttpd inline-editor and web MAX_STRING_LENGTH, stop form odbedit broken
|
> > I shall check on the use of MAX_STRING_LENGTH at least in odb itself...
Also tested the web interface:
In the odb editor, overlong strings show truncated to MAX_STRING_LENGTH (via db_sprintf()),
but the odb inline-editor can handle overlong strings correctly.
The inline-editor implementation that uses ODBSet() had a string length limitation to maximum
URL length (ODBSet uses AJAX jset with call parameters encoded into the URL).
I now converted the inline-editor to use the json-rpc api (uses http post) and I confirm that this can handle
arbitrary long strings.
K.O. |
24 Apr 2017, Stefan Ritt, Bug Report, stop form odbedit broken
|
> CSS File = STRING : [1024] mhttpd.css
> Sqlite dir = STRING : [1024]
> History dir = STRING : [1024]
> Sound = STRING : [1000] alarm.mp3
After a quick discussion with Konstantin, I changed these strings to a length of 256 chars
(MAX_STRING_LENGTH). Actually, all the changes I had to make were in code introduced by KO, so I hope I
did everything correctly. He should carefully check my changes (actually I would have preferred if he
could change his code himself...).
I agree with KO that the preferred format for saving the ODB should be JSON, but there might be
experiments which have some old ODB dumps in other formats, so we should not remove the possibility to
read those formats back.
Stefan |
15 Apr 2017, Konstantin Olchanski, Bug Report, where to report bugs, stop form odbedit broken
|
>
> What is the preferred channel to report potential bugs (elog / bitbucket issues)?
>
I prefer that bugs be reported on this forum here. Most bugs affect every midas user, so best to notify the
whole community.
Bitbucket has a nice bug tracking system, but there are a couple of problems:
a) only a couple of people see the bug reports for midas, minimizing the probability of a fix.
b) bug reports on bitbucket stay on bitbucket, we do not have backups and archives
of bug reports, if tomorrow bitbucket goes belly-up, our bug database goes poof! with them.
c) I can search the bug reports on this forum using "grep" (I am sure there is a "find" button
on the bitbucket web page and it finds what I am looking for right away).
So if you have a bug report that others should know about (e.g. the "+" button on the status page does
not work), I say use this forum.
If you have a bug that you think is unique to you, not interesting to others (e.g. my midas crashes when I
do X), file it on bitbucket. If you see no activity on bitbucket for a week or two, repost it here.
K.O. |
15 Apr 2017, Konstantin Olchanski, Bug Report, stop form odbedit broken
|
> when I try to stop a run from odbedit I get a core dump.
> [ODBEdit1,INFO] Run #31 stopped odbedit: src/system.c:1223: ss_shm_flush:
> Assertion `size == mmap_size[handle]' failed. Aborted (core dumped)
>
I am puzzled. The crash is at the very end of everything (saving odb shared memory to odb.shm).
Does the run actually stop, or does the crash happen before the run is fully stopped? (I guess you want
to run more odbedit commands after stopping the run, so you care about it not crashing).
K.O. |
16 May 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record()
|
Suddenly the mhttpd odb inline editor is truncating the odb string entries to the actual length of the
stored string value, which causes db_get_record() to explode with "structure mismatch" errors. (Not my
fault, Your Honor! Honest!). For example, I see these errors from al_check() after changing
"/programs/foo/start command" - suddenly it cannot get the program_info record.
What a mess.
Actually, this is not a new mess, midas has always been rather brittle with db_get_record()
and db_open_record(), always unhappy if something goes wrong in odb, like a lost
entry in equipment statistics or an extra variable in equipment common, etc.
To patch it all up, I added a function db_get_record1() which knows the structure of the data
and can call db_check_record() to fix the odb structure and make db_get_record() happy.
Many places in midas now use it, making odb structure mismatches "self healing" in a way.
But when looking at uses of db_get_record(), I noticed that in many places it can be trivially
replaced by one or two db_get_value() calls. I did change this in a couple of places in mhttpd.
This way of coding is more robust against unexpected contents in odb and is easier
to maintain going forward, when new odb entries must be added for new functionality.
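To illustrate the difference in robustness, here is a small self-contained sketch. It does not use the real midas functions: MockOdb, mock_get_record() and mock_get_value() are hypothetical stand-ins, and the real db_get_record() also checks types, sizes and ordering, which this sketch leaves out.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one ODB subtree of integer entries.
typedef std::map<std::string, int> MockOdb;

// Record-style read: succeeds only if the subtree holds exactly the expected
// keys, mimicking the "structure mismatch" behaviour described above.
bool mock_get_record(const MockOdb& odb, const std::vector<std::string>& expected,
                     std::vector<int>* out) {
    if (odb.size() != expected.size())
        return false;
    out->clear();
    for (const std::string& name : expected) {
        MockOdb::const_iterator it = odb.find(name);
        if (it == odb.end())
            return false;
        out->push_back(it->second);
    }
    return true;
}

// Value-style read: looks up one entry and creates it with the given default
// if it is missing, so extra or missing neighbours do not matter.
int mock_get_value(MockOdb& odb, const std::string& name, int default_value) {
    MockOdb::const_iterator it = odb.find(name);
    if (it == odb.end()) {
        odb[name] = default_value;
        return default_value;
    }
    return it->second;
}
```

With one unexpected extra entry in the subtree, the record-style read fails as a whole, while each value-style read still returns its own entry.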
Most uses of db_get_record() are now converted to db_get_record1(), except where it is
used together with db_open_record() (which uses db_get_record() internally).
To fix the db_open_record() uses, I considered adding db_open_record1() which would
also know the data structure and automatically repair any mismatch, but I think instead of that,
I will switch them to use db_watch() (in conjunction with manual db_get_record()/db_get_record1()
and plain db_get_value()).
When adding an automatic repair mechanism like this, one should beware of "update wars",
where two midas programs built against slightly different versions of midas would
each try to change odb in its own way, in an endless loop. (yes, it did happen, more than once).
One solution to this is to assign an "owner" to each data structure, the "consumers"
of the data have to deal with anything missing or unexpected. If they use db_get_value()
it should all be happy. (if the owner has to be reassigned, back to the wars again, until
everything is rebuilt against the same version of midas).
P.S. In languages lacking reflection, like C and C++, it is impossible to trivially implement
a mapping from a data structure to an external entity, such as db_get_record() mapping a C struct
into ODB. Many attempts have been made, e.g. ROOT CINT, all of them brittle, hard
to maintain, generally unsatisfactory. Java was the first mainstream language
to have reflection. Modern languages, such as Go, have reflection from day 1. Of course
all scripting languages, perl, python, javascript, always had reflection. The C++ language
standard will get reflection some day. Today one can easily do reflection in C++ using the Clang
compiler, the main reason for ROOT v6 switching from CINT to Clang.
K.O. |
31 May 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record()
|
> What a mess.
The mess with db_get_record() and db_open_record() is even deeper than I thought. There are several anomalies.
Records opened by db_open_record() are later accessed via db_get_record() which requires
that the odb structure and the C structure match exactly.
Of course anybody can modify anything in odb at any time, so there are protections against
modifying the odb structures "from under" db_open_record():
a) db_open_record(MODE_WRITE) makes the odb structure immutable by setting the "exclusive" flag. This works well. In the past
there were problems with "exclusive mode" getting stuck behind dead clients, but these days it is efficiently cleaned and recovered
by the odb validation code at the start of all midas programs.
b) db_create_record(), db_reorder_key() and db_delete_key() refuse to function on watched/hotlinked odb structures. One would
think this is good, but there is a side-effect. If I run "odbedit watch /", all odb delete operations fail (including deletion of temporary
items in /system/tmp).
c) db_create_key() and db_set_data()/db_set_value() do not have such protections, and they can (and do) add new odb entries and
change the size of existing entries (especially the size of strings), and make db_get_record() fail. Note that db_get_record() inside
db_open_record() fails silently and odb hotlinks mysteriously stop working.
One could keep fixing this by adding protections against modification of hotlinked odb structures, but unfortunately, one cannot tell
db_watch() hotlinks from db_open_record() hotlinks. Only the latter ones require protection. db_watch() does not require such
protections because it does not use db_get_record() internally, it leaves it to the user to sort out any mismatches.
Also it would be nice if "odbedit watch /" did not have the nasty side effect of making all odb unchangeable (presently it only makes
things undeletable).
To sort it all out, I am moving in this direction:
1) replace all uses of db_get_record() with db_get_record1() which automatically cures any structure mismatch
2) replace all uses of db_open_record(MODE_READ) with db_watch() in conjunction with db_get_record1(). This is done in mfe.c
and seems to work ok.
2a) automatic repair of structure mismatch is presently defeated by db_create_record() refusing to work on hotlinked odb entries.
3) with db_get_record() and db_open_record(MODE_READ) removed from use, turn off hotlink protection in item (b) above. This will
fix problem (2a).
4) maybe replace db_open_record(MODE_WRITE) with explicit db_set_record(). I personally do not like its "magical" operation,
when in fact, it is just a shorthand for "db_get_key/db_set_record" hidden inside db_send_changed_records().
4a) db_open_record(MODE_WRITE) works well enough right now, no need to touch it.
K.O. |
31 May 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record()
|
> 2) replace all uses of db_open_record(MODE_READ) with db_watch() in conjunction with db_get_record1().
Done to all in-tree programs, except for mana.c (not using it), sequencer.cxx (cannot test it) and a few places that watch a single TID_INT.
Nothing more needs to be done, other than turning off the check for hotlinks in db_create_record() & co (removed #define CHECK_OPEN_RECORD in odb.c).
K.O.
$ grep db_open_record src/* | grep MODE_READ
src/lazylogger.cxx: status = db_open_record(hDB, hKey, &run_state, sizeof(run_state), MODE_READ, NULL, NULL); // watch a TID_INT
src/mana.cxx: db_open_record(hDB, hkey, NULL, 0, MODE_READ, banks_changed, NULL);
src/mana.cxx: db_open_record(hDB, hkey, NULL, 0, MODE_READ, banks_changed, NULL);
src/mana.cxx: db_open_record(hDB, hkey, &out_info, sizeof(out_info), MODE_READ, NULL, NULL);
src/mana.cxx: db_open_record(hDB, hKey, ar_info, sizeof(AR_INFO), MODE_READ, update_request,
src/midas.c: status = db_open_record(hDB, hKey, &_requested_transition, sizeof(INT), MODE_READ, NULL, NULL);
src/mlogger.cxx: status = db_open_record(hDB, hKey, hist_log[index].buffer, size, MODE_READ, log_history, NULL);
src/mlogger.cxx: db_open_record(hDB, hVarKey, NULL, varkey.total_size, MODE_READ, log_system_history, (void *) (POINTER_T) index);
src/mlogger.cxx: db_open_record(hDB, hHistKey, NULL, size, MODE_READ, log_system_history, (void *) (POINTER_T) index);
src/odbedit.cxx: db_open_record(hDB, hKey, data, size, MODE_READ, key_update, NULL);
src/sequencer.cxx: status = db_open_record(hDB, hKey, &seq, sizeof(seq), MODE_READ, NULL, NULL);
8s-macbook-pro:midas 8ss$ |
06 Jun 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record()
|
> Done to all in-tree programs, except for mana.c (not using it), sequencer.cxx (cannot test it) and a few places where watching a TID_INT.
> Nothing more needs to be done, other than turn off the check for hotlink in db_create_record() & co (removed #define CHECK_OPEN_RECORD in odb.c).
Fixed a bug in mfe.c - it was overwriting odb /eq/xxx/common with default values. Fixed now.
Running with CHECK_OPEN_RECORD seems to work okay so far. Will test some more before proposing to make it the default.
K.O. |
02 Jun 2017, Stefan Ritt, Bug Report, problem with odb strings and db_get_record()
|
That all makes sense to me.
Stefan |
31 May 2017, Konstantin Olchanski, Info, modified db_watch() arguments
|
For reasons unknown, db_watch() did not have an "info" parameter passed through to the callback
handler function, as is done with db_open_record().
This omission makes it difficult to write db_watch handler functions that must watch multiple odb
trees - db_watch only delivers the hkey of the modified item inside the tree, leaving us with no
simple way to tell which tree it came from. An example of this is mfe.c watching the Common
structure for multiple equipments. There are other
uses for the "info" parameter, for example it is needed to implement c++ wrapper classes.
This omission is now corrected at the cost of changing the definition of db_watch().
All uses of db_watch() in the midas tree have been corrected, but out-of-tree programs
will not compile until they are converted. For quick conversion, add a NULL parameter to db_watch() calls and add a
"void*info" parameter to your watch handler function.
Sorry about this disturbance,
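A self-contained sketch of how the "info" pointer lets one handler serve several watched trees. mock_db_watch() and mock_notify() are hypothetical stand-ins written for this example, not the midas implementation. Only the idea of a trailing void* argument handed back to the handler is taken from the description above.

```cpp
#include <string>
#include <vector>

typedef int HNDLE;  // redefined locally so the sketch compiles without midas.h

// Handler shape assumed for this sketch: the trailing void* is the "info"
// pointer that was registered together with the watch.
typedef void (*WatchHandler)(HNDLE hDB, HNDLE hKey, int index, void* info);

struct MockWatch {
    HNDLE hTree;
    WatchHandler handler;
    void* info;
};
static std::vector<MockWatch> mock_watches;

// Hypothetical stand-in for db_watch(): remembers handler and info per tree.
void mock_db_watch(HNDLE hDB, HNDLE hTree, WatchHandler handler, void* info) {
    (void)hDB;
    MockWatch w = {hTree, handler, info};
    mock_watches.push_back(w);
}

// Hypothetical stand-in for an ODB change: only the handle of the modified
// item inside the tree is delivered, plus the registered info pointer.
void mock_notify(HNDLE hDB, HNDLE hTree, HNDLE hChangedKey, int index) {
    for (const MockWatch& w : mock_watches)
        if (w.hTree == hTree)
            w.handler(hDB, hChangedKey, index, w.info);
}

struct Equipment {
    std::string name;
    int updates;
};

// One handler for all equipments: info tells it which one changed.
void common_changed(HNDLE, HNDLE, int, void* info) {
    Equipment* eq = static_cast<Equipment*>(info);
    eq->updates++;
}
```

Registering the same handler twice with &adc and &tdc as info is enough to tell the two Common trees apart, without a lookup from hkey back to the equipment.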
K.O. |
09 May 2017, Andreas Suter, Bug Report, mhttpd / history / export data
|
A handy feature of the mhttpd history is the ability to export the data. However, this
seems to be broken. It currently works only if the run marker flag is activated and
fails otherwise. |
16 May 2017, Konstantin Olchanski, Bug Report, mhttpd / history / export data
|
> A handy feature of the history of the mhttpd is to export the data. However, this
> seems to be broken. It currently only works if the run marker flag is activated by
> fails otherwise.
imo, it never worked properly. I think the best hope for a working "export" button
is an "export as json" which gives you basically the output of hs_read_buffer() in the json
format. With options for "raw data" or "binned, with mean, rms, min, max for each bin".
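A rough sketch of what the "binned" option could compute. The function name, the Bin struct and the fixed-width binning are assumptions made for this example, and "rms" is taken here as the root mean square of the values in the bin (not the standard deviation).

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Per-bin summary; all fields are zero for empty bins.
struct Bin {
    int count;
    double mean, rms, min, max;
};

// Bins (time, value) pairs into num_bins fixed-width bins starting at t_start.
// Points outside the covered time range are ignored. bin_width must be > 0.
std::vector<Bin> bin_history(const std::vector<double>& times,
                             const std::vector<double>& values,
                             double t_start, double bin_width, int num_bins) {
    Bin empty = {0, 0.0, 0.0, 0.0, 0.0};
    std::vector<Bin> bins(num_bins, empty);
    std::vector<double> sum(num_bins, 0.0), sum2(num_bins, 0.0);
    std::size_t n = std::min(times.size(), values.size());
    for (std::size_t i = 0; i < n; i++) {
        double pos = std::floor((times[i] - t_start) / bin_width);
        if (pos < 0 || pos >= num_bins)
            continue;
        int b = static_cast<int>(pos);
        double v = values[i];
        if (bins[b].count == 0) {
            bins[b].min = v;
            bins[b].max = v;
        } else {
            bins[b].min = std::min(bins[b].min, v);
            bins[b].max = std::max(bins[b].max, v);
        }
        bins[b].count++;
        sum[b] += v;
        sum2[b] += v * v;
    }
    for (int b = 0; b < num_bins; b++) {
        if (bins[b].count > 0) {
            bins[b].mean = sum[b] / bins[b].count;
            bins[b].rms = std::sqrt(sum2[b] / bins[b].count);
        }
    }
    return bins;
}
```

Each Bin maps naturally onto one json object, so the same export code could serve both the raw and the binned option.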
K.O. |
26 Apr 2017, Konstantin Olchanski, Info, added db_get_value_string()
|
Since we have been regularly running into problems with db_get_xxx(TID_STRING) and string buffers of mismatched size,
I now implemented db_get_value_string(hdb, hkey, key_name, index, &string, create).
It works the same as db_get_value(TID_STRING), except that the string value is returned into an std::string object,
memory allocation is handled by std::string and there is no string length limit (other than std::string limits).
Accessing string arrays is done explicitly via an "index" parameter. If index is bigger than the odb array size, DB_OUT_OF_RANGE is returned
without logging an error message (unlike e.g. db_get_data_index(), which will log an error). This makes it safe to iterate over array entries with a simple
loop of index from 0 and up until db_get returns an error.
As before, if the odb entry does not exist, it will be created (if create==true) and initialized with the value of the string parameter (zero-terminated in odb).
There are also the newly added db_set_value_string() and cm_get_path_string(). If you want more of these, please ask, or send patches.
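The "loop until error" idiom can be sketched like this. To keep the example self-contained it runs against a hypothetical mock (mock_odb_arrays, mock_get_value_string() and the MOCK_* status codes are made up for the sketch), not against the real db_get_value_string().

```cpp
#include <map>
#include <string>
#include <vector>

// Placeholder status codes for the sketch; the real DB_xxx values differ.
enum MockStatus { MOCK_SUCCESS, MOCK_NO_KEY, MOCK_OUT_OF_RANGE };

// Hypothetical in-memory stand-in for string arrays stored in the ODB.
static std::map<std::string, std::vector<std::string> > mock_odb_arrays = {
    {"/Logger/Channels", {"disk", "tape", "null"}},
};

// Mock getter: returns one array element into a std::string, or
// MOCK_OUT_OF_RANGE (silently, without logging) past the end of the array.
MockStatus mock_get_value_string(const std::string& path, int index, std::string* s) {
    std::map<std::string, std::vector<std::string> >::const_iterator it =
        mock_odb_arrays.find(path);
    if (it == mock_odb_arrays.end())
        return MOCK_NO_KEY;
    if (index < 0 || index >= static_cast<int>(it->second.size()))
        return MOCK_OUT_OF_RANGE;
    *s = it->second[index];
    return MOCK_SUCCESS;
}

// Reads all elements by counting the index up from 0 until the getter
// stops returning success.
std::vector<std::string> read_all(const std::string& path) {
    std::vector<std::string> out;
    std::string s;
    for (int i = 0; mock_get_value_string(path, i, &s) == MOCK_SUCCESS; i++)
        out.push_back(s);
    return out;
}
```

The caller never needs to know the array size in advance, and a missing key simply yields an empty result.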
K.O. |
26 Apr 2017, Stefan Ritt, Info, added db_get_value_string()
|
Just some thought for discussion:
Rather than "spicing up" the MIDAS library here and there with C++ objects such as std::string, wouldn't it make more sense to "cleanly" wrap an ODB value in a C++ class? We could then use
both APIs in parallel, and encourage the C++ API for new developments. We could then write things like:
ODBKEY<std::string> name("/Experiment/Name"); // constructor calls automatically db_get_value
name = "New Name"; // overloading the "=" operator, will call db_set_value()
or even
ODBKEY<std::vector, std::string> nameArray("...");
for (auto &s : nameArray)
std::cout << s << std::endl; // print all elements of string array
so we treat ODB arrays as vectors, which fixes array boundary violations nicely.
If the key does not exist, we could properly throw exceptions and forget about tons of nested return parameters for error conditions.
Many nice things could be done, common errors could be prevented, and we can do a "smooth" migration: We don't have to change the whole library completely, just where we feel it's currently
needed. So over time the code would be "objectified". Would be nice if we could rely on C++11 (like the "auto" feature above). Not sure about VxWorks, but every other OS should be fine.
Stefan |
02 May 2017, Konstantin Olchanski, Info, added db_get_value_string()
|
> Just some thought for discussion:
Even more thoughts:
- c++ interface for odb. been there, done that. see VirtualODB in rootana. Can access live ODB, XML odb dump from midas file, even ODB through http/mhttpd (needs to be converted to json rpc api).
- c++11. the ROOT team made the decision for us, for all practical purposes. RH/SL/CentOS <= 6 are left for dead. (but we still have machines as old as SL4).
- odb interface via severe operator overloading. writing "let x=42;" to simulate the universe from the big bang to thermal death is elegant (overload operator= of class "let")
but there is a surprise for the naive programmer (long run time, large memory consumption)
- c++ exceptions. defective by design, as they do not carry enough debug information (e.g. java exceptions carry the full stack trace). in the typical case, it is impossible to tell
who is throwing exceptions and why. error handling is reduced to: main() { try { real_main } catch exception { printf("sorry!"); } }
see http://stackoverflow.com/questions/1736146/why-is-exception-handling-bad
- converting midas to a new simplified odb api. typical use via db_get_value() is already one (or two) line of code that cannot be reduced (have to specify odb path, tid, etc),
so little is gained from using a different api. getting rid of db_find_key()/db_get_key() would be helpful, but with db_get_value(), they are hardly ever used in new code.
There are weaknesses in the current api, would be nice to fix them some day, and a c++ api seems like the right way to go:
- fix the race condition between db_enum_key() and db_delete_key(). (it is same as between "ls" and "rm" - with nfs, try to "rm" on one client while running "ls" on another, fun!)
- fix the race condition between odb handles (pointers into shared memory) and db_delete_key() (and whatever else moves the keys around). This means using full odb paths for
all odb api functions.
- make it all work nice multithreaded - the above race conditions would become only worse if we encourage heavy use of threads in midas.
And I do need a "no-odb" odb api for my "no-midas" midas frontend framework (where I can build and run the frontend without linking and connecting with a real midas),
in practice it means all api "get" calls have to take a "default" value that is returned right back to me when I am not connected (or linked) with a real odb.
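A minimal sketch of such a "default-returning" get interface. The class names and the ODB path are invented for illustration. MapOdb merely stands in for a backend connected to a real ODB.

```cpp
#include <map>
#include <string>

// Hypothetical read interface: every get call carries its default value.
class OdbReader {
public:
    virtual ~OdbReader() {}
    virtual int get_int(const std::string& path, int default_value) = 0;
};

// "no-odb" backend: not connected to anything, hands the default right back.
class NoOdb : public OdbReader {
public:
    int get_int(const std::string&, int default_value) override {
        return default_value;
    }
};

// Stand-in for a backend connected to a real ODB, backed by a map here.
class MapOdb : public OdbReader {
public:
    void set_int(const std::string& path, int value) { values_[path] = value; }
    int get_int(const std::string& path, int default_value) override {
        std::map<std::string, int>::const_iterator it = values_.find(path);
        return it == values_.end() ? default_value : it->second;
    }

private:
    std::map<std::string, int> values_;
};

// Frontend code is written once against the interface and runs with either
// backend; the path is a made-up example.
int read_period(OdbReader& odb) {
    return odb.get_int("/Equipment/Example/Common/Period", 1000);
}
```

The frontend logic cannot tell the two backends apart, which is what allows it to be built and run without linking against midas.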
Good fodder for this summer's discussions.
K.O. |
02 May 2017, Konstantin Olchanski, Info, added db_resize_string()
|
> Since we have been regularly running into problems with db_get_xxx(TID_STRING) and string buffers of mismatched size,
> I now implemented db_get_value_string(hdb, hkey, key_name, index, &string, create).
I ran into problems with string arrays - non-array strings have unlimited length, but string arrays have fixed string length, usually set at creation time.
This causes a problem with growing arrays using db_get_value_string(), when converting a non-array variable to an array, the wrong
string length gets used, and one gets an array with useless string length. There is no way to specify the correct array string length
without adding more parameters to db_get_value_string() and confusing and complicating it for the typical case where it is used
against simple (non-array) odb entries.
To clarify the situation, db_get_value_string() was changed to reject attempts to resize an array, and
calls of db_get_value_string(index>0 and create==TRUE) now return an error.
To create and resize string arrays, I added a new function - db_resize_string(hdb, hkey, key_name, num_values, max_string_size).
Here,
num_values is the new array size, making it possible to grow or shrink an array
max_string_size is the new string size, making it possible to change the array string length after the array was created (there was no midas function to do this before now).
I added a json-rpc call for db_resize_string().
But it still needs to be added to odbedit and mhttpd.
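To make the two parameters concrete, here is a toy model of a fixed-width string array and a resize operation on it. This is not the db_resize_string() implementation: MockStringArray and the helper functions are hypothetical, and truncating over-long elements to the new string size is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Toy model of an ODB string array: num_values slots of max_string_size
// bytes each (including the terminating NUL), stored back to back.
struct MockStringArray {
    int num_values;
    int max_string_size;
    std::vector<char> data;
};

// Builds an array from values; max_string_size must be at least 1.
MockStringArray make_array(const std::vector<std::string>& values, int max_string_size) {
    MockStringArray a;
    a.num_values = static_cast<int>(values.size());
    a.max_string_size = max_string_size;
    a.data.assign(values.size() * max_string_size, '\0');
    for (std::size_t i = 0; i < values.size(); i++)
        std::strncpy(&a.data[i * max_string_size], values[i].c_str(), max_string_size - 1);
    return a;
}

std::string get_element(const MockStringArray& a, int index) {
    return std::string(&a.data[static_cast<std::size_t>(index) * a.max_string_size]);
}

// Changes both the number of elements and the per-element string size.
// New elements are empty; elements that no longer fit are truncated.
void mock_resize_string(MockStringArray& a, int num_values, int max_string_size) {
    std::vector<char> fresh(static_cast<std::size_t>(num_values) * max_string_size, '\0');
    int keep = std::min(num_values, a.num_values);
    for (int i = 0; i < keep; i++)
        std::strncpy(&fresh[static_cast<std::size_t>(i) * max_string_size],
                     &a.data[static_cast<std::size_t>(i) * a.max_string_size],
                     max_string_size - 1);
    a.data.swap(fresh);
    a.num_values = num_values;
    a.max_string_size = max_string_size;
}
```

The model shows why both numbers are needed: the total storage is num_values * max_string_size, so neither the element count nor the string length can change without relaying out the whole array.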
K.O. |
18 Apr 2017, Andreas Suter, Bug Report, run start/stop oddity
|
I stumbled over an oddity which I would like to understand better. Here are the
boundary conditions:
- Enable non-localhost RPC -> y
- Disable RPC hosts check -> y
1) I am starting a run from ODBedit (start now -v):
07:13:11.272 2017/04/19 [ODBEdit,INFO] Run #26 started
07:13:25.516 2017/04/19 [Logger,LOG] File '/data/max/dlog/lem17_0026.root'
CRC32C checksum: 0x05ca4e7e, 1523383 bytes
On this little test experiment there is not much running, but it already shows
the effect I wanted to understand.
2) I am stopping the run from ODBedit (stop -v):
07:13:25.519 2017/04/19 [ODBEdit,INFO] Run #26 stopped
So, everything looks perfectly fine up to this point.
3) Now the 'strange' thing happens. At any point in time after this, when I stop
ODBEdit, I get the following messages:
07:13:32.335 2017/04/19 [ODBEdit,INFO] Program ODBEdit on host pc7962 stopped
07:13:32.335 2017/04/19 [Logger,ERROR] [midas.c:14079:rpc_server_receive,ERROR]
rpc check ok, abort canceled
This I do NOT understand! It looks as if the Logger (or any other client which
gets the run state transition) thinks that some Client (here ODBEdit) has a
broken connection. At least this is how I understand the comment in midas.c /
rpc_server_receive(). Is something broken in the de-registration from the RPC
server? By the way, all clients were running on localhost, i.e. no remote
connection was used here.
All this only happens if a run transition took place.
Unfortunately I do not understand the system well enough to suggest any fix to
this :-( and hence would appreciate any help. |
02 May 2017, Konstantin Olchanski, Bug Report, run start/stop oddity
|
I should really get around to fixing this junk error message:
> 07:13:32.335 2017/04/19 [Logger,ERROR] [midas.c:14079:rpc_server_receive,ERROR]
> rpc check ok, abort canceled
What happens is this. For each run transition, cm_transition does RPC calls
to each client telling them to transition. So even if you run only on localhost, there are still
tcp connections being created and broken to do these RPCs. These connections are
typically created and left open, but when you stop odbedit, its connections are
closed/broken. Now in the midas rpc code there is confusion between the main rpc
connection for remote clients and temporary rpc connections for run transitions. This
confusion is the cause of these junk error messages - first the code thinks that the main rpc
connection is closed and that it should commit suicide (abort), then it finds that it was
just a temporary rpc connection and there is no need to die.
https://bitbucket.org/tmidas/midas/issues/44/junk-messages-about-rpc-check-ok-abort
>
> - Enable non-localhost RPC -> y
> - Disable RPC hosts check -> y
>
this is unsafe:
if you only run on localhost, "enable non-localhost rpc" should be "n" and midas will not listen to any
outside connections (except for mhttpd, of course).
if you have remote clients, enable non-localhost rpc and add their hostnames to the access control list.
"disable rpc hosts check" is for the case where you do not know the hostnames of your remote clients,
for example when they come from dynamic ip addresses on a wifi network.
In this case you tell midas to accept connections from everybody everywhere in the world
and hopefully you have a firewall somewhere to prevent the evil hackers from actually connecting.
I hope this is not your situation.
K.O. |
|