20 Nov 2003, Stefan Ritt, , Implementation of db_check_record()
|
As Konstantin pointed out correctly, the db_create_record() call is pretty
heavy since it copies whole structures around the ODB. Therefore, it
should not used frequently. It might be that several problems are caused
by that, for example the "phantom" records reported in elog:40 .
I have therefore implemented the function
db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str,
BOOL correct)
which takes an ASCII structure in the same way as db_create_record(), but
only checks this ASCII structure against the ODB contents without writing
anything to the ODB.
If the record does not exist at all, it is created via db_create_record().
This is useful for example with the /Runinfo structure on a virgin ODB.
If the parameter "correct" is FALSE, the function returns
DB_STRUCT_MISMATCH if the ODB contents is wrong (wrong order of variables,
wrong name of variables, wrong type or array size). The calling function
should then abort, since a subsequent db_open_record() would fail. Note
that although abort() is useful, one should add cm_disconnect_experiment()
just before the abort() in order to have the application "log out" from
the ODB gracefully. If the parameter "correct" is TRUE, the function
db_create_record() is called internally to correct a mismatching record.
I have changed most calls of db_create_record() in mhttpd.c, mfe.c, mana.c
and mlogger.c. Pierre, could you do the same for lazylogger.c?
I also started to put assert()'s everywhere and encourage everyone to
follow. Under Windows, the asserts() are removed automatically if
compiling in "Release" mode.
So I committed many changes, did some quick tests, but am not 100%
convinced that all the changes are good. So please use the new code
cautiously, and let me know if there is any new problem. I also would like
to get some feedback if the whole thing becomes more stable now. |
27 Nov 2003, Konstantin Olchanski, , Implementation of db_check_record()
|
> I have therefore implemented the function
> db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, BOOL
correct)
Stephan, something is very wrong with the new code. My
"/logger/channels/0/settings" is being destroyed on "begin run". Midas
checkout from october 31st is okey. This is a show stopper, but I am in a rush
and cannot debug it. I am falling back to the Oct 31st version... K.O. |
30 Nov 2003, Konstantin Olchanski, , Implementation of db_check_record()
|
> > I have therefore implemented the function
> > db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, BOOL
> correct)
>
> Stephan, something is very wrong with the new code. My
> "/logger/channels/0/settings" is being destroyed on "begin run".
Okey. I found the problem in db_check_record(): when we decide that we have a
mismatch, we call db_create_record(...,rec_str), but by this time, rec_str no
longer points to the beginning of the ODB string because we started parsing it.
I tried this solution: save rec_str into rec_str_orig, then when we decide that
we have a mismatch, call db_create_record() with this saved rec_str_orig. It
fixes my immediate problem (destruction of "/logger/channels/0/settings"), but is
it correct?
I would like to fix it ASAP to get cvs-head working again: our mhttpd dumps core
on an assert() failure in db_create_record() and the set of db_check_record()
changes might fix it for me.
Here is the CVS diff:
RCS file: /usr/local/cvsroot/midas/src/odb.c,v
retrieving revision 1.73
diff -r1.73 odb.c
7810a7811
> char *rec_str_orig = rec_str;
7820c7821
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
7838c7839
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
8023c8024
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
8037c8038
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
K.O. |
30 Nov 2003, Stefan Ritt, , Implementation of db_check_record()
|
Fixed and committed. Can you check if it's working? |
01 Dec 2003, Konstantin Olchanski, , Implementation of db_check_record()
|
> Fixed and committed. Can you check if it's working?
Yes, it is fixed. Thanks. K.O. |
25 Nov 2003, Suzannah Daviel, , delete key followed by create record leads to empty structure in experim.h
|
Hi,
I have noticed a problem with deleting a key to an array in odb, then
recreating the record as in the code below. The record is recreated
successfully, but when viewing it with mhttpd, a spurious blank line
(coloured orange) is visible, followed by the rest of the data as normal.
This blank line causes trouble with experim.h because it
produces an empty structure e.g. :
#define CYCLE_SCALERS_SETTINGS_DEFINED
typedef struct {
struct {
} ;
char names[60][32];
} CYCLE_SCALERS_SETTINGS;
rather than :
#define CYCLE_SCALERS_SETTINGS_DEFINED
typedef struct {
char names[60][32];
} CYCLE_SCALERS_SETTINGS;
This empty structure causes a compilation error when rebuilding clients that
use experim.h
SD
CYCLE_SCALERS_TYPE1_SETTINGS_STR(type1_str);
CYCLE_SCALERS_TYPE2_SETTINGS_STR(type2_str);
Both type1_str and type2_str have been defined as in
experim.h
i.e.
#define CYCLE_SCALERS_TYPE1_SETTINGS_STR(_name) char *_name[] = {\
"[.]",\
"Names = STRING[60] :",\
"[32] Back%BSeg00",\
"[32] Back%BSeg01",\
........
........
"[32] General%NeutBm Cycle Sum",\
"[32] General%NeutBm Cycle Asym",\
"",\
NULL }
#define CYCLE_SCALERS_TYPE2_SETTINGS_STR(_name) char *_name[] = {\
"[.]",\
"Names = STRING[60] :",\
"[32] Back%BSeg00",\
"[32] Back%BSeg01",\
...........
............
"[32] General%B/F Cumul -",\
"[32] General%Asym Cumul -",\
"",\
NULL }
if (db_find_key(hDB, 0, "/Equipment/Cycle_scalers/Settings/",&hKey) ==
DB_SUCCESS)
db_delete_key(hDB,hKey,FALSE);
if ( strncmp(fs.input.experiment_name,"1",1) == 0) {
exp_mode = 1; /* Imusr type - scans */
status =
db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type1_str));
}
else {
exp_mode = 2; /* TDmusr types - noscans */
status =
db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type2_str));
} |
01 Dec 2003, Stefan Ritt, , delete key followed by create record leads to empty structure in experim.h
|
> I have noticed a problem with deleting a key to an array in odb, then
> recreating the record as in the code below. The record is recreated
> successfully, but when viewing it with mhttpd, a spurious blank line
> (coloured orange) is visible, followed by the rest of the data as normal.
>
> db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type1_str));
> }
> else {
> exp_mode = 2; /* TDmusr types - noscans */
> status =
> db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type2_str));
> }
The first problem is that the db_create_record has a trailing "/" in the key name
after Settings. This causes the (empty) subsirectory which causes your trouble.
Simple removing it fixes the problem. I agree that this is not obvious, so I
added some code in db_create_record() which removes such a trailing slash if
present. New version under CVS.
Second, the db_create_record() call is deprecated. You should use the new
function db_check_record() instead, and remove your db_delete_key(). This avoids
possible ODB trouble since the structure is not re-created each time, but only
when necessary.
- Stefan |
30 Nov 2003, Konstantin Olchanski, , bad call to cm_cleanup() in fal.c
|
fal.c does not compile: it calls cm_cleanup() with one argument when there
should be two arguments. K.O. |
30 Nov 2003, Stefan Ritt, , bad call to cm_cleanup() in fal.c
|
> fal.c does not compile: it calls cm_cleanup() with one argument when there
> should be two arguments. K.O.
Fixed and committed. |
20 Nov 2003, Konstantin Olchanski, , midas timeout wraparound
|
While reviving midas on midtig01 after it was not used for a while, we see
this. Notice negative "last called" numbers. Looks like a time_t wraparound
somewhere...
[local:tigress:S]/>scl -w
Name Host Timeout Last called
mhttpd midtig01.triumf.ca 10000 -2037131082
Logger midtig01.triumf.ca 10000 -2037131166
Analyzer midtig01.triumf.ca 10000 -2037131048
JACQ midtig01.triumf.ca 10000 -2037131667
mhttpd1 midtig01.triumf.ca 10000 325
ODBEdit midtig01.triumf.ca 10000 829
K.O. |
20 Nov 2003, Konstantin Olchanski, , cannot shutdown defunct clients
|
> While reviving midas on midtig01 after it was not used for a while ...
> [local:tigress:S]/>scl -w
> Name Host Timeout Last called
> mhttpd midtig01.triumf.ca 10000 -2037131082
These clients cannot be deleted. I tried:
1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
2) "sh mhttpd" from odbedit ->
[midas.c:5298:cm_shutdown] cannot connect to client mhttpd on host
midtig01.triumf.ca, port 32853
Client mhttpd not active
3) in odbedit: "cd /system/clients; rm xxxx"
refuses to delete the key
Lacking any better ideas, I deleted them via brain surgery on the odb file:
1) stop everything
2) ipcrm the SYSV shared memory segment
3) odbedit -> save xxx.odb
4) xemacs xxx.odb, delete offending odb entries
5) rm .ODB.SHM
6) odbedit -> load xxx.odb
7) voila, bad clients gone, gone, gone.
K.O. |
20 Nov 2003, Stefan Ritt, , cannot shutdown defunct clients
|
> 1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
> 2) "sh mhttpd" from odbedit ->
> [midas.c:5298:cm_shutdown] cannot connect to client mhttpd on host
> midtig01.triumf.ca, port 32853
> Client mhttpd not active
> 3) in odbedit: "cd /system/clients; rm xxxx"
> refuses to delete the key
Have you tried a "cleanup" in ODBEdit?
The "last_activity" is a 32-bit int, filled with milliseconds. So indeed it
wraps around after about one month. So if a all clients are stopped
simultaneously the hard way (such that nobody's watchdog can clean any other
client from the ODB), like with a power off, and you start the thing one
month later, there might be a problem. I never tried that before. So next
time to a cleanup. If that does not help, we should change last_activity
from INT to DWORD. This way it's alway positive and the wraparound does not
hurt. |
20 Nov 2003, Konstantin Olchanski, , cannot shutdown defunct clients
|
> > 1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
> Have you tried a "cleanup" in ODBEdit?
Nope. Will try next time...
> The "last_activity" is a 32-bit int, filled with milliseconds. So indeed it
> wraps around after about one month.... change last_activity
> from INT to DWORD. This way it's alway positive and the wraparound does not
> hurt.
INT == "int", wraparound in 1 month
DWORD == "unsigned int", wraparound in 2 months
should we make it the 64-bit "long long" (or C98's "int64_t")?
K.O. |
20 Nov 2003, Stefan Ritt, , cannot shutdown defunct clients
|
> INT == "int", wraparound in 1 month
> DWORD == "unsigned int", wraparound in 2 months
>
> should we make it the 64-bit "long long" (or C98's "int64_t")?
Won't work on all supported compilers. The point is that DWORD wraps around in
2 months, but the difference of two DWORDs is alywas positive, never negative
like you had it. We only have to distinguish if the difference of the current
time (im ms) minus the last_activity of a client is larget than the timeout,
typically 10 seconds or so. If you have a wraparound on 32-bit DWORD, the
difference is still ok. Like
current "time" : 0x0000 0100
last_activity: 0xFFFF FF00
then current_time - last_activity = 0x00000100 - 0xFFFFFF00 = 0x00000200 if
calculated with 32-bit values. |
20 Nov 2003, Renee Poutissou, , cannot shutdown defunct clients
|
Indeed the ODB command "cleanup" really works. I have used it several
times with the TWIST DAQ and regularly with the BNMR/MUSR setups where
we have these stubborn clients (ie feepics) that do not want to shutdown
cleanly.
But there is one problem with "cleanup". It has a hardwired timeout of
2 seconds. This is a problem for tasks like lazylogger which set a timeout
of 60 seconds when moving the tape. So BEWARE, if you issue the "cleanup"
command, it might kill some clients who have setup their timeout to longer
than 2 seconds.
I have asked Stefan to change this before. He said that, to be effective,
the timeout value used for "cleanup" has to be rather short.
One possibility, would be to allow for a user entered "cleanup" timeout.
The default could stay at 2 seconds.
> > Have you tried a "cleanup" in ODBEdit?
>
> Nope. Will try next time...
> |
24 Nov 2003, Stefan Ritt, , cannot shutdown defunct clients
|
> But there is one problem with "cleanup". It has a hardwired timeout of
> 2 seconds. This is a problem for tasks like lazylogger which set a timeout
> of 60 seconds when moving the tape. So BEWARE, if you issue the "cleanup"
> command, it might kill some clients who have setup their timeout to longer
> than 2 seconds.
>
> I have asked Stefan to change this before. He said that, to be effective,
> the timeout value used for "cleanup" has to be rather short.
> One possibility, would be to allow for a user entered "cleanup" timeout.
> The default could stay at 2 seconds.
I changed the behaviour of cleanup by adding an extra parameter
ignore_timeout to cm_cleanup(). Now, in ODBEdit, a "clanup" obeys the
timeout set by the clients. The problem with that is if the logger crashes
for example, and it's timeout is set o 5 min., it cannot be clean-up'ed any
more for the next five minutes, and therefor not be restarted wasting
precious beam time. That's why I hard-wired originally the "cleanup" timout
to 2 sec. Now I added a flag "-f" to the ODBEdit cleanup command which works
in the old fashion with a 2 sec. timeout. So a "cleanup" alone won't kill a
looger which currently rewinds a tape or so, but a "cleanup -f" does.
I also changed internal timeouts from INT to DWORD, which should fix the
problem Konstantin reported recently (re-starting an experiment after
several weeks). New changes are commited, but I only did basic tests. So
please try the new code and tell me if there is any problem.
- Stefan |
17 Nov 2003, Stefan Ritt, , Revised MVMESTD
|
Let me propose a revised scheme for midas standard VME calls (mvmestd.h).
Pierre mentioned some limitations before, and I find now also some fields
to improve. Right now, the vme_open() call retrieves a handle. For some
interfaces (like SBS/Bit3), one has to obtain separate handles for
different addressing modes A24D32/A32D32 and so on, which I find a bit
troublesome. I would rather keep the handle internally, invisible to the
user, and use ioctl() statments to change the address/data mode.
So the API could look like:
vme_open() Deprecated, will be removed
vme_init(void) Standard initialization, open device(s), stores handles
internally in a table
vme_exit(void) Deallocates any memory, close handles
vme_read(void *dst, DWORD vme_addr, DWORD size)
vme_write(void *src, DWORD vme_addr, DWORD size)
vme_ioctl(int request, int *param)
Request is one of
VME_IOCTL_CRATE_SET/GET
Sets VME crate (in case several interfaces are
plugged into singlePC, meaningless for embedded CPUs)
VME_IOCTL_DEST_SET/GET
VME_BUS/VME_RAM/VME_LM for VME bus, RAM in VME
interface, or LM for local memory (used in Bit3
interface)
VME_IOCTL_AMOD_SET/GET
Sets/Retrieves VME AMOD (= VME_AMOD_xxx as currently
defined in mvmestd.h)
VME_IOCTL_DSIZE_SET/GET
Sets/Retrieves VME data size (D8/D16/D32/D64)
VME_IOCTL_DMA_SET/GET
Enable/Disable DMA, should be independent of AMOD
VME_IOCTL_INTR_ATTACH/DETACH/ENABLE/DISABLE
Set VME interrupts
VME_IOCTL_AUTO_INCR_SET/GET
Set autoincremet of source pointer, can be disabled
for FIFO readout
vme_mmap(void **ptr, DWORD vme_addr, DWORD size)
vme_unmap(void *ptr, DWORD size)
Map/Unmap VME to local memory
vme_read2(void *dst, DWORD vme_addr, DWORD size, DWORD flags)
vme_write2(void *src, DWORD vme_addr, DWORD size, DWORD flags)
With these functions one can directly specify the flags
usually managed by vme_ioctl(). Usefule for applications
where the address modifier for example has to be
different in each read/write operation.
Note that the vme_read/write functions do not have a VME handle any more,
nor an address modifier. This is all accomplished with vme_ioctl() calls.
Please have a look at this proposal, compare it with what you do currently
in VME, and let me know if we should add/modify something. I volunteer to
implement the API for the SBS/Bit3 617 and the Struck SIS1100/3100
interfaces, for VxWorks somebody at TRIUMF should take care. |
20 Nov 2003, Pierre-André Amaudruz, Konstantin Olchanski, , Revised MVMESTD
|
Before we try to merge the different access scheme for the different VME hardware,
we present the "optimal" configuration for the VMIC setup. This is a first shot so take it
with caution.
From these definitions, we should be able to workout a compromise and come up with
a satisfactory standard.
A) The VMIC vme_slave_xxx() options are not considered.
B) The interrupt handling can certainly match the 4 entries required in the user frontend
code i.e. Attach, Detach, Enable, Disable.
I don't understand your argument that the handle should be hidden. In case of multiple
interfaces, how do you refer to a particular one if not specified?
The following scheme does require a handle for refering to the proper (device AND window).
1 ) deviceHandle = vme_init(int devNumber);
Even though the VMIC doesn't deal with multiple devices,
the SIS/PCI does and needs to init on a specific PCI card.
Internally:
opening of the device (/dev/sisxxxx_1) (ignored in case of VMIC).
Possible including a mapping to a default VME region of default size with default AM
(VMIC :16MB, A24). This way in a single call you get a valid handle for full VME access
in A24 mode. Needs to be elaborate this option. But in principle you need to declare the
VME region that you want to work on (vme_map).
2) mapHandle = vme_map(int deviceHandle, int vmeAddress, int am, int size);
Return a mapHandle specific to a device and window. The am has to be specified.
What ever are the operation to get there, the mapHandle is a reference to thas setting.
It could just fill a map structure.
Internally:
WindowHandle[deviceHandle] = vme_master_create(BusHandle[deviceHandle], ...
WindowPtr[WindowHandle] = vme_master_window(BusHandle[deviceHandle]
, WindowHandle[deviceHandle]...
3) vme_setmode(mapHandle, const int DATA_SIZE, const int AM
, const BOOL ENA_DMA, const BOOL ENA_FIFO);
Mainly used for the vme_block_read/write. Define for following read the data size and
am in case of DMA (could use orther DMA mode than window definition for optimal
transfer).
Predefine the mode of access:
DATA_SIZE : D8, D16, D32
AM : A16, A24, A32, etc...
enaDMA : optional if available.
enaFIFO : optional for block read for autoincrement source pointer.
Remark:
PAA- I can imagine this function to be a vme_ioctl (int mapHandle, int *param)
such that extension of functionality is possible. But by passing cons int
arguments, the optimizer is able to substitute and reduce the internal code.
4)
uint_8Value = vme_readD8 (int mapHandle, uint_64 vmeSrceOffset)
uint_16Value = vme_readD16 (int mapHandle, uint_64 vmeSrceOffset)
uint_32Value = vme_readD32 (int mapHandle, uint_64 vmeSrceOffset)
Single VME read access. In the VMIC case, this access is always through mapping.
Value = *(WindowPtr[WindowHandle] + vmeSrceOffset)
or
Value = *(WindowStruct->WindowPtr[WindowHandle] + vmeSrceOffset)
5)
status = vme_writeD8 (int mapHandle, uint_64 vmeSrceOffset, uint_8 Value)
status = vme_writeD16 (int mapHandle, uint_64 vmeSrceOffset, uint_16 Value)
status = vme_writeD32 (int mapHandle, uint_64 vmeSrceOffset, uint_32 Value)
Single VME write access.
6)
nBytes = vme_block_read(mapHandle, char * pDest, uint_64 vmeSrceOffset, int size);
Multiple read access. Can be done through standard do loop or DMA if available.
nBytes < 0 : error
Incremented pDest = (pDest + nBytes); Don't need to pass **pDest for autoincrement.
7)
nBytes = vme_block_write(mapHandle, uint_64 vmeSrceOffset, char *pSrce, int size);
Multiple write access.
nBytes < 0 : error
Incremented pSrce = (pSrce + nBytes); Don't need to pass **pSrce for autoincrement.
8) status = vme_unmap(int mapHandle)
Cleanup internal pointers or structure of given mapHandle only.
9) status = vme_exit()
Cleanup deviceHandle and release device. |
21 Nov 2003, Stefan Ritt, , Revised MVMESTD
|
Thanks for your contribution. Let me try to map your functionality to mvmestd calls:
> A) The VMIC vme_slave_xxx() options are not considered.
We could maybe do that through mvme_mmap(SLAVE, ...) instead of mvme_mmap(MASTER, ...)
> B) The interrupt handling can certainly match the 4 entries required in the user frontend
> code i.e. Attach, Detach, Enable, Disable.
vmve_ioctl(VME_IOCTL_INTR_ATTACHE/DETACH/ENABLE/DISABLE, func())
> I don't understand your argument that the handle should be hidden. In case of multiple
> interfaces, how do you refer to a particular one if not specified?
> The following scheme does require a handle for refering to the proper (device AND window).
Four reasons for that:
1) For the SBS/Bit3, you need a handle for each address mode. So if I have two crates (and I do in our
current experiment), and have to access modules in A16, A24 and A32 mode, I need in total 6 handles.
Sometimes I mix them up by mistake, and wonder why I get bus errors.
2) Most installations will only have single crates (as your VMIC). So if there is only one crate, why
bother with a handle? If you have hunderds of accesses in your code, you save some redundant typing work.
3) A handle is usually kept global, which is considered not good coding style.
4) Our MCSTD and MFBSTD functions also do not use a handle, so people used to those libraries will find it
more natural not to use one.
> 1 ) deviceHandle = vme_init(int devNumber);
> Even though the VMIC doesn't deal with multiple devices,
> the SIS/PCI does and needs to init on a specific PCI card.
> Internally:
> opening of the device (/dev/sisxxxx_1) (ignored in case of VMIC).
> Possible including a mapping to a default VME region of default size with default AM
> (VMIC :16MB, A24). This way in a single call you get a valid handle for full VME access
> in A24 mode. Needs to be elaborate this option. But in principle you need to declare the
> VME region that you want to work on (vme_map).
Just vme_init(); (like fb_init()).
This function takes the first device, opens it, and stores the handle internally. Sets the AM to a default
value, and creates a mapping table which is initially empty or mapped to a default VME region. If one wants
to access a secondary crate, one does a vme_ioctl(VME_IOCTL_CRATE_SET, 2), which opens the secondary crate,
and stores the new handle in the internal table if applicable.
> 2) mapHandle = vme_map(int deviceHandle, int vmeAddress, int am, int size);
> Return a mapHandle specific to a device and window. The am has to be specified.
> What ever are the operation to get there, the mapHandle is a reference to thas setting.
> It could just fill a map structure.
> Internally:
> WindowHandle[deviceHandle] = vme_master_create(BusHandle[deviceHandle], ...
> WindowPtr[WindowHandle] = vme_master_window(BusHandle[deviceHandle]
> , WindowHandle[deviceHandle]...
The best would be if a mvme_read(...) to an unmapped region would automatically (internally) trigger a
vme_map() call, and store the WindowHandle and WindowPtr internally. The advantage of this is that code
written for the SIS for example (which does not require this kind of mapping) would work without change
under the VMIC. The disadvantage is that for each mvme_read(), the code has to scan the internal mapping
table to find the proper window handle. Now I don't know how much overhead this would be, but I guess a
single for() loop over a couple of entries in the mapping table is still faster than a microsecond or so,
thus making it negligible in a block transfer.
> 3) vme_setmode(mapHandle, const int DATA_SIZE, const int AM
> , const BOOL ENA_DMA, const BOOL ENA_FIFO);
> Mainly used for the vme_block_read/write. Define for following read the data size and
> am in case of DMA (could use orther DMA mode than window definition for optimal
> transfer).
>
> Predefine the mode of access:
> DATA_SIZE : D8, D16, D32
> AM : A16, A24, A32, etc...
> enaDMA : optional if available.
> enaFIFO : optional for block read for autoincrement source pointer.
>
> Remark:
> PAA- I can imagine this function to be a vme_ioctl (int mapHandle, int *param)
> such that extension of functionality is possible. But by passing cons int
> arguments, the optimizer is able to substitute and reduce the internal code.
Right. mvme_ioctl(VME_IOCTL_AMOD_SET/DSIZE_SET/DMA_SET/AUTO_INCR_SET, ...)
> uint_8Value = vme_readD8 (int mapHandle, uint_64 vmeSrceOffset)
> uint_16Value = vme_readD16 (int mapHandle, uint_64 vmeSrceOffset)
> uint_32Value = vme_readD32 (int mapHandle, uint_64 vmeSrceOffset)
> Single VME read access. In the VMIC case, this access is always through mapping.
> Value = *(WindowPtr[WindowHandle] + vmeSrceOffset)
> or
> Value = *(WindowStruct->WindowPtr[WindowHandle] + vmeSrceOffset)
mvme_read(*dst, DWORD vme_addr, DWORD size); would cover this in a single call. Note that the SIS for
example does not have memory mapping, so if one consistently uses mvme_read(), it will work on both
architectures. Again, this takes some overhead. Consider for example a possible VMIC implementation
mvme_read(char *dst, DWORD vme_addr, DWORD size)
{
for (i=0 ; table[i].valid ; i++)
{
if (table[i].start >= vme_addr && table[i].end < vme_addr+size)
break;
}
if (!table[i].valid)
{
vme_master_crate(...)
table[i].window_handle = vme_master_window(...)
}
if (size == 2)
mvme_ioctl(VME_IOCTL_DSIZE_SET, D16);
else if (size == 1)
mvme_ioctl(VME_IOCTL_DSIZE_SET, D8);
memcpy(dst, table[i].window_handle + vme_addr - table[i].start, size);
}
Note this is only some rough code, would need more checking etc. But you see that for each access the for()
loop has to be evaluated. Now I know that for the SBS/Bit3 and for the SIS a single VME access takes
~0.5us. So the for() loop could be much faster than that. But one has to try. If one experiment needs the
ultimate speed, it can use the native VMIC API, but then looses the portability. I'm not sure if one needs
the automatic DSIZE_SET, maybe it works without.
> status = vme_writeD8 (int mapHandle, uint_64 vmeSrceOffset, uint_8 Value)
> status = vme_writeD16 (int mapHandle, uint_64 vmeSrceOffset, uint_16 Value)
> status = vme_writeD32 (int mapHandle, uint_64 vmeSrceOffset, uint_32 Value)
> Single VME write access.
Dito. mvme_write(void *dst, DWORD vme_addr, DWORD size);
> nBytes = vme_block_read(mapHandle, char * pDest, uint_64 vmeSrceOffset, int size);
> Multiple read access. Can be done through standard do loop or DMA if available.
> nBytes < 0 : error
> Incremented pDest = (pDest + nBytes); Don't need to pass **pDest for autoincrement.
vmve_ioctl(VME_IOCTL_DMA_SET, TRUE);
n = mvme_read(char *pDest, DWORD vmd_addr, DWORD size);
> nBytes = vme_block_write(mapHandle, uint_64 vmeSrceOffset, char *pSrce, int size);
> Multiple write access.
> nBytes < 0 : error
> Incremented pSrce = (pSrce + nBytes); Don't need to pass **pSrce for autoincrement.
Dito.
> 8) status = vme_unmap(int mapHandle)
> Cleanup internal pointers or structure of given mapHandle only.
mvme_unmap(DWORD vme_addr, DWORD size)
Scan through internal table to find handle, then calls vme_unmap(mapHandle);
> 9) status = vme_exit()
> Cleanup deviceHandle and release device.
mvme_exit();
Let me know if this all makes sense to you...
- Stefan |
20 Nov 2003, Konstantin Olchanski, , set-uid-root midas programs
|
I see that MIDAS installs several set-uid-root programs into /usr/local/bin.
In this age and time of evil computer hackers, this is not a good idea and
we should Do Something (TM) about it. Here is my risk assessment:
[olchansk@midtis06 midas]$ ls -l /usr/local/bin | grep wsr
-rwsr-sr-x 1 root root 25811 Nov 20 09:27 dio
-rwsr-sr-x 1 root root 344553 Nov 20 09:27 mhttpd
-rwsr-sr-x 1 root root 70736 Nov 20 09:27 webpaw
dio- is required to be setuid-root to gain I/O permissions. I looked at it a
few times, and it is probably safe, but I would like to get a second
opinion. Stephan, can you should it to your local security geeks?
mhttpd- definitely unsafe. It has more buffer overflows than I can shake a
stick at. Why is it suid-root anyway?
webpaw- what is it?!?
K.O. |
20 Nov 2003, Stefan Ritt, , set-uid-root midas programs
|
> dio- is required to be setuid-root to gain I/O permissions. I looked at it a
> few times, and it is probably safe, but I would like to get a second
> opinion. Stephan, can you should it to your local security geeks?
>
> mhttpd- definitely unsafe. It has more buffer overflows than I can shake a
> stick at. Why is it suid-root anyway?
>
> webpaw- what is it?!?
dio was written by Pierre.
mhttpd and webpaw both are web servers. webpaw is used to display PAW
pictures over the web. If you run these programs at a port <1024, and most
people do run them at port 80 (at least at PSI), they need to be setuid-root.
Unless you know a better way to do that... |
15 Nov 2003, Konstantin Olchanski, , Phantom "open records"
|
Sometimes (maybe after a client uncleanly exits?), I see phantom "open
records", for example:
[local:twist:Running]Gas>sor
/Equipment/Gas/Common open 2 times by fe1hp
/Equipment/Gas/Variables open 1 times by Logger
/Equipment/Gas/Variables/Flow1 open 2 times by uBeamTcl1 uBeamTcl
/Equipment/Gas/Settings/Command open 2 times by fe1hp
/Equipment/Gas/Statistics open 1 times by
Note the blank client name in the "/Equipment/Gas/Statistics" line.
This causes these warnings from mfe.c:
Cannot init equipment record, probably other FE is using it
Cannot delete statistics record, error 320
Cannot create statistics record, error 320
Cannot open statistics record, error 318. Probably other FE is using it
Then the number of generated events for this front end is never incremented.
Also attempts to delete this "open" record fail:
[local:twist:Running]Gas>del /Equipment/Gas/Statistics
Are you sure to delete the key
"/Equipment/Gas/Statistics"
and all its subkeys? (y/[n]) y
key is open by other client
How do I go about writing the db_validate_xxx() code to cleanup this
bogosity? I am not too familiar with the implementation of "open record"...
K.O. |
16 Nov 2003, Stefan Ritt, , Phantom
|
I have seen the same behaviour and it annoys me, too. What I did in the past
is a "cleanup" in ODBEdit which removes these open records. I have soem code
in cm_watchdog(), which should take care of that. If a client is dead, it
gets removed from the ODB, and its open records should get its notify_count
decremented. So obviously this code has some bug. I plan to do in the
following week (now I got some spare time) the following:
- exchange most db_create_record() by something better. Maybe
db_check_record(..., correct_flag), which creates the record only if it does
not exist at all, otherwise checks the structure. If correct_flag is TRUE, it
corrects the strucure (by calling db_create_record()), if it's false it just
returns an error code. This way one can decide from case to case which option
is better. Like for the /Runinfo, the flag would be FALSE, maybe with a
notification that the /Runinfo is different from the compiled-in structure,
and one hast to recompile the application.
- revisit the open record issue from dying frontends. I remember vaguely that
I tried to kill a frontend (kill -9), wait until the watchdog cleans up its
entries, and it worked fine. So it's more the problem to reproduce the issue
described in the previous elog entry. |
20 Nov 2003, Stefan Ritt, , Phantom
|
I tried to reproduce the problem, but without success. So in case this happens
again, one should debug the code im cm_watchdog() next to the line
/* decrement notify_count for open records and clear exclusive mode */
...
So if a killed client is removed from the ODB via the watchdog (or a "cleanup"
is done in ODBEdit), the notify_count should be decreased and thus the "open
records" should be closed. |
17 Nov 2003, Pierre-André Amaudruz, , Lazylogger application
|
- Remove temporary "/Programs/Lazy" creation.
- Fix Rate calculation for Web display.
- Change FTP channel description (see help). |
31 Oct 2003, Konstantin Olchanski, , more odb "run number" error checking
|
I added error checking to the places where we read "/runinfo/run number". In
general, I do this:
status = db_get_value("/runinfo/run number",&run_number);
assert(status==SUCCESS);
assert(run_number >= 0); (and run_number>0, where appropriate)
Here is the rationale: if we cannot read the run number, something must be
very terribly wrong. I cannot think of any recovery action other than
abort() and make a core dump for our debugging enjoyment.
I considered and rejected adding a "retry" loop: if we allow db_get_value()
to intermittently fail, then it's every use has to be wrapped in a retry
loop, which then should be inside db_get_value(), making it pointless to
have external "retry" loops.
I am now pondering on proposing a "db_get_value_cannot_possibly_fail()"
function (it would abort(), exit() with an error or commit harakiri if it
can't get the value). They way most db_xxx() functions are used in midas,
maybe they should be made "void" and "unfailible", with "STATUS
db_xxx_yes_I_can_fail_and_return_an_error_code()" evil twins. I guess this
is why "they" invented C/C++ exceptions. Anyway, something to think about.
Affected files:
src/lazylogger.c
src/odbedit.c
src/mlogger.c
src/mfe.c
src/odb.c
src/mana.c
src/midas.c
src/mhttpd.c
K.O. |
01 Nov 2003, Stefan Ritt, , more odb
|
> I added error checking to the places where we read "/runinfo/run number". In
> general, I do this:
> Affected files:
> src/lazylogger.c
> src/odbedit.c
> src/mlogger.c
> src/mfe.c
> src/odb.c
> src/mana.c
> src/midas.c
> src/mhttpd.c
Now YOU broke the system by editing all these files with something I consider
temporary debugging code. A run number of zero is *VALILD*. If I want to make
sure a new experiment starts with run number #1, I put a run number of 0 into
the ODB. So on the first start the number is incremented by one which results
in run number from one. So please remove those checks which prevents me of
doing that. Again, your "run number zero" problem is soemhow specific to your
environment, and I would not put all these tests into the distribution,
because this can have side effects, like that one I described above.
- Stefan |
01 Nov 2003, Konstantin Olchanski, , more odb
|
> > I added error checking to the places where we read "/runinfo/run number".
> Now YOU broke the system by editing all these files with something I consider
> temporary debugging code. A run number of zero is *VALILD*.
I think I broke nothing. I do know that run number 0 is a valid odb value. Here
is an audit of all places where I abort on invalid run numbers:
mana.c: line 3676: assert(current_run_number > 0);
we take the run number from an event and write it into ODB. Events cannot have
run number negative or zero.
mana.c:analyze_run(): line 4632: assert(run_number > 0);
we are asked to analyze run "run_number". zero or negative is not valid.
midas.c:assert(run_number > old_run_number);
midas.c:assert(run_number > 1);
this code is not in CVS.
odbedit.c: line 2563: assert(old_run_number >= 0);
run number zero is valid
odbedit.c: line 2641: assert(new_run_number > 0);
starting a new run number zero is not valid
mfe.c: line 1786: if (run_number<=0) cm_msg(MERROR, "main", "aborting on attempt
to use invalid run number %d", run_number);
auto restart from run 0 to 1 is not valid
midas.c: line 3917: if (run_number<=0) cm_msg(MERROR, "cm_transition", "aborting
on attempt to use invalid run number %d",run_number);
transition to run zero or negative is not valid
midas.c: line 16101: if (run_number<0) cm_msg(MERROR, "el_submit", "aborting on
attempt to use invalid run number %d", run_number);
negative run numbers are not valid
mlogger.c: line 3301: if (run_number<=0) cm_msg(MERROR, "main", "aborting on
attempt to use invalid run number %d", run_number);
auto restart from run 0 to run 1 is not valid
K.O. |
14 Nov 2003, Stefan Ritt, , more odb
|
Ok, I apologize. It's all ok. Thanks for clearifying. Concerning the assert's, it
would be nice to be able to disable them in release code. Under Windows, the
assert() is actually a macro which expands to zero if NDEBUG is defined. I
believe it's the same under linux, but I don't know about VxWorks. So we have
three options:
1) Keep asserts always. This might possible slow down a DAQ system, but I'm not
sure how much. Might be negligible.
2) Disable asserts by default (standard make). Only the "experts" can enable it
in the make file (by removing NDEBUG), since only they know what to do with the
assertation messages.
3) Let the user decide on the standard installation. Maybe have two libraries,
one debug, one no-debug. The no-debug can even have the compiler optimization
disabled, which makes debugging easier.
So what is your opinion (comments from others are welcome as well) of which way
to go? |
31 Oct 2003, Konstantin Olchanski, , Do not frob "/runinfo" in mhttpd.c
|
I found where we tickle the race condition in db_create_record().
1) in mhttpd.c, every time we show the status page, we call
db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str));
2) internally db_create_record() deletes /RunInfo
3) other programs read "/runinfo/run number" while it is deleted do not
check for the db_get_value() error code and happily get a zero run number.
Stephan fixed the race condition, and now I commited an mhttpd.c change that
only calls db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str)); if
/runinfo does not exist. This seems to be redundant with a similar call in
cm_connect_experiment1(), called each time a new client starts up.
Files changed:
src/mhttpd.c
K.O. |
01 Nov 2003, Stefan Ritt, , Do not frob
|
> I found where we tickle the race condition in db_create_record().
>
> 1) in mhttpd.c, every time we show the status page, we call
> db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str));
> 2) internally db_create_record() deletes /RunInfo
> 3) other programs read "/runinfo/run number" while it is deleted do not
> check for the db_get_value() error code and happily get a zero run number.
>
> Stephan fixed the race condition, and now I commited an mhttpd.c change that
> only calls db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str)); if
> /runinfo does not exist. This seems to be redundant with a similar call in
> cm_connect_experiment1(), called each time a new client starts up.
The reason for the db_create_record() is the following: Assume that we change
the /runinfo structure, by adding an additional variable in the future. If we
run a "new" mhttpd on an "old" experiment, the "runinfo" C structure does not
match the ODB contents. The db_create_record() ensures that the ODB structure
exactly matches the C structure. I agree with you that this can cause
potential problems. But most of them should be fixed by the additional lock()
I added recently. So other programs cannot read the run number while it is
deleted.
One could think of checking the record size, and re-creating the runinfo if
the ODB record size does not match the C record size. But this does not
prevent the potential error that some variable are reversed in order. They
are then mapped wrongly to the C runinfo structure.
I see that you work very hard now on all possible checks for the run number.
But I would not commit that and make it part of the distribution, since all
experiments at PSI for example do not have this run number problem. Run it
locally, determine the cause of your problem (the discovery of the race
condition was already very good, I'm glad that your found it, should make the
system much more stable), and we'll fix it. Puttin ASSERT's is a good idea, I
should have done it from the very beginning. But if you start now, please put
it in all other 100000 places (;-)
I would not add a db_get_value_cannot_possibly_fail() into the standard
distribution, because it probably cannot correct the initial problem and then
just will go into an infinite loop. We should tackle problems always at their
source.
If you cannot resolve your zero run number problem, do the following: There
is a cm_msg(MDEBUG, ...) which only puts a message into the shared memory,
but not in midas.log. This can be used for real time debugging. Add those
message temporarily in db_get_value() etc. to see what is going on. As soon
as the run number goes to zero, stop all processes immediately (for example
by locking the database with db_lock_database), and the look backwards in the
sysmsg buffer to see what happened *before* the run number went to zero.
- Stefan |
01 Nov 2003, Konstantin Olchanski, , Do not frob
|
> > I found where we tickle the race condition in db_create_record().
> The reason for the db_create_record() is the following: Assume that we change
> the /runinfo structure...
I think there is a deep fundamental problem with changing data structures "on the
fly". Calling db_create_record("/runinfo") at every show_status_page() does not
fix it.
If I change the runinfo structure, rebuild, relink and restart "mhttpd", the
db_create_record("/runinfo") from cm_connect_experiment() will update the runinfo
structure in ODB. In this case, the call from show_status_page() is redundant. As
a side effect, when we do this, we break every running ODB client- they still
have the old runinfo layout. Not good...
If I change the runinfo structure, rebuild, relink and restart all applications,
*except* for mhttpd, "/runinfo" in ODB will be updated when the first updated
client connects to ODB via the db_create_record("/runinfo") from
cm_connect_experiment(). Then, the old mhttpd will restore the old layout via the
db_create_record("/runinfo") in show_status_page(), breaking everything. Not good...
If I change the runinfo structure, rebuild, relink and restart everything,
"/runinfo" in ODB will be updated when the first client connects to ODB via the
db_create_record("/runinfo") from cm_connect_experiment(). In this case, the call
from show_status_page() is redundant. This is the only corruption-free scenario.
This lack of integrity enforcement vs version skew in binary data structures is,
I think, an ODB design error. Perhaps, ODB applications should be prohibited from
direct access to ODB "C" data structures: we cannot ensure that the data layout
in the application and in ODB are the same.
> One could think of checking the record size, and re-creating the runinfo if
> the ODB record size does not match the C record size. But this does not
> prevent the potential error that some variable are reversed in order. They
> are then mapped wrongly to the C runinfo structure.
Exacto.
> I see that you work very hard now on all possible checks for the run number.
> But I would not commit that and make it part of the distribution...
This is a philosophical issue.
My checks are in line with the "design by contract" school of programming. In a
nutshell, this ideology requires that before I do anything, I should enforce the
validity of my inputs and after I am done, I should enforce the validity of my
outputs. In practice, this translates into liberal use of assert()'s *in
production code*.
To ensure that old bugs stay fixed, and that new bugs are promptly discovered, it
is essential that the "contract checks" stay in the production code forever.
But let better writers argue programming philosophy in the literature.
Personally, when hunting down bugs in unstable code, I find this technique to be
vastly superior to the more common appoach of "This program has no bugs. Error
checking and assert()s are wasteful. Let's close our eyes and hope no bad things
happen to us (again)".
> But if you start now, please put [asserts] in all other 100000 places (;-)
I know that no good deed goes unpunished, but pewleeze!!!
> If you cannot resolve your zero run number problem, do the following: ...
> [lock ODB, freeze the experiment, look at log files]
This technique is obsolete. Today, we instrument the code with sanity checks
and validity tests. Then all the bugs find themselves with minimal manual
intervention.
K.O. |
31 Oct 2003, Konstantin Olchanski, , mana.c without ROOT and HBOOK
|
Stephan, why did you prohibit building mana.c without ROOT and HBOOK
support? I think such a configuration is valid and should be allowed.
Also, this prohibition broke the Midas Makefile, it now bombs building
mana.c. The Makefile is setup for building hmana.c with HBOOK support,
rmana.c with ROOT support (if ROOTSYS is set) and mana.c without HBOOK and
ROOT support (currently bombs on #error in mana.c).
K.O. |
01 Nov 2003, Stefan Ritt, , mana.c without ROOT and HBOOK
|
> Stephan, why did you prohibit building mana.c without ROOT and HBOOK
> support? I think such a configuration is valid and should be allowed.
Oops, sorry, my fault. I forgto that people use mana.c without ROOT and
HBOOK. The reason I made the change was that people forgot the -DHVAE_HBOOK
in their makefile. In that case, no HBOOK init is done in mana.c and the
first histogram booking in the user code crashes HBOOK.
So please take the #error statement out of mana.c (I'm away in two hours for
one week), but think about preventing the above mentionend problem. I don't
know any way for the makefile or mana.c to figure out if there is any HF1
call in the user code. Actually HF1 should return a "proper" error message
than just crashing.
One possibility is that we put an additional layer on top of the histogram
boooking/filling. These macros are converted to their HBOOK or ROOT
equivalents depending on the HAVE_HBOOK/HAVE_ROOT. If none of both is
present, the histogram booking macro can produce a runtime error. This has
the additional advantage that users can switch from HBOOK to ROOT without
change of their user code. |
01 Nov 2003, Konstantin Olchanski, , mana.c without ROOT and HBOOK
|
> > Stephan, why did you prohibit building mana.c without ROOT and HBOOK
> > support? I think such a configuration is valid and should be allowed.
>
> Oops, sorry, my fault. I forgto that people use mana.c without ROOT and
> HBOOK. The reason I made the change was that people forgot the -DHVAE_HBOOK
> in their makefile. In that case, no HBOOK init is done in mana.c and the
> first histogram booking in the user code crashes HBOOK.
Ahem. There is only so much rope we can give out to prevent people from shooting
themselves in the foot...
> So please take the #error statement out of mana.c
Done.
> One possibility is that we put an additional layer on top of the histogram
> boooking/filling. These macros are converted to their HBOOK or ROOT
> equivalents depending on the HAVE_HBOOK/HAVE_ROOT. If none of both is
> present, the histogram booking macro can produce a runtime error. This has
> the additional advantage that users can switch from HBOOK to ROOT without
> change of their user code.
I can't think of anything other than wrapping every HBOOK call with "if
(!hbook_is_initialized) initialize_hbook();". But then, where is PAWC
coming from anyway?!?
We could also print a warning message "This mana.c has no HBOOK support. If you
see HBOOK crashes, please relink with hmana,c". Ugly, but informative, plus it
points anybody who knows how to read towards a solution.
K.O. |
31 Oct 2003, Konstantin Olchanski, , Disable "tab"s in xemacs
|
The default C indentation style in xemacs uses "tab" characters, violating
the MIDAS coding convention. To disable this misfeature in xemacs (emacs
too?), put this incantation in your .xemacs/custom.el file:
(custom-set-variables
'(indent-tabs-mode nil))
K.O. |
30 Oct 2003, Stefan Ritt, , Fixed several potential problems for ODB corruption
|
I just realized that db_set_value, db_set_data, db_set_num_values and
db_merge_data do not check for num_values == 0. With such a parameter the
ODB can become corrupted, since zero length ODB entries are not allowed. I
fixed the according places in odb.c and committed the changes. Everyone
with ODB corruption problems should update that code. |
30 Oct 2003, Stefan Ritt, , 'umask' added to lazylogger for FTP connections
|
I had to add a 'umask' opiton to the loggers (lazy and mlogger) for the new
PSI archive. One can now put a filename into the settings like:
archive,21,user,pw,dir,run%05d.mid,026
where the optional last parameter is used for a "umask 026" command just
sent to the FTP server after the connection has been established. This
changes the mode bits of the newly transferred file. We needed that so that
the files are group readable, since several people from one group want to
read the data.
I committed mlogger.c and ybos.c which contains the ftp code (should
actually go into lazylogger.c instead of ybos.c). |
16 Oct 2003, David Morris, , Updated thread functions
|
ss_thread_create now returns the thread ID on success, and zero on failure.
Previously returned SS_SUCCESS or SS_NO_THREAD. User must now test the
return value to determine result.
ss_thread_kill added to kill the passed thread ID. Returns SS_SUCCESS or
SS_NO_THREAD.
Any thread creation must be verified now, and old code must be examined to
ensure the return value is checked. |
28 Oct 2003, Stefan Ritt, , Updated thread functions
|
> ss_thread_create now returns the thread ID on success, and zero on failure.
> Previously returned SS_SUCCESS or SS_NO_THREAD. User must now test the
> return value to determine result.
>
> ss_thread_kill added to kill the passed thread ID. Returns SS_SUCCESS or
> SS_NO_THREAD.
>
> Any thread creation must be verified now, and old code must be examined to
> ensure the return value is checked.
Thank you for that post. Internally, threads are not use in midas, so there
should be no problem. Only experiments using threads explicitly should take
care. |
15 Oct 2003, Konstantin Olchanski, , test
|
test
test
test |
15 Oct 2003, Konstantin Olchanski, , test
|
> test
> test
> test
another test
K.O. |
15 Oct 2003, Stefan Ritt, , test
|
> > test
> > test
> > test
>
> another test
>
> K.O.
I got the two email notifications, if you have tried that... |
12 Oct 2003, Konstantin Olchanski, , mhttpd: add Elog text to outgoing email.
|
This commit adds the elog message text to the outgoing email message. This
functionality has been requested a logn time ago, but I guess nobody got
around to implement it, until now. I also added assert() traps for the most
common array overruns in the Elog code.
Here is the cvs diff:
Index: src/mhttpd.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/mhttpd.c,v
retrieving revision 1.252
diff -r1.252 mhttpd.c
768a769
> #include <assert.h>
3740c3741
< char mail_to[256], mail_from[256], mail_text[256], mail_list[256],
---
> char mail_to[256], mail_from[256], mail_text[10000], mail_list[256],
3921a3923,3925
> // zero out the array. needed because later strncat() does not
always add the trailing '\0'
> memset(mail_text,0,sizeof(mail_text));
>
3931a3936,3945
>
> assert(strlen(mail_text) + 100 < sizeof(mail_text)); // bomb out
on array overrun.
>
> strcat(mail_text+strlen(mail_text),"\n");
> // this strncat() depends on the mail_text array being zeroed out:
> // strncat() does not always add the trailing '\0'
>
strncat(mail_text+strlen(mail_text),getparam("text"),sizeof(mail_text)-strlen(mail_text)-50);
> strcat(mail_text+strlen(mail_text),"\n");
>
> assert(strlen(mail_text) < sizeof(mail_text)); // bomb out on
array overrun.
Index: src/midas.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/midas.c,v
retrieving revision 1.192
diff -r1.192 midas.c
604a605
> #include <assert.h>
16267a16269,16270
>
> assert(strlen(message) < sizeof(message)); // bomb out on array overrun.
K.O. |
13 Oct 2003, Stefan Ritt, , mhttpd: add Elog text to outgoing email.
|
> around to implement it, until now. I also added assert() traps for the most
> common array overruns in the Elog code.
In addition to the assert() one should use strlcat() and strlcpy() all over
the code to avoid buffer overruns. The ELOG standalone code does that already
properly.
- Stefan |
13 Oct 2003, Konstantin Olchanski, , mhttpd: add Elog text to outgoing email.
|
> > around to implement it, until now. I also added assert() traps for the most
> > common array overruns in the Elog code.
>
> In addition to the assert() one should use strlcat() and strlcpy() all over
> the code to avoid buffer overruns. The ELOG standalone code does that already
> properly.
>
> - Stefan
Yes, the original authors should have used strlcat(). Now that I uncovered this source of mhttpd
memory corruption, maybe some volunteer will fix it up properly.
K.O. |
13 Oct 2003, Stefan Ritt, , mhttpd: add Elog text to outgoing email.
|
> > > around to implement it, until now. I also added assert() traps for the
most
> > > common array overruns in the Elog code.
> >
> > In addition to the assert() one should use strlcat() and strlcpy() all
over
> > the code to avoid buffer overruns. The ELOG standalone code does that
already
> > properly.
> >
> > - Stefan
>
> Yes, the original authors should have used strlcat(). Now that I uncovered
this source of mhttpd
> memory corruption, maybe some volunteer will fix it up properly.
>
> K.O.
I am the original author and will fix all that once I merged mhttpd and elog.
Due to my current task list, this will happen probably in November.
- Stefan |
12 Oct 2003, Konstantin Olchanski, , Array overruns in mhttpd.c::submit_elog()
|
While adding new functionality to submit_elog() (add the message text to the
outgoing email), I noticed that the email text is being stored into an array
of size 256, mail_text[256], without any checks for array overrun. This
cannot be good. How should this be corrected?
K.O. |
12 Oct 2003, Konstantin Olchanski, , Array overruns in mhttpd.c::submit_elog()
|
> While adding new functionality to submit_elog() (add the message text to the
> outgoing email), I noticed that the email text is being stored into an array
> of size 256, mail_text[256], without any checks for array overrun. This
> cannot be good. How should this be corrected?
> K.O.
Similar problem exists in midas.c::el_submit(). The array "message[10000]" is
easy to overrun by submitting a long elog message.
K.O. |
13 Oct 2003, Stefan Ritt, , Array overruns in mhttpd.c::submit_elog()
|
> > While adding new functionality to submit_elog() (add the message text to
the
> > outgoing email), I noticed that the email text is being stored into an
array
> > of size 256, mail_text[256], without any checks for array overrun. This
> > cannot be good. How should this be corrected?
> > K.O.
>
> Similar problem exists in midas.c::el_submit(). The array "message[10000]"
is
> easy to overrun by submitting a long elog message.
>
> K.O.
The whole elog functionality in mhttpd will be replaced (sometime) by the
standalone ELOG package, linked against mhttpd. The ELOG functionality is
much richer and does not conatin all the mentioned problems which have been
fixed there some time ago. For the time being it might however be worth to
fix the mentioned problems, but without spending too much time on it. |
13 Oct 2003, Konstantin Olchanski, , Array overruns in mhttpd.c::submit_elog()
|
> > > While adding new functionality to submit_elog() ....
>
> The whole elog functionality in mhttpd will be replaced (sometime) ...
I humbly submit that this has been the standard reply for the last 2 years since I was aware of
the "last N days does not always work" problem (just saw it again yesterday).
K.O. |
12 Oct 2003, Konstantin Olchanski, , Refuse to set run number zero
|
I am debugging the frequent problem where the run number is mysteriously
reset to zero. As a first step, I am commiting changes to mhttpd.c and midas.c:
- abort on obviously corrupted "run number < 0"
- abort on cm_transition() to run 0 (the only place where the run number is
explicitely written to ODB)
- in the mhttpd "Start run" form, reject user setting the run number to <= 0.
Here is the CVS diff:
===================================================================
RCS file: /usr/local/cvsroot/midas/src/mhttpd.c,v
retrieving revision 1.253
diff -r1.253 mhttpd.c
2451a2452,2457
> if (run_number < 0)
> {
> cm_msg(MERROR, "show_elog_new", "aborting on attempt to use invalid
run number %d",run_number);
> abort();
> }
>
2506a2513,2519
>
> if (run_number < 0)
> {
> cm_msg(MERROR, "show_elog_new", "aborting on attempt to use invalid
run number %d",run_number);
> abort();
> }
>
3582a3596,3602
>
> if (run_number < 0)
> {
> cm_msg(MERROR, "show_form_query", "aborting on attempt to use invalid
run number %d",run_number);
> abort();
> }
>
5730a5751,5756
> if (rn < 0) // value "zero" is okey
> {
> cm_msg(MERROR, "show_start_page", "aborting on attempt to use invalid
run number %d",rn);
> abort();
> }
>
9684a9711,9719
> if (i <= 0)
> {
> cm_msg(MERROR, "interprete", "Start run: invalid run number %d",i);
> memset(str,0,sizeof(str));
> snprintf(str,sizeof(str)-1,"Invalid run number %d",i);
> show_error(str);
> return;
> }
>
Index: src/midas.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/midas.c,v
retrieving revision 1.193
diff -r1.193 midas.c
3786c3786
< status = cm_transition(_requested_transition | TR_DEFERRED, 0,
str, 256, SYNC, FALSE);
---
> status = cm_transition(_requested_transition | TR_DEFERRED, 0,
str, sizeof(str), SYNC, FALSE);
3906a3907,3912
> if (run_number <= 0)
> {
> cm_msg(MERROR, "cm_transition", "aborting on attempt to use invalid
run number %d",run_number);
> abort();
> }
>
16069a16076,16081
> }
>
> if (run_number < 0)
> {
> cm_msg(MERROR, "el_submit", "aborting on attempt to use invalid run
number %d", run_number);
> abort();
K.O. |
12 Oct 2003, Konstantin Olchanski, , Refuse to set run number zero
|
> I am debugging the frequent problem where the run number is mysteriously
> reset to zero. As a first step, I am commiting changes to mhttpd.c and midas.c:
> - abort on obviously corrupted "run number < 0"
> - abort on cm_transition() to run 0 (the only place where the run number is
> explicitely written to ODB)
> - in the mhttpd "Start run" form, reject user setting the run number to <= 0.
- abort on cm_transition() from run 0 to 1 during auto restart in mlogger.
Cvs diff:
RCS file: /usr/local/cvsroot/midas/src/mlogger.c,v
retrieving revision 1.65
diff -r1.65 mlogger.c
3277a3278,3283
> if (run_number <= 0)
> {
> cm_msg(MERROR, "main", "aborting on attempt to use invalid run
number %d", run_number);
> abort();
> }
>
K.O. |
11 Aug 2003, Konstantin Olchanski, , mhttpd crash on corrupted ODB /RunInfo
|
Invalid values of ODB /RunInfo/State cause mhttpd crash in
show_status_page() because of an out of bounds access to the array of state
names. Suggest this fix: remove array of state names, use existing ladder of
if/else statements to explicitely set state name. Verified the fix works for
TWIST. Will commit this into MIDAS CVS unless get feedback.
src/mhttpd.c:show_status_page() {
...
rsprintf("<tr align=center><td>Run #%d", runinfo.run_number);
if (runinfo.state == STATE_STOPPED)
rsprintf("<td colspan=1 bgcolor=#FF0000>Stopped");
else if (runinfo.state == STATE_PAUSED)
rsprintf("<td colspan=1 bgcolor=#FFFF00>Paused");
else if (runinfo.state == STATE_RUNNING)
rsprintf("<td colspan=1 bgcolor=#00FF00>Running");
else
rsprintf("<td colspan=1 bgcolor=#FFFFFF>Unknown");
if (runinfo.requested_transition)
...
K.O. |
10 Oct 2003, Konstantin Olchanski, , mhttpd crash on corrupted ODB /RunInfo
|
There was no feedback. This code has been commited. K.O.
> Invalid values of ODB /RunInfo/State cause mhttpd crash in
> show_status_page() because of an out of bounds access to the array of state
> names. Suggest this fix: remove array of state names, use existing ladder of
> if/else statements to explicitely set state name. Verified the fix works for
> TWIST. Will commit this into MIDAS CVS unless get feedback.
>
> src/mhttpd.c:show_status_page() {
> ...
> rsprintf("<tr align=center><td>Run #%d", runinfo.run_number);
>
> if (runinfo.state == STATE_STOPPED)
> rsprintf("<td colspan=1 bgcolor=#FF0000>Stopped");
> else if (runinfo.state == STATE_PAUSED)
> rsprintf("<td colspan=1 bgcolor=#FFFF00>Paused");
> else if (runinfo.state == STATE_RUNNING)
> rsprintf("<td colspan=1 bgcolor=#00FF00>Running");
> else
> rsprintf("<td colspan=1 bgcolor=#FFFFFF>Unknown");
>
> if (runinfo.requested_transition)
> ...
>
> K.O. |
|