Back Midas Rome Roody Rootana
  Midas DAQ System, Page 128 of 146  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
    Reply  09 Aug 2023, Konstantin Olchanski, Bug Report, Error accessing history files 
I confirm I see same on the agmini system. Two problems: (a) error message is wrong, it's a 
short read, not a read error (clue: read() syscall does not return "no such file"). (b) 
mlogger is supposed to write history in record-size blocks, read in the same record size 
blocks. UNIX file semantics require that both reader and writer see read() and write() as 
atomic, even on NFS, so mhttpd should never see partially written history records. I can 
debug this on the agmini system. Probably should.

Problem (a) fixed in commit bb423c8680cc67220312534403840442868f2b3b, if you update, you 
should see error messages about "short read" and the read sizes it reports are very 
interesting, please put them in the elog here.

K.O.


> We sporadically (like once per few hours) have an error message when we access the 
> history plots through mhttpd:
> 
> 07:21:35.109 2023/08/03 [mhttpd,ERROR] 
> [history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file 
> or directory)
> 
> When I log in to the machine, I properly see the file and also can access it
> 
> [meg@megon02 history]$ ls -l mhf_1690890685_20230801_dc_hv.dat
> -rw-rw-r--. 1 meg meg 34176312 Aug  3 07:23 mhf_1690890685_20230801_dc_hv.dat
> 
> and I also can dump that file. 
> 
> When I try again with mhttpd, I properly see that file. 
> 
> Now in principle this is not a problem, but the error message is annoying, since this 
> is the only error we get in 24 hours. I attached a 24h log to see what I mean. If this 
> is an OS issue, I wonder if we should add code to retry the file access in case we get 
> that error.
> 
> Anybody seen a similar thing?
> 
> Best,
> Stefan
    Reply  16 Aug 2023, Stefan Ritt, Bug Report, Error accessing history files 
Tonight we got another error of that type after the update:

04:17 - [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1692128214_20230815_gassystem.dat', read() errno 2 (No such file or directory)

This morning I looked at the file, and it was there:

[meg@megon02 history]$ ls -alg mhf_1692128214_20230815_gassystem.dat
-rw-rw-r--. 1 meg 4663228 Aug 17 08:50 mhf_1692128214_20230815_gassystem.dat
[meg@megon02 history]$


Stefan
    Reply  17 Aug 2023, Konstantin Olchanski, Bug Report, Error accessing history files 
Confirmed. The error message is wrong. It is printed after a short read(), but short read() does not 
set errno, and errno reported by the error message is from some previous syscall. Corrected error 
message is already committed. K.O.


> Tonight we got another error of that type after the update:
> 
> 04:17 - [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1692128214_20230815_gassystem.dat', read() errno 2 (No such file or directory)
> 
> This morning I looked at the file, and it was there:
> 
> [meg@megon02 history]$ ls -alg mhf_1692128214_20230815_gassystem.dat
> -rw-rw-r--. 1 meg 4663228 Aug 17 08:50 mhf_1692128214_20230815_gassystem.dat
> [meg@megon02 history]$
> 
> 
> Stefan
    Reply  19 Aug 2023, Stefan Ritt, Bug Report, Error accessing history files 
Still get the same error with the latest version:

3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)

Stefan
    Reply  06 Oct 2023, Konstantin Olchanski, Bug Report, Error accessing history files 
> Still get the same error with the latest version:
> 3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)

I figured it out. I claim defense of temporary insanity and old age senility.

1) I added the "short read" check in one place, missed the second place
2) writes of history were meant to be atomic, and they are atomic in my head, but not in the midas 
code:

history_schema.cxx:HsFileSchema::write_event()
...
   status = write(s->writer_fd, &t, 4);
   if (status != 4) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(timestamp) errno 
%d (%s)", s->file_name.c_str(), errno, strerror(errno));
      return HS_FILE_ERROR;
   }

   status = write(s->writer_fd, data, expected_size);
   if (status != expected_size) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(%d) errno %d 
(%s)", s->file_name.c_str(), data_size, errno, strerror(errno));
      return HS_FILE_ERROR;
   }
...

that's not atomic, that's two separate writes. history reader hits the history file between the 
two writes and gets a short read of 4 bytes timestamp instead of full record size. that's the 
error message reported by mhttpd.

two fixes forthcoming:
a) check for short read in the 2nd place that I missed
b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

Overall, I am pretty happy that this is the only bug in the FILE history code found in N years, 
and it does not even cause data corruption...

K.O.
    Reply  06 Oct 2023, Konstantin Olchanski, Bug Report, Error accessing history files 
> two fixes forthcoming:
> a) check for short read in the 2nd place that I missed
> b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

commit 713ec4a583365d57ffcd700ceeb09dcc14518295

K.O.
Entry  05 Apr 2017, Andreas Suter, Bug Report, Equipment Expand doesn't work anymore 
I'd liked very much the possibility to hide away Equipment on the main page. It
is also nice to have the '+' to get it quickly back when needed. However, this
seems not to work anymore (git c9d9d604803). Is this a feature or something went
wrong?
    Reply  10 Apr 2017, Stefan Ritt, Bug Report, Equipment Expand doesn't work anymore 
> I'd liked very much the possibility to hide away Equipment on the main page. It
> is also nice to have the '+' to get it quickly back when needed. However, this
> seems not to work anymore (git c9d9d604803). Is this a feature or something went
> wrong?

The expansion of the equipment list is handled by a Cookie ("expeq" being 1 or 0). When Konstantin 
implemented the mongoose server instead of the internal mhttp server, he neglected to evaluate 
this cookie. I fixed this now (also renamed the cookie to "midas_expeq") in the current development 
branch. Please check if it's working.

Stefan
    Reply  10 Apr 2017, Andreas Suter, Bug Report, Equipment Expand doesn't work anymore 
> > I'd liked very much the possibility to hide away Equipment on the main page. It
> > is also nice to have the '+' to get it quickly back when needed. However, this
> > seems not to work anymore (git c9d9d604803). Is this a feature or something went
> > wrong?
> 
> The expansion of the equipment list is handled by a Cookie ("expeq" being 1 or 0). When Konstantin 
> implemented the mongoose server instead of the internal mhttp server, he neglected to evaluate 
> this cookie. I fixed this now (also renamed the cookie to "midas_expeq") in the current development 
> branch. Please check if it's working.
> 
> Stefan

Tested it on two machines and expansion is back and working! Thanks a lot!

Andreas
    Reply  15 Apr 2017, Konstantin Olchanski, Bug Report, Equipment Expand doesn't work anymore 
> > > I'd liked very much the possibility to hide away Equipment on the main page. It
> > > is also nice to have the '+' to get it quickly back when needed. However, this
> > > seems not to work anymore (git c9d9d604803). Is this a feature or something went
> > > wrong?
> > 
> > The expansion of the equipment list is handled by a Cookie ("expeq" being 1 or 0). When Konstantin 
> > implemented the mongoose server instead of the internal mhttp server, he neglected to evaluate 
> > this cookie. I fixed this now (also renamed the cookie to "midas_expeq") in the current development 
> > branch. Please check if it's working.
> > 
> > Stefan
> 
> Tested it on two machines and expansion is back and working! Thanks a lot!
> 

Confirmed fixed. Thanks. Not sure how this got lost.

K.O.
Entry  17 Nov 2020, Stefan Ritt, Info, Equipment "common" settings in ODB 
Today I addressed a topic which bugged me since long time. The ODB contains 
settings under /Equipment/<name>/Common which are a "mirror" of the equipment[] 
setting in a frontend (using the mfe.cxx framework). If the "Common" entry in 
the ODB is not present (fresh experiment), the equipment[] settings from the 
frontend are copied to the ODB. But if it exists, it takes precedence over the 
equipment[] entries, which is wrong in my opinion. Like if you change some 
settings in equipment[] (like the logging period of the history), then recompile 
and restart the frontend, the old values in the ODB are kept and your 
modification in the frontend code has no effect.

Starting on commit c3017c6c on Nov. 17th 2020 I reversed the precedence: Now, on 
each start of the frontend program, the values from equipment[] are written to 
the ODB. They are still "live". If one changes them when the frontend is 
running, that change takes effect immediately. But on the next restart of the 
frontend, the old values from equipment[] is put back there.

I fell too many times into this trap, and I hope the modification helps 
everybody. If there are however experiments which rely on the fact that the 
common settings in the ODB are NOT overwritten by the frontend, please let me 
know and I can put a flag "EQUIPMENT_FE_PRECEDENCE = FALSE" somewhere to restore 
the old behaviour.

Stefan
    Reply  20 Nov 2020, Pierre-Andre Amaudruz, Info, Equipment "common" settings in ODB 
Indeed this "mirror" of the ODB in settings option can cause frustration in 
particular when we think the ODB is empty but is not.
In the other hand, over time the settings are adjusted to a particular 
configuration or touched or not by the individual run preset parameters. Later, if 
a bug or code correction requires multiple restart of the fe, for every start of 
the application, you loose the latest configuration. This can be frustrating as 
well until you force a post-setting or report the specifics parameters in the fe 
code.
BTW I believe, we originally went for the ODB priority for that specific reason.
 
I would be in favour for having a general flag (FALSE) in /experiment which would 
define this global behaviour.  
PAA

> Today I addressed a topic which bugged me since long time. The ODB contains 
> settings under /Equipment/<name>/Common which are a "mirror" of the equipment[] 
> setting in a frontend (using the mfe.cxx framework). If the "Common" entry in 
> the ODB is not present (fresh experiment), the equipment[] settings from the 
> frontend are copied to the ODB. But if it exists, it takes precedence over the 
> equipment[] entries, which is wrong in my opinion. Like if you change some 
> settings in equipment[] (like the logging period of the history), then recompile 
> and restart the frontend, the old values in the ODB are kept and your 
> modification in the frontend code has no effect.
> 
> Starting on commit c3017c6c on Nov. 17th 2020 I reversed the precedence: Now, on 
> each start of the frontend program, the values from equipment[] are written to 
> the ODB. They are still "live". If one changes them when the frontend is 
> running, that change takes effect immediately. But on the next restart of the 
> frontend, the old values from equipment[] is put back there.
> 
> I fell too many times into this trap, and I hope the modification helps 
> everybody. If there are however experiments which rely on the fact that the 
> common settings in the ODB are NOT overwritten by the frontend, please let me 
> know and I can put a flag "EQUIPMENT_FE_PRECEDENCE = FALSE" somewhere to restore 
> the old behaviour.
> 
> Stefan
    Reply  27 Nov 2020, Konstantin Olchanski, Info, Equipment "common" settings in ODB 
> Today I addressed a topic which bugged me since long time.

Right. No easy subject. For me, too, this has been a problem in MIDAS for a long time.

> Now, on each start of the frontend program, the values from equipment[] are written to 
> the ODB. They are still "live". If one changes them when the frontend is 
> running, that change takes effect immediately. But on the next restart of the 
> frontend, the old values from equipment[] is put back there.

There is a downside from this behaviour.

If some values in equipment/common are "live" and the user is expected to change them,
the user will be unpleasantly surprised when their changes magically disappear (after reboot,
after frontend crash, after run restart if experiment requires restarting some frontends
before starting a new run).

This change will also break some experiments that rely in things like specifying
event buffer names through ODB. But experiments can adapt and specify buffer names
through command line switch instead of ODB.

This new way also it makes the "live" Common/Period unusable. Sure I can speed up or slow
down a frontend even during the run, but if my change does not "stick", what good is it?

Personally, I think there is no easy solution for all these troubles.

I would advocate the following approach:

- think of MIDAS as a "mature" system,
- treasure backward compatibility
- (if we must break backward compatibility to introduce a new "must have" improvement, so be it)
- document how things work. if it is clearly written down what different fields in "common" do, fewer people 
"get burned" by unexpected or illogical things. (and any non-trivial system has plenty of those).

Going back to ODB equipment/common, my experience with midas and odb tells me
that one should avoid mixing together ODB entries set by user and ODB entries set by code.

For example, separating them as equipment/settings and equipment/variables works well. Mixing
them as in equipment/common and sequencer/state causes trouble.

So perhaps we should split Equipment/common into two pieces, user settable fields like
"Period" and "event buffer name" would move to equipment/settings or whatever.

This will open the discussion of which items in equipment/common should be user settable,
and some people would want event buffer specified in the code to prevail, while other
people would want the name from odb to prevail, and both are valid but conflicting preferences.

Or we could bite the bullet and say, equipment/common is controlled by the frontend code,
the user should not change it. (and mark it read-only in ODB).

For all the pain this may cause, at least this will make it self-consistent.

Per this proposal, in addition to Stefan's change, the hotlink on equipment/common goes away,
"period" is no longer "live" and the whole subdirectory is made "read-only".

K.O.
    Reply  27 Nov 2020, Stefan Ritt, Info, Equipment "common" settings in ODB 
Ok, so what about the following proposal:

- I change back the mfe.cxx code to behave like before (ODB has precedence and does not get overwritten when the 
front-end restarts)

- I add a global flag

BOOL equipment_common_overwrite;

and pre-set it to FALSE;

- So if nothing is changed the flag stays false and ODB keeps precedence

- If a frontend wants to overwrite equipment/common on each start, the user sets

BOOL equipment_common_overwrite = TRUE;

near the equipment[] structure in the front-end code. 

- If the flag is true, the mfe.cxx init code copies the equipment[] structure to the ODB on each frontend start

I believe this way we can keep backward compatibility, and add the new way with minimal effort. The only downside 
is that all frontends on this plane have to add at least "BOOL equipment_common_overwrite = FALSE;" in their 
code.

I know global variables are evil, but this way the user can just add the line above to the equipment[] array, so 
one sees this when one edits the equipment[] array, giving motivation to change as needed. So the code would be



BOOL equipment_common_overwrite = TRUE;

EQUIPMENT equipment[] = {
 ....
}



An alternative way would be to add a function

  set_equipment_common_overwrite(TRUE);

into the frontend_init() code. That's somehow cleaner (still needs an internal global variable), but it has to go 
into frontend_init() so won't be at the same place as the EQUIPMENT list in the frontend.

Thoughts?

Best,
Stefan
    Reply  27 Nov 2020, Konstantin Olchanski, Info, Equipment "common" settings in ODB 
Yes, I think this will work.

For old mfe.c frontends, global variable set to "do it the new way" should be okey,
new experiments will have it the new way. Old experiments, will be forced to add a one-line definition
of this global variable (otherwise mfe.o will not link), at that time they get to chose "new way" or "old way".

For the new TMFE c++ frontend, this will work naturally when they create the Equipment Common object,
in the object constructor, you can see how it explicitly honors or overwrites the ODB common entries.

The TMFE frontend does not do a live "period", so there should be no issue with that.

Should I open a bitbucket issue "update TMFE frontend to new Equipment/Common scheme", to make sure
I do not forget about it?

K.O.
    Reply  30 Nov 2020, Stefan Ritt, Info, Equipment "common" settings in ODB 
Ok, I implemented it the following way:

- Added a boolean flag "equipment_common_overwrite", which must be contained in EACH frontend, preferably just 
before the EQUIPMENT structure, such as:

BOOL equipment_common_overwrite = TRUE;

EQUIPMENT equipment[] = {
...
};

- If that flag is TRUE, then the contents of the "equipment" structure is copied to the ODB on each start of the 
front-end

- If the flag is FALSE, then the ODB values are kept on the start of the front-end

The setting of the flag depends now on the philosophy of the experiment. Some experiments say that everything 
needed should be in the front-end code, so when it starts everything gets set correctly. They don't change the 
values in the ODB, but in the frontend code, which then goes into their repository. Other experiments just need 
some default values from the frontend code, and the fine-tune things by changing values in the ODB. These 
experiments should set this flag to FALSE.

*****

Please note that EVERY frontend now needs this flag, so all of you have to add it to all of your front-ends, 
otherwise the front-end will not compile! I could not figure out how to this could be done without this 
requirement, since you can define a global variable only once.

*****


Stefan
    Reply  30 Nov 2020, Stefan Ritt, Info, Equipment "common" settings in ODB 
One more change: 

After using the new code for some hours, we realized that the "enabled" flag should not come from the frontend code, 
but always be defined by the ODB. So if you quickly have to disable some equipment because the associated hardware is 
off, you want to change this flag only in the ODB and not have to recompile the frontend. So we exclude that flag from 
being set by the frontend. It is anyhow special, because one sees all disable equipment in the main midas status page, 
so one knows what's on and what's off.

Please comment here if you think that change causes problem. Anyhow it's working now for the enabled flag as before 
all these changes.

Stefan
    Reply  30 Nov 2020, Konstantin Olchanski, Info, Equipment "common" settings in ODB 
> One more change: 
> 
> After using the new code for some hours, we realized that the "enabled" flag should not come from the frontend code, 
> but always be defined by the ODB. So if you quickly have to disable some equipment because the associated hardware is 
> off, you want to change this flag only in the ODB and not have to recompile the frontend. So we exclude that flag from 
> being set by the frontend. It is anyhow special, because one sees all disable equipment in the main midas status page, 
> so one knows what's on and what's off.
> 
> Please comment here if you think that change causes problem. Anyhow it's working now for the enabled flag as before 
> all these changes.
> 

Good catch. I still think this is fundamentally impossible to "get right". But good, you
are now in the same boat with me. The documentation will read: "if flag is TRUE, these data fields
are read from ODB, if flag is FALSE, those other fields are read from ODB". I will have to check
how this will work out for the TMFE C++ frontend (I think both mfe.c and TMFE frontends should
work "the same").

I think we have at least one month to play with this, I do not think we can do the next release
of midas until January.

K.O.
Entry  23 Nov 2005, Stefan Ritt, Bug Fix, Endian swapping in mana.c 
It was reported that following code in mana.c :
  /* swap event header if in wrong format */
  if (pevent->serial_number > 0x1000000) {
     WORD_SWAP(&pevent->event_id);
     WORD_SWAP(&pevent->trigger_mask);
     DWORD_SWAP(&pevent->serial_number);
     DWORD_SWAP(&pevent->time_stamp);
     DWORD_SWAP(&pevent->data_size);
  }

does not work correctly for events having a true serial number above 16777216 (=0x10000000). After some considerations, I concluded that there is no good way to determine automatically the endian format of midas events, without adding another field in the header, which would break the compatibility with all recorded data up to date. I therefore changed the above code to
  /* swap event header if in wrong format */
#ifdef SWAP_EVENTS
  WORD_SWAP(&pevent->event_id);
  WORD_SWAP(&pevent->trigger_mask);
  DWORD_SWAP(&pevent->serial_number);
  DWORD_SWAP(&pevent->time_stamp);
  DWORD_SWAP(&pevent->data_size);
#endif

So if one wants to analyze events with the midas analyzer on a PC system for example where the events come from a VxWorks system with the opposite endian encoding, one has to set the flag -DSWAP_EVENTS when compiling the analyzer for that type of analysis.
Entry  26 Aug 2013, Konstantin Olchanski, Bug Fix, Enable cross-site requests in mhttpd 
Javascript "AJAX" functions (and their MIDAS wrappers - ODBGet/ODBSet) are subject to something called 
"same origin policy" intended to prevent something called "cross-site scripting attacks", i.e. see
http://en.wikipedia.org/wiki/Same-origin_policy

In practice it means that if you load the MIDAS custom web page from test.foo.com and try to access 
mhttpd at midas.foo.com, ODBSet/ODBGet will not work.

I always thought that this meant that the requests are blocked by the browser and are a form of 
protection of the web server - only scripts loaded from mhttpd can do AJAX (ODBGet/ODBSet) to mhttpd.

It turns out that I was wrong. This is what actually happens: the "cross-site" requests are still sent to the 
server (mhttpd), the response it received, parsed and discarded if "same origin" conditions are not met.

This means that the "same origin" policy does not protect mhttpd at all - any script from any page 
anywhere can issue AJAX requests into any mhttpd, these requests will be successfully sent, received
and processed by mhttpd, including requests for writing into ODB ("jset" command using the HTTP GET 
method).

So for the case of MIDAS, "same origin" does not prevent malicious (or buggy) scripts from writing into the 
wrong mhttpd of the wrong experiment.

All it does is prevent desired and intentional access to mhttpd (ODBGet) from scripts that happen to have 
been loaded outside of mhttpd (i.e. from a developer own test page).

Then it turns out that there is an "official" way to disable this unwanted protection policy, called CORS, see
http://www.w3.org/TR/cors/

I have now implemented this in mhttpd and added an mhttpd.js function ODBSetURL() to explicitly set the 
URL of mhttpd that we want to talk to.

This work is on the feature/ajax branch, to be merged soon. For the impatient, here is what you need to 
do in mhttpd:

diff --git a/src/mhttpd.cxx b/src/mhttpd.cxx
index 1d9d1cc..0460cec 100755
--- a/src/mhttpd.cxx
+++ b/src/mhttpd.cxx
@@ -1070,6 +1070,7 @@ void show_text_header()
 {
    rsprintf("HTTP/1.0 200 Document follows\r\n");
    rsprintf("Server: MIDAS HTTP %d\r\n", mhttpd_revision());
+   rsprintf("Access-Control-Allow-Origin: *\r\n");
    rsprintf("Pragma: no-cache\r\n");
    rsprintf("Expires: Fri, 01 Jan 1983 00:00:00 GMT\r\n");
    rsprintf("Content-Type: text/plain; charset=iso-8859-1\r\n\r\n");

K.O.
ELOG V3.1.4-2e1708b5