>
> I see you put some switches into the environment ("MIDAS_INVALID_STRING_IS_OK"). Do you think this is a good idea? Most variables are
> sitting in the ODB (/experiment/xxx), except those which cannot be in the ODB because we need it before we open the ODB, like MIDAS_DIR.
> Having them in the ODB has the advantage that everything is in one place, and we see a "list" of things we can change. From an empty
> environment it is not clear that such a thing like "MIDAS_INVALID_STRING_IS_OK" does exist, while if it would be an ODB key it would be
> obvious. Can I convince you to move this flag into the ODB?
>
Some additional explanation.
Time passed, the world turned, and the current web-compatible standard for text strings is UTF-8 encoded Unicode, see
https://en.wikipedia.org/wiki/UTF-8
(ObCanadianContent, UTF-8 was invented the Canadian Rob Pike https://en.wikipedia.org/wiki/Rob_Pike)
(and by some other guy https://en.wikipedia.org/wiki/Ken_Thompson).
It turns out that not every combination of 8-bit characters (char*) is valid UTF-8 Unicode.
In the MIDAS world we run into this when MIDAS ODB strings are exported to Javascript running inside web
browsers ("custom pages", etc). ODB strings (TID_STRING) and ODB key names that are not valid UTF-8
make such web pages malfunction and do not work right.
One solution to this is to declare that ODB strings (TID_STRING) and ODB key names *must* be valid UTF-8 Unicode.
The present commits implemented this solution. Invalid UTF-8 is rejected by db_create() & co and by the ODB integrity validator.
This means some existing running experiment may suddenly break because somehow they have "old-style" ODB entries
or they mistakenly use TID_STRING to store arbitrary binary data (use array of TID_CHAR instead).
To permit such experiments to use current releases of MIDAS, we include a "defeat" device - to disable UTF-8 checks
until they figure out where non-UTF-8 strings come from and correct the problem.
Why is this defeat device non an ODB entry? Because it is not a normal mode of operation - there is no use-case where
an experiment will continue to use non-UTF-8 compatible ODB indefinitely, in the long term. For example, as the MIDAS user
interface moves to more and more to HTML+Javascript+"AJAX", such experiments will see that non-UTF-8 compatible ODB entries
cause all sorts of problems and will have to convert.
K.O. |