> Some additional explanation.
>
> Time passed, the world turned, and the current web-compatible standard for text strings is UTF-8 encoded Unicode, see
> https://en.wikipedia.org/wiki/UTF-8
> (ObCanadianContent, UTF-8 was invented the Canadian Rob Pike https://en.wikipedia.org/wiki/Rob_Pike)
> (and by some other guy https://en.wikipedia.org/wiki/Ken_Thompson).
>
> It turns out that not every combination of 8-bit characters (char*) is valid UTF-8 Unicode.
>
> In the MIDAS world we run into this when MIDAS ODB strings are exported to Javascript running inside web
> browsers ("custom pages", etc). ODB strings (TID_STRING) and ODB key names that are not valid UTF-8
> make such web pages malfunction and do not work right.
>
> One solution to this is to declare that ODB strings (TID_STRING) and ODB key names *must* be valid UTF-8 Unicode.
>
> The present commits implemented this solution. Invalid UTF-8 is rejected by db_create() & co and by the ODB integrity validator.
>
> This means some existing running experiment may suddenly break because somehow they have "old-style" ODB entries
> or they mistakenly use TID_STRING to store arbitrary binary data (use array of TID_CHAR instead).
>
> To permit such experiments to use current releases of MIDAS, we include a "defeat" device - to disable UTF-8 checks
> until they figure out where non-UTF-8 strings come from and correct the problem.
>
> Why is this defeat device non an ODB entry? Because it is not a normal mode of operation - there is no use-case where
> an experiment will continue to use non-UTF-8 compatible ODB indefinitely, in the long term. For example, as the MIDAS user
> interface moves to more and more to HTML+Javascript+"AJAX", such experiments will see that non-UTF-8 compatible ODB entries
> cause all sorts of problems and will have to convert.
>
>
> K.O.
Ok, I agree.
Stefan |