Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  18 Feb 2020, Lukas Gerritzen, Bug Report, RPC Error: ACK or other control chars from "db_get_values" Screenshot_from_2020-02-18_10-46-22.png
    Reply  18 Feb 2020, Stefan Ritt, Bug Report, RPC Error: ACK or other control chars from "db_get_values" 
    Reply  20 Feb 2020, Konstantin Olchanski, Bug Report, RPC Error: ACK or other control chars from "db_get_values" 
Message ID: 1837     Entry time: 20 Feb 2020     In reply to: 1835
Author: Konstantin Olchanski 
Topic: Bug Report 
Subject: RPC Error: ACK or other control chars from "db_get_values" 
> The unexpected token is \0x6
> RPC Error json parser exception: SyntaxError: JSON.parse: bad control character in string literal at line 80 column 30 of the JSON data, method: "db_get_valus", params: [object Object], id: 1582020074098.

Yes, there is a problem.

Traditionally, midas strings in ODB have no restriction on the content (I think even the '\0x0' char is permitted).

But web browser javascript strings are supposed to be valid unicode (UTF-16, if I read this right: https://tc39.es/ecma262/#sec-ecmascript-language-types-string-type).

The collision between the two happens when ODB values are json-encoded by midas, then json-decoded by the web browser.

The midas json encoder (mjson.h, mjson.cxx) encodes ODB strings according to JSON rules, but does not ensure that the result is valid UTF-8. (valid UTF-8 is not required, if I read the specs correctly http://www.ecma-
international.org/publications/files/ECMA-ST/ECMA-404.pdf and https://www.json.org/json-en.html)

The web browser json decoder requires valid UTF-8 and throws exceptions if it does not like something. Different browsers it slightly differently, so we have an error handler for this in the mjsonrpc results processor.

What does this mean in practice?

Now that MIDAS is very web oriented, MIDAS strings must be web browser friendly, too:

a) all ODB key names (subdirectory names, link names, etc) must be UTF-8 unicode, and this has been enforced by ODB for some time now.
b) all ODB string values must be valid UTF-8 unicode. This is not enforced right now.

Historically, it was okey to use ODB TID_STRING to store arbitrary binary data, but now, I think, we must deprecate this,
at least for any ODB entries that could be returned to a web browser (which means all of them, after we implement a fully
html+javascript odb editor). For storing binary data, arrays of TID_CHAR, TID_DWORD & co are probably a better match.

The MIDAS and ROOTANA json decoders (the same mjson.h, mjson.cxx) do not care about UTF-8, so ODB dumps
in JSON format are not affected by any of this. (But I am not sure about the JSON decoder in ROOT).

Bottom line:

I think db_validate() should check for invalid UTF-8 in ODB key names and in TID_STRING values
and at least warn the user. (I am not sure if invalid UTF-8 can be fixed automatically). db_create()
should reject key names that are not valid UTF-8 (it already does this, I think). db_set_value(TID_STRING) should
probably reject invalid UTF-8 strings, this needs to be discussed some more.

https://bitbucket.org/tmidas/midas/issues/215/everything-in-odb-must-be-valid-utf-8

K.O.
ELOG V3.1.4-2e1708b5