> I've recently run into issues when using JSON.parse on ODB keys containing
> 8-bit data.
I am tempted to take a hard line and say that in general MIDAS TID_STRING data should be valid
UTF-8 encoded Unicode. In the modern mixed javascript/json/whatever environment I think
it is impractical to handle or permit invalid UTF-8 strings.
Certainly in the general case, replacing all control characters with something else, escaping them, or
otherwise changing the value of TID_STRING data would wreck *valid* UTF-8 strings, which I would
assume to be the normal use.
In other words, non-UTF-8 strings are following non-IEEE-754 floating point values into oblivion: just as
we do not check that TID_FLOAT and TID_DOUBLE data are valid IEEE-754 values, we should not check
that TID_STRING is valid UTF-8.
But in your specific case, why do you have random control characters in your TID_STRING data?
Maybe you are using TID_STRING as general storage instead of arrays of TID_CHAR or
TID_DWORD?
K.O.
>
> For JSON.parse to successfully parse a string, (A) the string must be valid
> UTF-8, (B) several whitespace characters, control characters, and the
> characters " and \ must be escaped, and (C) you've got to follow the key-
> value rules laid out in http://www.json.org/.
>
> The web browser takes care of (A), and I verified that for this key Midas
> handled (C) correctly. In principle, the function json_write in odb.c
> handles (B) - but json_write does not escape control characters.
>
> To manage this problem, I modified json_write (in odb.c) to replace any
> control character with the more innocuous character 'C'. My default case
> now looks like:
>
>    default:
>       {
>          // if a char is a control character,
>          // print 'C' in its place
>          // note that this loses data:
>          // a more-correct method would be to print
>          // \uXXXX, where XXXX is the character in hex
>          if(iscntrl(*s)){
>             (*buffer)[(*buffer_end)++] = 'C';
>             s++;
>          } else {
>             (*buffer)[(*buffer_end)++] = *s++;
>          }
>       }
>
> Where the call to iscntrl(*s) requires the addition of the ctype.h header
> file.
>
> I'm guessing a blanket replacement of control characters with 'C' isn't
> something all Midas users would want to do. Replacing the control character
> with its hex value seems like a good choice - but not without adding bounds
> checking!
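
A minimal sketch of what that \uXXXX escaping with bounds checking might look like. The helper json_escape and its fixed-size output buffer are hypothetical and are not the actual json_write interface in odb.c. Bytes of 0x80 and above are passed through unchanged, on the assumption that the string is already valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of odb.c: copies s into out as the body of a
 * JSON string, escaping '"', '\\' and control characters below 0x20.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if out is too small. */
static int json_escape(const char *s, char *out, size_t out_size)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    for (; *s; s++) {
        unsigned char c = (unsigned char)*s; /* avoid negative char values */
        char esc[7];                         /* room for "\u001f" plus NUL */
        size_t len;

        if (c == '"' || c == '\\') {
            esc[0] = '\\';
            esc[1] = (char)c;
            len = 2;
        } else if (c < 0x20) {
            /* control character: emit \uXXXX instead of dropping the data */
            snprintf(esc, sizeof(esc), "\\u%04x", (unsigned)c);
            len = 6;
        } else {
            esc[0] = (char)c;
            len = 1;
        }

        /* bounds check: always leave room for the terminating NUL */
        if (n + len + 1 > out_size)
            return -1;
        memcpy(out + n, esc, len);
        n += len;
    }

    out[n] = '\0';
    return (int)n;
}
```

Unlike the 'C' replacement, this keeps the original byte value recoverable after JSON.parse. A real version inside json_write would presumably grow the buffer rather than return -1.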
>
> An alternative to changing odb.c could be to add a regex to Midas response
> text which removes all control characters (U+0000 - U+001F):
>
> var resp_lint = req.response.replace(/[\u{0000}-\u{001F}]/gmu, '');
> var json_obj = JSON.parse(resp_lint);
>
> Unfortunately, the 'u' regex flag doesn't work on the Firefox version
> included in Scientific Linux 6.8.