Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  08 Sep 2016, Amy Roberts, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
    Reply  30 Sep 2016, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
       Reply  25 Oct 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
       Reply  01 Dec 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail odb_modifications.txt
          Reply  15 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
             Reply  23 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
                Reply  30 Jan 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
                   Reply  01 Feb 2017, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
                      Reply  01 Feb 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
Message ID: 1204     Entry time: 30 Sep 2016     In reply to: 1196     Reply to this: 1217   1223
Author: Konstantin Olchanski 
Topic: Bug Report 
Subject: control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
> I've recently run into issues when using JSON.parse on ODB keys containing 
> 8-bit data.

I am tempted to take a hard line and say that in general MIDAS TID_STRING data should be valid 
UTF-8 encoded Unicode. In the modern mixed javascript/json/whatever environment I think
it is impractical to handle or permit invalid UTF-8 strings.

Certainly in the general case, replacing all control characters with something else or escaping them or 
otherwise changing the value if TID_STRING data would wreck *valid* UTF-8 strings, which I would 
assume to be the normal use.

In other words, non-UTF-8 strings are following non-IEEE-754 floating point values into oblivion - as 
we do not check the TID_FLOAT and TID_DOUBLE is valid IEEE-754 values, we should not check 
that TID_STRING is valid UTF-8.

But in your specific case, why do you have random control characters in your TID_STRING data? 
Maybe you are using TID_STRING as general storage instead of arrays of TID_CHAR or 
TID_DWORD?

K.O.



> 
> For JSON.parse to successfully parse a string, (A) the string must be valid 
> UTF-8, (B) several whitespace characters, control characters, and the 
> characters " and \ must be escaped, and (C) you've got to follow the key-
> value rules laid out in http://www.json.org/.
> 
> The web browser takes care of (A), and I verified that for this key Midas 
> handled (C) correctly.  In principle, the function json_write in odb.c 
> handles (B) - but json_write does not escape control characters.
> 
> To manage this problem, I modified json_write (in odb.c) to replace any 
> control character with the more-inocuous character, 'C'.  My default case 
> now looks like:
> 
> default:
>          {
>            // if a char is a control character,
>            // print 'C' in its place
>            // note that this loses data:
>            // a more-correct method would be to print
>            // \uXXXX, where XXXX is the character in hex
>            if(iscntrl(*s)){
>              (*buffer)[(*buffer_end)++] = 'C';
>              s++;
>            } else {
>              (*buffer)[(*buffer_end)++] = *s++;
>            }
>          }
>       
> Where the call to iscntrl(*s) requires the addition of the ctype.h header 
> file.
> 
> I'm guessing a blanket replacement of control characters with 'C' isn't 
> something all Midas users would want to do.  Replacing the control character 
> with its hex value seems like a good choice - but not without adding bounds 
> checking!
> 
> An alternative to changing odb.c could be to add a regex to Midas response 
> text which removes all control characters (U+0000 - U+001F): 
> 
> var resp_lint = req.response.replace(/[\u{0000}-\u{001F}]/gmu, '');
> var json_obj = JSON.parse(resp_lint);
> 
> Unfortunately, the 'u' regex flax doesn't work on the Firefox version 
> included in Scientific Linux 6.8.  
ELOG V3.1.4-2e1708b5