Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  08 Sep 2016, Amy Roberts, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
    Reply  30 Sep 2016, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
       Reply  25 Oct 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
       Reply  01 Dec 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail odb_modifications.txt
          Reply  15 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
             Reply  23 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
                Reply  30 Jan 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
                   Reply  01 Feb 2017, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
                      Reply  01 Feb 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
Message ID: 1223     Entry time: 01 Dec 2016     In reply to: 1204     Reply to this: 1227
Author: Thomas Lindner 
Topic: Bug Report 
Subject: control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail 
> > I've recently run into issues when using JSON.parse on ODB keys containing 
> > 8-bit data.
> 
> I am tempted to take a hard line and say that in general MIDAS TID_STRING data should be valid 
> UTF-8 encoded Unicode. In the modern mixed javascript/json/whatever environment I think
> it is impractical to handle or permit invalid UTF-8 strings.
> 
> Certainly in the general case, replacing all control characters with something else or escaping them or 
> otherwise changing the value if TID_STRING data would wreck *valid* UTF-8 strings, which I would 
> assume to be the normal use.
> 
> In other words, non-UTF-8 strings are following non-IEEE-754 floating point values into oblivion - as 
> we do not check the TID_FLOAT and TID_DOUBLE is valid IEEE-754 values, we should not check 
> that TID_STRING is valid UTF-8.

I agree that I think we should start requiring strings to be UTF-8 encoded unicode. 

I'd suggest that before worrying about the TID_STRING data, we should start by sanitizing the ODB key names.
 I've seen a couple cases where the ODB key name is a non-UTF-8 string.  It is very awkward to use odbedit
to delete these keys.

I attach a suggested modification to odb.c that rejects calls to db_create_key with non-UTF-8 key names.  It
uses some random function I found on the internet that is supposed to check if a string is valid UTF-8.  I
checked a couple of strings with invalid UTF-8 characters and it correctly identified them.  But I won't
claim to be certain that this is really identifying all UTF-8 vs non-UTF-8 cases.  Maybe others have a
better way of identifying this.
Attachment 1: odb_modifications.txt  3 kB  | Show | Show all
ELOG V3.1.4-2e1708b5