Midas Talk

Midas DAQ System

Not logged in

Back | Find | Login | Help

08 Sep 2016, Amy Roberts, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

30 Sep 2016, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

25 Oct 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

01 Dec 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

15 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

23 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

30 Jan 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

01 Feb 2017, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

01 Feb 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

Message ID: 1223 Entry time: 01 Dec 2016 In reply to: 1204 Reply to this: 1227

Author:	Thomas Lindner
Topic:	Bug Report
Subject:	control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

> > I've recently run into issues when using JSON.parse on ODB keys containing 
> > 8-bit data.
> 
> I am tempted to take a hard line and say that in general MIDAS TID_STRING data should be valid 
> UTF-8 encoded Unicode. In the modern mixed javascript/json/whatever environment I think
> it is impractical to handle or permit invalid UTF-8 strings.
> 
> Certainly in the general case, replacing all control characters with something else or escaping them or 
> otherwise changing the value if TID_STRING data would wreck *valid* UTF-8 strings, which I would 
> assume to be the normal use.
> 
> In other words, non-UTF-8 strings are following non-IEEE-754 floating point values into oblivion - as 
> we do not check the TID_FLOAT and TID_DOUBLE is valid IEEE-754 values, we should not check 
> that TID_STRING is valid UTF-8.

I agree that I think we should start requiring strings to be UTF-8 encoded unicode. 

I'd suggest that before worrying about the TID_STRING data, we should start by sanitizing the ODB key names.
 I've seen a couple cases where the ODB key name is a non-UTF-8 string.  It is very awkward to use odbedit
to delete these keys.

I attach a suggested modification to odb.c that rejects calls to db_create_key with non-UTF-8 key names.  It
uses some random function I found on the internet that is supposed to check if a string is valid UTF-8.  I
checked a couple of strings with invalid UTF-8 characters and it correctly identified them.  But I won't
claim to be certain that this is really identifying all UTF-8 vs non-UTF-8 cases.  Maybe others have a
better way of identifying this.

Attachment 1:

odb_modifications.txt 3 kB | Show | Show all

ELOG V3.1.4-2e1708b5