Midas Talk

Midas DAQ System

Not logged in

Back | Find | Login | Help

08 Sep 2016, Amy Roberts, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

30 Sep 2016, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

25 Oct 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

01 Dec 2016, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

15 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

23 Jan 2017, Thomas Lindner, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

30 Jan 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

01 Feb 2017, Konstantin Olchanski, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

01 Feb 2017, Stefan Ritt, Bug Report, control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

Message ID: 1217 Entry time: 25 Oct 2016 In reply to: 1204

Author:	Thomas Lindner
Topic:	Bug Report
Subject:	control characters not sanitized by json_write - can cause JSON.parse of mhttpd result to fail

> > I've recently run into issues when using JSON.parse on ODB keys containing 
> > 8-bit data.
> 
> I am tempted to take a hard line and say that in general MIDAS TID_STRING data should be valid 
> UTF-8 encoded Unicode. In the modern mixed javascript/json/whatever environment I think
> it is impractical to handle or permit invalid UTF-8 strings.
> ....
> But in your specific case, why do you have random control characters in your TID_STRING data? 
> Maybe you are using TID_STRING as general storage instead of arrays of TID_CHAR or 
> TID_DWORD?

I'm a little confused by this report and want to make sure I understand the situation.  Konstantin points
out that the TID_STRING should be valid UTF-8.  But I think that Amy agreed that the string was valid UTF-8.
 My understanding was that Amy's contention was that the valid UTF-8 string didn't get returned as valid JSON.

But I am having trouble reproducing your behaviour Amy.  I created a ODB string variable with a tab control
control character

  sprintf(mystring,"first line \t second line");
  status = db_set_value(hDB, 0,"/test2/mystring", &mystring, size, 1, TID_STRING);

and what I tried to pull the ODB using jcopy

http://neut18:8081/?cmd=jcopy&odb=/test2/mystring&format=json

I got 

{
"mystring/key" : { "type" : 12, "item_size" : 32, "access_mode" : 7, "last_written" : 1477416322 },
"mystring" : "first line \t second line"
}

which seems to be valid JSON.  

I only tried this with tab.  Are there other control characters that you are having trouble with?  Or maybe
I misunderstand the question?





> 
> > 
> > For JSON.parse to successfully parse a string, (A) the string must be valid 
> > UTF-8, (B) several whitespace characters, control characters, and the 
> > characters " and \ must be escaped, and (C) you've got to follow the key-
> > value rules laid out in http://www.json.org/.
> > 
> > The web browser takes care of (A), and I verified that for this key Midas 
> > handled (C) correctly.  In principle, the function json_write in odb.c 
> > handles (B) - but json_write does not escape control characters.
> > 
> > To manage this problem, I modified json_write (in odb.c) to replace any 
> > control character with the more-inocuous character, 'C'.  My default case 
> > now looks like:
> > 
> > default:
> >          {
> >            // if a char is a control character,
> >            // print 'C' in its place
> >            // note that this loses data:
> >            // a more-correct method would be to print
> >            // \uXXXX, where XXXX is the character in hex
> >            if(iscntrl(*s)){
> >              (*buffer)[(*buffer_end)++] = 'C';
> >              s++;
> >            } else {
> >              (*buffer)[(*buffer_end)++] = *s++;
> >            }
> >          }
> >       
> > Where the call to iscntrl(*s) requires the addition of the ctype.h header 
> > file.
> > 
> > I'm guessing a blanket replacement of control characters with 'C' isn't 
> > something all Midas users would want to do.  Replacing the control character 
> > with its hex value seems like a good choice - but not without adding bounds 
> > checking!
> > 
> > An alternative to changing odb.c could be to add a regex to Midas response 
> > text which removes all control characters (U+0000 - U+001F): 
> > 
> > var resp_lint = req.response.replace(/[\u{0000}-\u{001F}]/gmu, '');
> > var json_obj = JSON.parse(resp_lint);
> > 
> > Unfortunately, the 'u' regex flax doesn't work on the Firefox version 
> > included in Scientific Linux 6.8.

ELOG V3.1.4-2e1708b5