> > I've recently run into issues when using JSON.parse on ODB keys containing
> > 8-bit data.
>
> I am tempted to take a hard line and say that in general MIDAS TID_STRING data should be valid
> UTF-8 encoded Unicode. In the modern mixed javascript/json/whatever environment I think
> it is impractical to handle or permit invalid UTF-8 strings.
> ....
> But in your specific case, why do you have random control characters in your TID_STRING data?
> Maybe you are using TID_STRING as general storage instead of arrays of TID_CHAR or
> TID_DWORD?
I'm a little confused by this report and want to make sure I understand the situation. Konstantin points
out that the TID_STRING should be valid UTF-8. But I think that Amy agreed that the string was valid UTF-8.
My understanding was that Amy's contention was that the valid UTF-8 string didn't get returned as valid JSON.
But I am having trouble reproducing your behaviour, Amy. I created an ODB string variable containing a tab
control character:
sprintf(mystring,"first line \t second line");
status = db_set_value(hDB, 0,"/test2/mystring", &mystring, size, 1, TID_STRING);
and when I tried to pull the ODB value using jcopy
http://neut18:8081/?cmd=jcopy&odb=/test2/mystring&format=json
I got
{
"mystring/key" : { "type" : 12, "item_size" : 32, "access_mode" : 7, "last_written" : 1477416322 },
"mystring" : "first line \t second line"
}
which seems to be valid JSON.
I only tried this with tab. Are there other control characters that you are having trouble with? Or maybe
I misunderstand the question?
>
> >
> > For JSON.parse to successfully parse a string, (A) the string must be valid
> > UTF-8, (B) several whitespace characters, control characters, and the
> > characters " and \ must be escaped, and (C) you've got to follow the key-
> > value rules laid out in http://www.json.org/.
> >
> > The web browser takes care of (A), and I verified that for this key Midas
> > handled (C) correctly. In principle, the function json_write in odb.c
> > handles (B) - but json_write does not escape control characters.
> >
> > To manage this problem, I modified json_write (in odb.c) to replace any
> > control character with the more-innocuous character, 'C'. My default case
> > now looks like:
> >
> > default:
> > {
> > // if a char is a control character,
> > // print 'C' in its place
> > // note that this loses data:
> > // a more-correct method would be to print
> > // \uXXXX, where XXXX is the character in hex
> > if(iscntrl(*s)){
> > (*buffer)[(*buffer_end)++] = 'C';
> > s++;
> > } else {
> > (*buffer)[(*buffer_end)++] = *s++;
> > }
> > }
> >
> > Where the call to iscntrl(*s) requires the addition of the ctype.h header
> > file.
> >
> > I'm guessing a blanket replacement of control characters with 'C' isn't
> > something all Midas users would want to do. Replacing the control character
> > with its hex value seems like a good choice - but not without adding bounds
> > checking!
> >
> > An alternative to changing odb.c could be to add a regex to Midas response
> > text which removes all control characters (U+0000 - U+001F):
> >
> > var resp_lint = req.response.replace(/[\u{0000}-\u{001F}]/gmu, '');
> > var json_obj = JSON.parse(resp_lint);
> >
> > Unfortunately, the 'u' regex flag doesn't work on the Firefox version
> > included in Scientific Linux 6.8.
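
For reference, the \uXXXX escaping with bounds checking suggested in the quoted message could look roughly like the sketch below. This is only an illustration under my own assumptions: json_escape is a hypothetical standalone helper, not the actual json_write in odb.c, and it writes into a fixed-size buffer supplied by the caller instead of the growing buffer that json_write uses. It escapes only U+0000 - U+001F plus " and \, and reads the input through unsigned char so that 8-bit data is passed through unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MIDAS): write the JSON-escaped form of
   src into dst, which has room for dst_size bytes including the final NUL.
   Returns the number of characters written, or -1 if dst is too small. */
static int json_escape(const char *src, char *dst, size_t dst_size)
{
   size_t n = 0;

   /* read bytes as unsigned so that 8-bit data never compares as negative */
   for (const unsigned char *s = (const unsigned char *) src; *s; s++) {
      char esc[8]; /* longest escape is \uXXXX: 6 characters plus NUL */
      size_t len;

      switch (*s) {
      case '"':  strcpy(esc, "\\\""); break;
      case '\\': strcpy(esc, "\\\\"); break;
      case '\b': strcpy(esc, "\\b");  break;
      case '\f': strcpy(esc, "\\f");  break;
      case '\n': strcpy(esc, "\\n");  break;
      case '\r': strcpy(esc, "\\r");  break;
      case '\t': strcpy(esc, "\\t");  break;
      default:
         if (*s < 0x20) {
            /* remaining control characters: keep the value as \u00XX */
            snprintf(esc, sizeof(esc), "\\u%04x", (unsigned) *s);
         } else {
            /* everything else, including bytes >= 0x80, is copied as-is */
            esc[0] = (char) *s;
            esc[1] = 0;
         }
      }

      /* bounds check: the escape plus the terminating NUL must still fit */
      len = strlen(esc);
      if (n + len + 1 > dst_size)
         return -1;
      memcpy(dst + n, esc, len);
      n += len;
   }

   if (n + 1 > dst_size)
      return -1;
   dst[n] = 0;
   return (int) n;
}
```

With this sketch, the bytes 'a', 0x01, 'b' come out as a\u0001b, so no data is lost, and a destination buffer that is too small yields -1 instead of a write past the end.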