json: Reject invalid UTF-8 sequences

We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:

* Malformed sequences

  - Unexpected continuation bytes

  - Missing continuation bytes after start bytes other than
    \xC0..\xC1, \xF5..\xFD.

* Overlong sequences with start bytes other than \xC0..\xC1,
  \xF5..\xFD.

* Invalid code points
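
The categories above can be illustrated with a small strict decoder.
This is only a sketch under stated assumptions, not the code QEMU
uses: utf8_decode_one() and enum utf8_status are hypothetical names,
the input is assumed to be NUL-terminated, and "invalid code point"
is taken to mean only surrogates and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not part of QEMU. */
enum utf8_status {
    UTF8_OK,
    UTF8_UNEXPECTED_CONTINUATION,  /* \x80..\xBF where a start byte is due */
    UTF8_INVALID_START_BYTE,       /* \xF8..\xFF in this sketch */
    UTF8_MISSING_CONTINUATION,     /* sequence cut short */
    UTF8_OVERLONG,                 /* more bytes than the value needs */
    UTF8_INVALID_CODE_POINT,       /* surrogate or above U+10FFFF */
};

/*
 * Decode one sequence from the NUL-terminated string str.
 * On UTF8_OK, store the code point in *cp and its byte length in *len.
 */
static enum utf8_status utf8_decode_one(const char *str, long *cp, size_t *len)
{
    /* Smallest code point that legitimately needs a 2-, 3-, 4-byte form. */
    static const long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)str;
    size_t need, i;
    long c;

    assert(str && cp && len);

    if (s[0] < 0x80) {              /* ASCII, one byte */
        *cp = s[0];
        *len = 1;
        return UTF8_OK;
    }
    if (s[0] < 0xC0) {
        return UTF8_UNEXPECTED_CONTINUATION;
    }
    if (s[0] < 0xE0) {
        need = 2;
        c = s[0] & 0x1F;
    } else if (s[0] < 0xF0) {
        need = 3;
        c = s[0] & 0x0F;
    } else if (s[0] < 0xF8) {
        need = 4;
        c = s[0] & 0x07;
    } else {
        return UTF8_INVALID_START_BYTE;
    }

    /* The terminating NUL is not a continuation byte, so we never overrun. */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return UTF8_MISSING_CONTINUATION;
        }
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_cp[need]) {
        return UTF8_OVERLONG;       /* also catches \xC0 and \xC1 */
    }
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        return UTF8_INVALID_CODE_POINT;
    }
    *cp = c;
    *len = need;
    return UTF8_OK;
}
```

With this sketch, "\x80" is an unexpected continuation byte, "\xE2\x82"
is missing a continuation byte, "\xE0\x80\xAF" is an overlong encoding
of '/', and "\xED\xA0\x80" encodes the surrogate U+D800.  None of the
last three contains a byte from \xC0..\xC1 or \xF5..\xFF, so a
byte-level filter alone cannot catch them.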

Fixing this in the lexer would be bothersome.  Fixing it in the parser
is straightforward, so do that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
Markus Armbruster 2018-08-23 18:39:49 +02:00
parent a89d3104a2
commit e59f39d403
4 changed files with 122 additions and 105 deletions


@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/unicode.h"
 #include "qapi/error.h"
 #include "qemu-common.h"
 #include "qapi/qmp/qbool.h"
@@ -133,6 +134,10 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
     const char *ptr = token->str;
     QString *str;
     char quote;
+    int cp;
+    char *end;
+    ssize_t len;
+    char utf8_buf[5];
     assert(*ptr == '"' || *ptr == '\'');
     quote = *ptr++;
@@ -194,12 +199,15 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
                 goto out;
             }
         } else {
-            char dummy[2];
-            dummy[0] = *ptr++;
-            dummy[1] = 0;
-            qstring_append(str, dummy);
+            cp = mod_utf8_codepoint(ptr, 6, &end);
+            if (cp <= 0) {
+                parse_error(ctxt, token, "invalid UTF-8 sequence in string");
+                goto out;
+            }
+            ptr = end;
+            len = mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp);
+            assert(len >= 0);
+            qstring_append(str, utf8_buf);
         }
     }
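
The new branch decodes one code point and re-encodes it into
utf8_buf[5].  Five bytes suffice because a code point up to U+10FFFF
needs at most four bytes of UTF-8 plus the terminating NUL.  The
encoding half can be sketched as follows.  utf8_encode() is a
hypothetical stand-in written for illustration, not QEMU's
mod_utf8_encode(); it assumes plain UTF-8 and rejects cp <= 0, the
same condition the patch treats as an error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the encoder called in the patch.
 * Write code point cp as NUL-terminated UTF-8 into buf[bufsz].
 * Return the number of bytes written, not counting the NUL, or -1 if
 * cp cannot be encoded or the buffer is too small.
 */
static int utf8_encode(char *buf, size_t bufsz, long cp)
{
    /* Start-byte prefixes for 2-, 3- and 4-byte sequences. */
    static const unsigned char lead[5] = { 0, 0, 0xC0, 0xE0, 0xF0 };
    int n, i;

    assert(buf != NULL);

    if (cp <= 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;                      /* not a code point we encode */
    }
    n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (bufsz < (size_t)n + 1) {
        return -1;                      /* no room for bytes plus NUL */
    }

    if (n == 1) {
        buf[0] = (char)cp;
    } else {
        /* Fill continuation bytes from the end, six bits at a time. */
        for (i = n - 1; i > 0; i--) {
            buf[i] = (char)(0x80 | (cp & 0x3F));
            cp >>= 6;
        }
        buf[0] = (char)(lead[n] | cp);
    }
    buf[n] = '\0';
    return n;
}
```

Calling utf8_encode(buf, sizeof(buf), 0x20AC) with a char buf[5]
stores the three bytes \xE2\x82\xAC followed by NUL, so the buffer
can be passed directly to a function that expects a C string, as
qstring_append() does in the patch.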