json: Nicer recovery from lexical errors

When the lexer chokes on an input character, it consumes the
character, emits a JSON error token, and enters its start state.  This
can lead to suboptimal error recovery.  For instance, input

    0123 ,

produces the tokens

    JSON_ERROR    01
    JSON_INTEGER  23
    JSON_COMMA    ,

Make the lexer skip characters after a lexical error until it reaches
a structural character ('[', ']', '{', '}', ':', ','), an ASCII
control character, '\xFE', or '\xFF'.

Note that we must not skip ASCII control characters, '\xFE', or
'\xFF', because docs/interop/qmp-spec.txt documents them as forcing
the JSON parser into a known-good state.

The lexer now produces

    JSON_ERROR    01
    JSON_COMMA    ,
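
The skipping rule can be illustrated with a small standalone sketch.
This is not the lexer's actual code; the helper names
stops_recovery() and skip_after_error() are hypothetical, and "ASCII
control character" is assumed here to mean bytes 0x00..0x1F.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if recovery must stop at byte c, i.e. c
 * is a structural character, an ASCII control character (assumed to
 * be 0x00..0x1F in this sketch), '\xFE', or '\xFF'.
 */
static bool stops_recovery(unsigned char c)
{
    switch (c) {
    case '[': case ']': case '{': case '}': case ':': case ',':
        return true;            /* structural character */
    case 0xFE: case 0xFF:
        return true;            /* must never be skipped */
    default:
        return c < 0x20;        /* ASCII control character (assumption) */
    }
}

/*
 * Hypothetical helper: given the offset pos just past the error
 * token, return the offset of the first byte that must not be
 * skipped, or len if the input ends first.
 */
static size_t skip_after_error(const char *buf, size_t len, size_t pos)
{
    assert(pos <= len);
    while (pos < len && !stops_recovery((unsigned char)buf[pos])) {
        pos++;                  /* discard this byte as part of the error */
    }
    return pos;
}
```

For the input "0123 ," the error token ends after "01", so calling
skip_after_error() with pos 2 returns 5, the offset of the comma:
"23 " is discarded and lexing resumes at ','.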

Update qmp-test for the nicer error recovery: QMP now reports just one
error for input %p instead of two.  Also drop the newline after %p; it
was needed to tease out the second error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180831075841.13363-5-armbru@redhat.com>
[Conflict with commit ebb4d82d88 resolved]
Markus Armbruster 2018-08-31 09:58:39 +02:00
parent c0ee3afa7f
commit 0f07a5d5f1
2 changed files with 30 additions and 18 deletions

@@ -76,10 +76,7 @@ static void test_malformed(QTestState *qts)
     assert_recovered(qts);
 
     /* lexical error: interpolation */
-    qtest_qmp_send_raw(qts, "%%p\n");
-    /* two errors, one for "%", one for "p" */
-    resp = qtest_qmp_receive(qts);
-    qmp_assert_error_class(resp, "GenericError");
+    qtest_qmp_send_raw(qts, "%%p");
     resp = qtest_qmp_receive(qts);
     qmp_assert_error_class(resp, "GenericError");
     assert_recovered(qts);