Parsing a multiple-choice quiz with Lark

Time: 2019-05-20 14:37:06

Tags: python-3.x parsing lalr lark-parser

I am trying to parse a multiple-choice quiz with Lark, using the following grammar.


    import lark

    GRAMMAR = """
        start: question choice~3..5

        question: QUESTION_NUMBER _QUESTION_NUMBER_SEPARATOR question_body
        question_body:  LINE+ 
        QUESTION_NUMBER: DIGIT+ 
        _QUESTION_NUMBER_SEPARATOR: WS_INLINE* "." WS_INLINE*

        choice.3: CHOICE_NAME ")" choice_body
        choice_body: LINE+
        CHOICE_NAME.3: ("A" | "B" | "C" | "D" | "E")

        LINE: (WORD | PUNCTUATION | WS_INLINE )* NEWLINE 
        WORD: (LETTER | DIGIT | /[şŞöÖüÜçÇğĞıİâî]/)+
        PUNCTUATION: (SEPARATOR | GROUPER | MATHS | OTHER)
        SEPARATOR: ("," | "." | ":" | ";" | "?" | "!"| "-")
        GROUPER: ("<" | ">" | "[" | "]" | "(" | ")" )
        MATHS: ("–" | "+" | "/" | "=" | "÷")
        OTHER: /["'_\\\]/

        _EOL : WS_INLINE* _NL
        _NL : (NEWLINE | /\f/)

        %import common.NEWLINE
        %import common.LETTER
        %import common.DIGIT
        %import common.WS_INLINE
    """

    parser = lark.Lark(
        GRAMMAR,
        parser="lalr",
        lexer="contextual",
        keep_all_tokens=False,
        debug=True,
    )


The questions look like the following example:

1. This is a section of question body.

Another part of the question body.  

A) Option A
B) Option B
C) Option C
D) Option D
E) Option E 

Both the question body and the choice bodies may span multiple lines and may contain blank lines.

Running the code produces the following error:

lark.exceptions.UnexpectedCharacters: No terminal defined for 'n' at line 3 col 2
Another part of the question body.
 ^
Expecting: {'RPAR'}

Apparently the parser tries to treat that line as if it were a choice, and fails because the ")" that should follow the "A" is missing.
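The error message is consistent with that reading: the leading "A" has already been consumed as a CHOICE_NAME token, and the next character is "n" where only ")" is allowed. Below is a minimal sketch of that effect using only Python's `re` module, not Lark itself. The pattern names and the `first_token` helper are hypothetical, and the ordered alternation is only a stand-in for a lexer that tries the higher-priority terminal first.

```python
import re

# Hypothetical stand-in for a lexer that tries a high-priority,
# single-letter terminal before the general LINE terminal.
LETTER_FIRST = re.compile(r"(?P<CHOICE_NAME>[A-E])|(?P<LINE>[^\n]*\n)")

# Variant where the choice marker is the letter together with ")".
MARKER_FIRST = re.compile(r"(?P<CHOICE_MARKER>[A-E]\))|(?P<LINE>[^\n]*\n)")


def first_token(pattern, text):
    """Return (token_name, matched_text) for the first token in text."""
    match = pattern.match(text)
    return match.lastgroup, match.group()


body_line = "Another part of the question body.\n"
choice_line = "A) Option A\n"

print(first_token(LETTER_FIRST, body_line))    # the bare letter wins
print(first_token(MARKER_FIRST, body_line))    # the whole line is a LINE
print(first_token(MARKER_FIRST, choice_line))  # still recognised as a choice
```

Whether matching the letter and the ")" as a single terminal resolves the problem in the Lark grammar is an assumption that has not been tested here.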

It does not matter which choice letter the line starts with; for example, the next one fails for the same reason.

 1. This is a section of question body.

Be a part of the question body.  

A) Option A
B) Option B
C) Option C
D) Option D
E) Option E 

It gives the same error:

lark.exceptions.UnexpectedCharacters: No terminal defined for 'e' at line 3 col 2
Be a part of the question body.
 ^
Expecting: {'RPAR'}

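However, every line that does not start with one of the letters A, B, C, D or E is parsed successfully as part of the question body. For example, this works: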

 1. This is a section of question body.

Second part of the question body.  

A) Option A
B) Option B
C) Option C
D) Option D
E) Option E 

# The program's output is:
{
    "question": {
        "body": "This is a section of question body.\n\nSecond part of the question body.  \n\n",
        "number": "1",
    },
    "choices": (
        {"name": "A", "body": " Option A\n"},
        {"name": "B", "body": " Option B\n"},
        {"name": "C", "body": "Option C\n"},
        {"name": "D", "body": " Option D\n"},
        {"name": "E", "body": "Option E \n\n"},
    ),
}
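For reference, the same structure can be produced without Lark by classifying lines with a regular expression. This is only a sketch of the intended result, not a fix for the grammar. `parse_question` and both patterns are hypothetical names, and the sketch assumes that a choice always starts at the beginning of a line with one of the letters A to E immediately followed by ")".

```python
import re

QUESTION_START = re.compile(r"\s*(\d+)\s*\.[ \t]*")
CHOICE_START = re.compile(r"([A-E])\)")


def parse_question(text):
    """Split one question into its number, body and choices."""
    head = QUESTION_START.match(text)
    if head is None:
        raise ValueError("text does not start with a question number")
    question = {"number": head.group(1), "body": ""}
    choices = []
    # keepends=True preserves newlines so bodies keep their blank lines.
    for line in text[head.end():].splitlines(keepends=True):
        marker = CHOICE_START.match(line)
        if marker:
            # A new choice starts; its body is the rest of the line.
            choices.append({"name": marker.group(1), "body": line[marker.end():]})
        elif choices:
            # Continuation (or blank) line of the current choice.
            choices[-1]["body"] += line
        else:
            # Still inside the question body.
            question["body"] += line
    return {"question": question, "choices": tuple(choices)}


sample = (
    "1. This is a section of question body.\n"
    "\n"
    "Another part of the question body.\n"
    "\n"
    "A) Option A\n"
    "B) Option B\n"
)
result = parse_question(sample)
print(result["question"]["body"])
print([choice["name"] for choice in result["choices"]])
```

Because the check requires the ")" right after the letter, a body line such as "Another part of the question body." stays in the question body. A body line that itself begins with "A)" would still be misread as a choice.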

What am I doing wrong in the grammar?

0 answers:

No answers