Question

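Edit: I wrote the first version, and Eike helped me improve it considerably. I'm now stuck on a more specific problem, which I describe below. You can find the original question in the history.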

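I'm using pyparsing to parse a small language used to request specific data from a database. It has many keywords, operators and data types, as well as boolean logic.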

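I'm trying to improve the error messages shown to the user when there is a syntax error, because the current ones are not very helpful. I put together a small example that is similar to what I'm doing with the language above, but much smaller: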

#!/usr/bin/env python

from pyparsing import *

def validate_number(s, loc, tokens):
    if int(tokens[0]) != 0:
        raise ParseFatalException(s, loc, "number must be 0")

def fail(s, loc, tokens):
    raise ParseFatalException(s, loc, "Unknown token %s" % tokens[0])

def fail_value(s, loc, expr, err):
    raise ParseFatalException(s, loc, "Wrong value")

number = Word(nums).setParseAction(validate_number).setFailAction(fail_value)
operator = Literal("=")

error = Word(alphas).setParseAction(fail)
rules = MatchFirst([
    Literal('x') + operator + number,
])

rules = operatorPrecedence(rules | error , [
    (Literal("and"), 2, opAssoc.RIGHT),
])

def try_parse(expression):
    try:
        rules.parseString(expression, parseAll=True)
    except Exception as e:
        msg = str(e)
        print("%s: %s" % (msg, expression))
        print(" " * (len("%s: " % msg) + (e.loc)) + "^^^")

Basically, the only thing this language lets you write is a series of x = 0 comparisons, combined with and and parentheses.

Now, in some cases involving and and parentheses, the error reporting is not very good. Consider the following examples:

>>> try_parse("x = a and x = 0") # This one is actually good!
Wrong value (at char 4), (line:1, col:5): x = a and x = 0
                                              ^^^
>>> try_parse("x = 0 and x = a")
Expected end of text (at char 6), (line:1, col:1): x = 0 and x = a
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (x = a)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (x = a)))
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))
                                                         ^^^

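In fact, it seems that if the parser cannot parse (and "parse" is the important word here) whatever comes after an and, it no longer produces a good error message :(

I say parse because if it can parse 5 but the "validation" in the parse action fails, it still produces a good error message. But if what follows cannot be parsed as a valid number (like a) or as a valid keyword (like xxxxxx), it stops producing useful error messages.

Any ideas?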

Answer 1

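Pyparsing will always produce some poor error messages, because it backtracks. The error message is generated by the last rule the parser tried. The parser cannot know where the error really is; it only knows that no rule matched.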
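A minimal sketch of that effect, using a hypothetical toy grammar rather than the question's: a statement is either "let <name> = <number>" or a bare word. The first alternative fails only at its last token, so Pyparsing backtracks, matches "let" as a bare word, and then complains about the leftover text instead of the real mistake. This assumes a recent pyparsing; the exact message wording may differ between versions.

```python
# Sketch: backtracking moves the reported error away from the real problem.
# Assumes pyparsing is installed; the grammar is a hypothetical toy example.
from pyparsing import Literal, ParseBaseException, Word, alphas, nums

# A statement is "let <name> = <number>", or else any bare word.
assignment = Literal("let") + Word(alphas) + "=" + Word(nums)
statement = assignment | Word(alphas)


def error_location(parser, text):
    """Return (loc, message) for a failed parse, or None if the text parses."""
    try:
        parser.parseString(text, parseAll=True)
    except ParseBaseException as e:
        return e.loc, str(e)
    return None


# The real mistake is "y" at char 8, but the assignment alternative fails,
# the parser backtracks to match "let" as a bare word, and the reported
# error becomes "Expected end of text" at char 4.
print(error_location(statement, "let x = y"))
print(error_location(statement, "let x = 5"))  # None: parses fine
```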

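To get good error messages, you need a parser that gives up early. Such parsers are less flexible than Pyparsing, but most conventional programming languages can be parsed with them. (C++ and Scala, IMHO, cannot.)

To improve error messages in Pyparsing, use the - operator. It works like the + operator, but it does not backtrack. You can use it like this: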

assignment = Literal("let") - varname - "=" - expression
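A sketch of the same hypothetical toy grammar rewritten with the - operator. Here Word(alphas) and Word(nums) stand in for varname and expression from the line above (an assumption for the example). Once "let" has matched, any later failure is raised as ParseSyntaxException, a subclass of ParseFatalException, which alternation with | does not catch, so the reported position is the real problem.

```python
# Sketch: "-" stops backtracking once "let" has matched.
# Assumes pyparsing is installed; Word(alphas)/Word(nums) stand in for the
# varname/expression of the answer's example.
from pyparsing import (Literal, ParseBaseException, ParseSyntaxException,
                       Word, alphas, nums)

assignment = Literal("let") - Word(alphas) - "=" - Word(nums)
statement = assignment | Word(alphas)


def parse_error(parser, text):
    """Return the exception raised while parsing text, or None on success."""
    try:
        parser.parseString(text, parseAll=True)
    except ParseBaseException as e:
        return e
    return None


err = parse_error(statement, "let x = y")
# ParseSyntaxException is fatal, so "|" does not try the bare-word
# alternative and the error points at "y" (char 8).
print(type(err).__name__, err.loc)
# "let" never matched here, so "|" still falls back to the bare word.
print(parse_error(statement, "hello"))  # None
```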

Here is the Pyparsing author's small article on improving error reporting.

Edit

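You can also generate good error messages for invalid numbers in the parse action that performs the validation. If the number is invalid, raise an exception that Pyparsing does not catch. That exception can carry a good error message.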
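A sketch of that idea, using a hypothetical InvalidValueError class and the question's "number must be 0" rule. Because InvalidValueError does not derive from Pyparsing's ParseBaseException, the | alternative does not treat it as an ordinary match failure; it propagates straight out of parseString with the message you gave it.

```python
# Sketch: a custom exception raised in a parse action escapes pyparsing.
# Assumes pyparsing is installed; InvalidValueError is a hypothetical name.
from pyparsing import Literal, Word, alphas, nums


class InvalidValueError(Exception):
    """Not a pyparsing exception, so alternatives ("|") do not swallow it."""


def check_zero(s, loc, toks):
    # Parse action: the token already matched Word(nums); now validate it.
    if int(toks[0]) != 0:
        raise InvalidValueError("number must be 0 (at char %d)" % loc)


number = Word(nums).setParseAction(check_zero)
rule = (Literal("x") + "=" + number) | Word(alphas)

print(rule.parseString("x = 0", parseAll=True).asList())  # ['x', '=', '0']
try:
    rule.parseString("x = 5", parseAll=True)
except InvalidValueError as e:
    print(e)  # number must be 0 (at char 4)
```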

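A parse action can take three arguments [1]:

s = the original string being parsed (see the note in [1])

loc = the location of the matching substring

toks = a list of the matched tokens, packaged as a ParseResults object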
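To see what those three arguments hold, here is a small sketch with a hypothetical recording parse action. Note that loc is where the match starts after Pyparsing has skipped leading whitespace.

```python
# Sketch: capture the (s, loc, toks) arguments a parse action receives.
# Assumes pyparsing is installed; record() is a hypothetical helper.
from pyparsing import Word, nums

calls = []


def record(s, loc, toks):
    # s: full input string, loc: start of the match, toks: ParseResults
    calls.append((s, loc, toks.asList()))


number = Word(nums).setParseAction(record)
number.parseString("   42")
print(calls)  # [('   42', 3, ['42'])]
```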

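There are also three useful helper functions for building good error messages [2]:

lineno(loc, string) - function to give the line number of the location within the string; the first line is line 1, and newlines start new lines.

col(loc, string) - function to give the column number of the location within the string; the first column is column 1, and newlines reset the column number to 1.

line(loc, string) - function to retrieve the line of text at lineno(loc, string). Useful when printing out diagnostic messages for exceptions.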
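A quick sketch of the three helpers on a made-up two-line input, locating the final "a" (the kind of value the question's grammar rejects) and printing a caret under it.

```python
# Sketch: turn a character offset into line/column information.
# Assumes pyparsing is installed; the input text is made up.
from pyparsing import col, line, lineno

text = "x = 0\nand x = a"
loc = text.rindex("a")  # offset 14: the final "a"

print(lineno(loc, text))  # 2
print(col(loc, text))     # 9
print(line(loc, text))    # and x = a
print(" " * (col(loc, text) - 1) + "^")  # caret under the "a"
```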

Your validating parse action could look like this:

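from pyparsing import col, lineno

class MyFatalParseException(Exception):
    """Custom exception: Pyparsing does not catch it, so parsing stops here."""

def validate_odd_number(s, loc, toks):
    value = int(toks[0])
    if value % 2 == 0:
        raise MyFatalParseException(
            "not an odd number. Line {l}, column {c}.".format(
                l=lineno(loc, s), c=col(loc, s)))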
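Putting the pieces together, here is a sketch of a try_parse-style reporter that handles both Pyparsing's own errors and a custom exception raised by a validating parse action. This MyFatalParseException is a hypothetical variant: unlike the one above, it stores the message and location as .msg and .loc (the same attribute names Pyparsing's exceptions use), so the caller can build the caret line with line() and col().

```python
# Sketch: one reporter for pyparsing errors and a custom validation error.
# Assumes pyparsing is installed; this MyFatalParseException is a
# hypothetical variant that carries .msg and .loc like pyparsing's exceptions.
from pyparsing import ParseBaseException, Word, col, line, lineno, nums


class MyFatalParseException(Exception):
    def __init__(self, msg, loc):
        super().__init__(msg)
        self.msg = msg
        self.loc = loc


def validate_odd_number(s, loc, toks):
    if int(toks[0]) % 2 == 0:
        raise MyFatalParseException("not an odd number", loc)


number = Word(nums).setParseAction(validate_odd_number)


def describe_error(parser, text):
    """Return 'message (line, column)', the offending line and a caret."""
    try:
        parser.parseString(text, parseAll=True)
    except (ParseBaseException, MyFatalParseException) as e:
        # Both exception types expose .msg and .loc.
        where = "line %d, column %d" % (lineno(e.loc, text), col(e.loc, text))
        caret = " " * (col(e.loc, text) - 1) + "^"
        return "%s (%s)\n%s\n%s" % (e.msg, where, line(e.loc, text), caret)
    return None


print(describe_error(number, "  12"))
print(describe_error(number, "13"))  # None: 13 is odd
```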

[1] http://pythonhosted.org/pyparsing/pyparsing.pyparsing.ParserElement-class.html#setParseAction

[2] HowToUsePyparsing

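Edit

Here [3] is an improved version of the question's current (2013-4-10) script. It reports the example errors correctly, but points at the wrong position for other errors. I believe there are bugs in my Pyparsing version ('1.5.7'), but maybe I just don't understand how Pyparsing works. The problems are:

ParseFatalException does not always seem to be fatal. When I use my own exception, the script works as expected.

The - operator does not seem to work.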

[3] http://pastebin.com/7E4kSnkm

Improving error messages with pyparsing
