Question

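I have defined a simple language in pyparsing. Parsing works fine, but the problem is with the error messages: they show the wrong line number. Here is the main part of the code: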

communications = Group( Suppress(CaselessLiteral("communications")) + op + ZeroOrMore(communicationList) + cl + semicolon)

language = Suppress(CaselessLiteral("language")) + (CaselessLiteral("cpp")|CaselessLiteral("python")) + semicolon

componentContents = communications.setResultsName('communications') & language.setResultsName('language') & gui.setResultsName('gui') & options.setResultsName('options')

component = Suppress(CaselessLiteral("component")) + identifier.setResultsName("name") + op + componentContents.setResultsName("properties") + cl + semicolon

CDSL = idslImports.setResultsName("imports") + component.setResultsName("component")

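The correct line number is reported only for errors that occur before component. For any error inside the component (that is, in componentContents), the message just gives the line number where the component starts. For example, here is a sample of the text to be parsed: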

import "/robocomp/interfaces/IDSLs/Test.idsl";

Component publish
{
    Communications
    {
        requires test;
        implements test;
    };
    language python;
};

If I leave out the semicolon after python or after test, it reports (line:4, col:1), which is the {.
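For reference, the position shown in such a message comes from attributes on the exception object. The sketch below uses a tiny stand-in grammar (the names `expr` and `text` are illustrative and not part of the grammar above) to show where `lineno`, `col`, and `line` come from.

```python
from pyparsing import ParseException, Suppress, Word, alphas, nums

# Illustrative grammar: a word, a semicolon, then a number.
expr = Word(alphas) + Suppress(";") + Word(nums)

text = "abc;\nxyz"  # the second line should have been a number

try:
    expr.parseString(text, parseAll=True)
    err = None
except ParseException as exc:
    err = exc

if err is not None:
    # lineno and col are 1-based; line holds the text of the offending line.
    print("line:", err.lineno, "col:", err.col, "text:", repr(err.line))
```

The location is whatever pyparsing recorded for the expression that finally failed, which is why backtracking can move it away from the real mistake.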

Answer 1

This behavior is a feature of pyparsing, not a bug, and it takes some extra care to notice (or work around).

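When pyparsing fails to match somewhere inside a complex expression, it unwinds its parsing stack back to the last fully complete expression alternative. You know that once "component" has matched, anything that goes wrong afterwards must be an error in the component definition, but pyparsing does not. So when a failure occurs after an opening keyword, pyparsing backs up and reports that the whole keyword expression (including the keyword) failed to match.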

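In a command grammar like this one, the keywords are usually unambiguous. For example, after matching "component", anything that is not an identifier followed by an argument list in braces is an error. You can tell pyparsing not to back up past 'component' by replacing the '+' operator with the '-' operator.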
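The difference between the two operators can be seen with a reduced grammar. This is only a sketch: the grammar is a cut-down stand-in for the one in the question, and `build` and `error_of` are hypothetical helper names introduced for the comparison.

```python
from pyparsing import (CaselessKeyword, ParseBaseException,
                       ParseSyntaxException, Suppress, Word, ZeroOrMore, alphas)

LBRACE, RBRACE, SEMI = map(Suppress, "{};")
ident = Word(alphas)
COMPONENT, COMMUNICATIONS, REQUIRES = map(
    CaselessKeyword, "component communications requires".split())


def build(no_backtrack):
    """Build the same grammar with '-' (no backtracking) or '+' after keywords."""
    if no_backtrack:
        item = REQUIRES - ident + SEMI
        comms = COMMUNICATIONS - LBRACE + ZeroOrMore(item) + RBRACE + SEMI
    else:
        item = REQUIRES + ident + SEMI
        comms = COMMUNICATIONS + LBRACE + ZeroOrMore(item) + RBRACE + SEMI
    return COMPONENT + ident + LBRACE + ZeroOrMore(comms) + RBRACE + SEMI


# The semicolon after "requires test" is missing on purpose.
text = """\
component publish
{
    communications
    {
        requires test
    };
};
"""


def error_of(grammar):
    """Return the exception raised while parsing text, or None on success."""
    try:
        grammar.parseString(text, parseAll=True)
    except ParseBaseException as err:
        return err
    return None


plus_err = error_of(build(no_backtrack=False))
minus_err = error_of(build(no_backtrack=True))

for label, err in (("+", plus_err), ("-", minus_err)):
    print(label, type(err).__name__, "line", err.lineno, "col", err.col)
```

With '+', the failure is swallowed by the enclosing ZeroOrMore and the parser ends up complaining about an earlier position. With '-', pyparsing raises a ParseSyntaxException at the point where the ';' was expected, which after whitespace skipping is the token that follows the omission.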

Looking at your grammar, I would step back and write a short BNF (always a good practice):

communications    ::= 'communications' '{' communicationList* '}' ';'
language          ::= 'language' ('cpp' | 'python') ';'
componentContents ::= communications | language | gui | options
component         ::= 'component' identifier '{' componentContents+ '}' ';'
CDSL              ::= idslImports component

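If your grammar has keywords, I recommend using Keyword or CaselessKeyword rather than Literal or CaselessLiteral. The Literal class does not enforce word boundaries, so if I use Literal("no") as part of a grammar, it can match the leading 'no' of 'not', 'none', 'nothing', and so on.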
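A minimal check of that difference, using the 'no' example from the paragraph above (the variable names are illustrative):

```python
from pyparsing import Keyword, Literal, ParseException

text = "nothing"

# Literal only compares characters, so it accepts the leading "no".
literal_tokens = Literal("no").parseString(text).asList()

# Keyword also requires that the match is not followed by another
# identifier character, so "nothing" is rejected.
try:
    Keyword("no").parseString(text)
    keyword_matched = True
except ParseException:
    keyword_matched = False

print(literal_tokens, keyword_matched)
```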

Here is how I would approach this BNF. (I'll use the shortcut form of setResultsName, which I find keeps the grammar itself cleaner.):

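from pyparsing import CaselessKeyword, Group, Suppress, ZeroOrMore, pyparsing_common

# gui, options and idslImports come from the rest of the asker's grammar (not shown here)
LBRACE, RBRACE, SEMI = map(Suppress, "{};")
identifier = pyparsing_common.identifier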

# keywords - extend as needed
(IMPORT, COMMUNICATIONS, LANGUAGE, COMPONENT, CPP, 
 PYTHON, REQUIRES, IMPLEMENTS) = map(CaselessKeyword, """
    IMPORT COMMUNICATIONS LANGUAGE COMPONENT CPP PYTHON 
    REQUIRES IMPLEMENTS""".split())

# keyword-leading expressions, use '-' operator to prevent backtracking once significant keyword is parsed
communicationItem = Group((REQUIRES | IMPLEMENTS) - identifier + SEMI)
communications = Group( COMMUNICATIONS.suppress() - LBRACE + ZeroOrMore(communicationItem) + RBRACE + SEMI)
language = Group(LANGUAGE.suppress() - (CPP | PYTHON) + SEMI)

componentContents = communications('communications') & language('language') & gui('gui') & options('options')
component = Group(COMPONENT - identifier("name") + Group(LBRACE + componentContents + RBRACE)("properties") + SEMI)

CDSL = idslImports("imports") + component("component")
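The call-style shortcut used in the grammar above is equivalent to setResultsName. A small sketch, assuming a simplified `identifier` built from `Word(alphas)` in place of `pyparsing_common.identifier`:

```python
from pyparsing import Word, alphas

identifier = Word(alphas)  # simplified stand-in for pyparsing_common.identifier

long_form = identifier.setResultsName("name")
short_form = identifier("name")  # calling the expression sets the results name

long_result = long_form.parseString("publish")
short_result = short_form.parseString("publish")
print(long_result["name"], short_result["name"])
```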

Parsing your sample component with:

sample = """\
Component publish
{
    Communications
    {
        requires test;
        implements test;
    };
    language python;
};
"""

component.runTests([sample])

gives:

[['COMPONENT', 'publish', [[['REQUIRES', 'test'], ['IMPLEMENTS', 'test']], ['PYTHON']]]]
[0]:
  ['COMPONENT', 'publish', [[['REQUIRES', 'test'], ['IMPLEMENTS', 'test']], ['PYTHON']]]
  - name: 'publish'
  - properties: [[['REQUIRES', 'test'], ['IMPLEMENTS', 'test']], ['PYTHON']]
    - communications: [['REQUIRES', 'test'], ['IMPLEMENTS', 'test']]
      [0]:
        ['REQUIRES', 'test']
      [1]:
        ['IMPLEMENTS', 'test']
    - language: ['PYTHON']

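(By the way, I like that you use the '&' operator for unordered matching of the various contents with pyparsing's Each class. I think it makes for a friendlier and more robust parser. It turns out that Each has a slight conflict with the '-' operator, which I will address in the next release.)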
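As a small illustration of the unordered matching that '&' provides, the sketch below accepts two clauses in either order. The `gui qt;` clause is a made-up form used only for this example, since the real gui rule is not shown in the question.

```python
from pyparsing import CaselessKeyword, Group, Suppress, Word, alphas

SEMI = Suppress(";")
LANGUAGE, GUI = map(CaselessKeyword, "language gui".split())
name = Word(alphas)

language = Group(LANGUAGE.suppress() + name + SEMI)
gui = Group(GUI.suppress() + name + SEMI)  # hypothetical clause syntax

# '&' builds an Each: both parts are required, in any order.
contents = language("language") & gui("gui")

first = contents.parseString("language python; gui qt;", parseAll=True)
second = contents.parseString("gui qt; language python;", parseAll=True)

print(first["language"].asList(), first["gui"].asList())
print(second["language"].asList(), second["gui"].asList())
```

Because each part carries a results name, the caller can read the values by name without caring about the order in which they appeared.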

Wrong line number in parse exception
