
Question

I have a Hello.g4 grammar file with the following grammar definition:

definition : wordsWithPunctuation ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )*  ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

Now, if I try to build a parse tree from the following input:

a b c d of at of abc bcd of
a b c d at abc, bcd
a b c d of at of abc, bcd of

it fails with this error:

Hello::definition:1:31: extraneous input 'of' expecting {<EOF>, '(', '"', WORD, PUNCTUATION}

Whereas this input:

a b c d  at:  abc bcd!

is parsed correctly.

What is wrong here: the grammar, the input, or my interpretation of it?

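If I modify the wordsWithPunctuation rule by adding (... | 'of' | ',' word | ...), it matches the input completely, but that looks suspicious to me: how is the word 'of' different from the words 'abc' or 'a'? And why is ',' different from the other punctuation characters (i.e., why does it match ':' or '!', but not ','?)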

UPDATE1:

I'm using the ANTLR4 plugin for Eclipse, so the project build produces the following output:

ANTLR Tool v4.2.2 (/var/folders/.../antlr-4.2.2-complete.jar)
Hello.g4 -o /Users/.../eclipse_workspace/antlr_test_project/target/generated-sources/antlr4 -listener -no-visitor -encoding UTF-8

UPDATE2:

The grammar shown above is only part of this one:

grammar Hello;
text : (entry)+ ;
entry : blub 'abrr' '-' ('1')? '.' ('(' NUMBER ')')? sims '-' '(' definitionAndExamples ')' 'Hello' 'all' 'the' 'people' 'of' 'the' 'world';
blub : WORD ;
sims : sim (',' sim)* ;
sim : words ;
definitionAndExamples : definitions (';' examples)? ;
definitions : definition (';' definition )* ;
definition : wordsWithPunctuation ;
examples : example (';' example )* ;
example : '"' wordsWithPunctuation '"' ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

Now it seems that the literal words in the entry rule somehow break the other rules. But why? Is this some kind of anti-pattern in grammars?

Answer 1

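By including 'of' in a parser rule, you make ANTLR create an implicit anonymous token to represent that input. The word of will always get that special token type, so it will never have the type WORD. The only places it can appear in your parse tree are the places where 'of' appears in a parser rule.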
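To make the mechanism concrete, here is a sketch of how the lexer generated from the combined Hello.g4 (the UPDATE2 version) effectively orders its rules. The T__n style follows ANTLR's naming for implicit tokens, but the specific indices and the grammar name HelloImplicitView are hypothetical, for illustration only; ANTLR does not write this file out. Implicit literal tokens are placed ahead of the explicit lexer rules, and when two rules match input of the same length ANTLR picks the one defined first. So "of" (2 characters) matches both T__1 and WORD and becomes T__1. The same happens to ',', which the sims rule uses as a literal, so a comma never becomes PUNCTUATION.

```antlr
// Hypothetical view of the lexer behind the combined Hello.g4 (illustration only).
lexer grammar HelloImplicitView;

// Implicit tokens created from literals in parser rules come first.
// Index numbers are illustrative; ANTLR numbers them by where each literal first appears.
T__0 : ',' ;   // from: sims : sim (',' sim)* ;
T__1 : 'of' ;  // from: entry : ... 'people' 'of' 'the' 'world' ;
// ... one such rule for every other literal: 'abrr', '-', '1', '.', '(', ')',
//     'Hello', 'all', 'the', 'people', 'world', ';', '"'

// The explicit lexer rules from Hello.g4 follow.
NUMBER      : [0-9]+ ;
WORD        : [A-Za-z-]+ ;                  // "of" ties with T__1 at length 2; T__1 wins
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;  // "," ties with T__0 at length 1; T__0 wins
WS          : [ \t\r\n]+ -> skip ;          // skip spaces, tabs, newlines
```

This also explains why abc and a are unaffected: no literal matches them. A longer word such as "offset" would still be a WORD, because the lexer prefers the longest match before it considers rule order.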

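You can prevent ANTLR from creating these anonymous token types by splitting your grammar into a separate lexer grammar HelloLexer in HelloLexer.g4 and a parser grammar HelloParser in HelloParser.g4. I strongly recommend always using this form, for the following reasons: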

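Lexer modes only work when you do this.

Implicitly defined tokens are one of the most common sources of bugs in grammars, and separating the grammar prevents them from occurring.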

Once the grammar is split, you can update the word parser rule to allow the special token of to be treated as a word:

word : WORD | 'of' | ... other keywords which are also "words" ;
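Putting the advice together, here is a minimal sketch of the split form. It assumes the smaller grammar from the question plus the keyword 'of' from the entry rule. The token names OF, COMMA, LPAREN, RPAREN and QUOTE are hypothetical choices, not from the original. Giving ',' its own COMMA token and removing it from PUNCTUATION lets the punctuation rule accept both, the same way word accepts both WORD and 'of'. The parser can still write '(' or 'of' as literals because the lexer defines tokens with exactly those strings. In a parser grammar, a literal with no matching lexer token is reported as an error instead of silently becoming a new token. The other keywords of the full grammar ('the', 'all', 'people', ...) would need the same treatment.

```antlr
// ---- HelloLexer.g4 ----
lexer grammar HelloLexer;

// Literals the parser uses. They are listed before WORD so that they win
// equal-length matches (e.g. "of" becomes OF, not WORD).
OF     : 'of' ;
COMMA  : ',' ;
LPAREN : '(' ;
RPAREN : ')' ;
QUOTE  : '"' ;

NUMBER      : [0-9]+ ;
WORD        : [A-Za-z-]+ ;
PUNCTUATION : ('!'|'?'|'\''|':'|'.') ;  // ',' removed: it now has its own COMMA token
WS          : [ \t\r\n]+ -> skip ;      // skip spaces, tabs, newlines

// ---- HelloParser.g4 ----
parser grammar HelloParser;
options { tokenVocab = HelloLexer; }    // reuse the token types defined by HelloLexer

definition : wordsWithPunctuation ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ;
word : WORD | 'of' ;                    // 'of' resolves to the lexer's OF token
punctuation : PUNCTUATION | COMMA ;     // a comma counts as punctuation again
```

Process HelloLexer.g4 before HelloParser.g4: tokenVocab is resolved from the HelloLexer.tokens file that the lexer build produces.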
