Question

In my grammar, I have something like this:

line : startWord (matchPhrase|
                  anyWord matchPhrase|
                  anyWord anyWord matchPhrase|
                  anyWord anyWord anyWord matchPhrase|
                  anyWord anyWord anyWord anyWord matchPhrase) 
       -> ^(TreeParent startWord anyWord* matchPhrase);

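So I want to match the first occurrence of matchPhrase, while allowing up to a certain number of anyWord before it. The tokens that make up matchPhrase are also matched by anyWord.

Is there a better way to do this?

I think it might be possible by combining the semantic predicate in this answer with the non-greedy option: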

(options {greedy=false;} : anyWord)*

But I can't figure out exactly how to do this.

EDIT: Here is an example. I want to extract information from the following sentences:

Picture of a red flower.

Picture of the following: A red flower.

My input is actually tagged English sentences, and the lexer rules match the tags rather than the words. So the input to ANTLR is:

NN-PICTURE Picture IN-OF of DT a JJ-COLOR red NN-FLOWER flower

NN-PICTURE Picture IN-OF of DT the VBG following COLON : DT a JJ-COLOR red NN-FLOWER flower

I have lexer rules for each tag:

WS :  (' ')+ {skip();};
TOKEN : (~' ')+;

nnpicture:'NN-PICTURE' TOKEN -> ^('NN-PICTURE' TOKEN);
vbg:'VBG' TOKEN -> ^('VBG' TOKEN);

My parser rules look like this:

sentence : nnpicture inof matchFlower;

matchFlower : (dtTHE|dt)? jjcolor? nnflower;

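But of course this fails on the second sentence. So I want to allow a little flexibility by permitting up to N tokens before the flower match. I have an anyWord token that matches anything, and the following works: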

sentence :  nnpicture inof ( matchFlower | 
                             anyWord matchFlower |
                             anyWord anyWord matchFlower | etc.

But it is not very elegant, and it does not scale to large N.

Answer 1

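You could do that by first checking, inside the matchFlower rule, whether there really is a dt? jjcolor? nnflower ahead in the token stream, using a syntactic predicate. If such tokens can be seen, simply match them; if not, match any single token and then recursively match matchFlower. That would look like this: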

matchFlower
 : (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower)
 | . matchFlower                                   -> matchFlower
 ;

Note that the . (dot) inside a parser rule does not match any character; it matches any token.
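To make the control flow of that rule easier to follow, here is a rough Python sketch of the same idea: look ahead for the phrase, and if it is not there, skip one token and try again. This is not what ANTLR generates, only a hand-written approximation. The names try_flower, match_flower and max_skip are made up for illustration, and the max_skip bound (the "up to N" from the question) is an addition that the grammar rule above does not enforce.

```python
from typing import List, Optional, Tuple


def try_flower(tokens: List[str], pos: int) -> Optional[int]:
    """Rough stand-in for the predicate (dt? jjcolor? nnflower)=>.

    Returns the index just past the phrase if it starts at pos, else None.
    """
    i = pos
    # dt? : the tag DT followed by the word "the" or "a"
    if i + 1 < len(tokens) and tokens[i] == "DT" and tokens[i + 1] in ("the", "a"):
        i += 2
    # jjcolor? : the tag JJ-COLOR followed by any word
    if i + 1 < len(tokens) and tokens[i] == "JJ-COLOR":
        i += 2
    # nnflower : the tag NN-FLOWER followed by any word (mandatory)
    if i + 1 < len(tokens) and tokens[i] == "NN-FLOWER":
        return i + 2
    return None


def match_flower(
    tokens: List[str], pos: int, max_skip: Optional[int] = None
) -> Optional[Tuple[List[str], List[str]]]:
    """Return (skipped_tokens, phrase_tokens) for the first phrase at or after pos.

    max_skip is a hypothetical limit on skipped tokens; None means no limit.
    """
    start = pos
    while pos < len(tokens):
        end = try_flower(tokens, pos)
        if end is not None:
            # First alternative of the rule: the phrase is ahead, so match it.
            return tokens[start:pos], tokens[pos:end]
        if max_skip is not None and pos - start >= max_skip:
            return None  # too many tokens skipped before the phrase
        pos += 1  # second alternative: consume any one token and retry
    return None


if __name__ == "__main__":
    line = ("NN-PICTURE Picture IN-OF of DT the VBG following COLON : "
            "DT a JJ-COLOR red NN-FLOWER flower")
    toks = line.split()
    # Positions 0..3 are "NN-PICTURE Picture IN-OF of", so start at 4.
    print(match_flower(toks, 4))
    print(match_flower(toks, 4, max_skip=3))
```

With the second example sentence, the first call skips the six tokens DT the VBG following COLON : and then matches the phrase, while the call with max_skip=3 gives up and returns None. Note that each tag and each word counts as a separate token here, just as they do for the dot in the parser rule.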

Here's a quick demo:

grammar T;

options {
  output=AST;
}

tokens {
  TEXT;
  SENTENCE;
  FLOWER;
}

parse
 : sentence+ EOF -> ^(TEXT sentence+)
 ;

sentence
 : nnpicture inof matchFlower -> ^(SENTENCE nnpicture inof matchFlower)
 ;

nnpicture
 : NN_PICTURE TOKEN -> ^(NN_PICTURE TOKEN)
 ;

matchFlower
 : (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower)
 | . matchFlower                                   -> matchFlower
 ;

inof
 : IN_OF (t=IN | t=OF) -> ^(IN_OF $t)
 ;

dt
 : DT (t=THE | t=A) -> ^(DT $t)
 ;

jjcolor
 : JJ_COLOR TOKEN -> ^(JJ_COLOR TOKEN)
 ;

nnflower
 : NN_FLOWER TOKEN -> ^(NN_FLOWER TOKEN)
 ;

IN_OF      : 'IN-OF';
NN_FLOWER  : 'NN-FLOWER';
DT         : 'DT';
A          : 'a';
THE        : 'the';
IN         : 'in';
OF         : 'of';
VBG        : 'VBG';
NN_PICTURE : 'NN-PICTURE';
JJ_COLOR   : 'JJ-COLOR';
TOKEN      : ~' '+;
WS         : ' '+ {skip();};

A parser generated from the grammar above would parse your input:

NN-PICTURE Picture IN-OF of DT the VBG following COLON : DT a JJ-COLOR red NN-FLOWER flower

as follows (AST image not included):

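As you can see, everything before the flower is omitted from the tree. If you want to keep those tokens in the tree, you could do something like this: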

grammar T;

// ...

tokens {
  // ...
  NOISE;
}

// ...

matchFlower
 : (dt? jjcolor? nnflower)=> dt? jjcolor? nnflower -> ^(FLOWER dt? jjcolor? nnflower)
 | t=. matchFlower                                 -> ^(NOISE $t) matchFlower
 ;

// ...

resulting in the following AST (image not included):

Matching a specific number of repetitions non-greedily in ANTLR
