How do I write a grammar that captures in-line comments but ignores only-line comments?

Date: 2017-03-27 06:48:45

Tags: python-3.x antlr4

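I'm writing a grammar for a new language I'm developing. The language has the following comment definitions:

  1. Comments can be either "in-line" or "only-line" comments
  2. "In-line" comments start with #
  3. "Only-line" comments start with # or *
  4. Every language statement ends with a newline
  5. "Only-line" comments can be ignored
  6. "In-line" comments should be processed (their values are passed to the tree walker in the code-generator phase)
  7. Example: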

    keyword(0x12, 0x12) # this is an inline comment
    keyword(0x34, 0x34) # this is another inline comment
    
    # this is an "only-line" comment
    * this is another "only-line" comment
    keyword(0x55, 0x55) # this is the 3rd inline comment
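Before touching the grammar, it can help to pin down the intended result. The following plain-Python sketch states the rules above line by line: it returns the in-line comment to keep, or None for blank lines and only-line comments. The function name inline_comment is hypothetical, and the sketch assumes # never appears inside a statement itself (there are no string literals in the example). It is only a reference for the expected output, not a replacement for the ANTLR lexer.

```python
from typing import Optional


def inline_comment(line: str) -> Optional[str]:
    """Return the in-line comment of one source line, or None if there is none to keep."""
    stripped = line.lstrip(" \t")
    # Blank lines and "only-line" comments (starting with # or *) are ignored.
    if not stripped.strip() or stripped[0] in "#*":
        return None
    # Otherwise everything from the first '#' to the end of the line is the in-line comment.
    idx = line.find("#")
    return line[idx:].rstrip("\r\n") if idx != -1 else None


example = [
    "keyword(0x12, 0x12) # this is an inline comment",
    "keyword(0x34, 0x34) # this is another inline comment",
    "",
    '# this is an "only-line" comment',
    '* this is another "only-line" comment',
    "keyword(0x55, 0x55) # this is the 3rd inline comment",
]
print([inline_comment(l) for l in example])
# ['# this is an inline comment', '# this is another inline comment', None, None, None, '# this is the 3rd inline comment']
```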
    

Here is my (reduced) grammar for achieving this:

    statement :   empty_line
              |   comment_statement
              |   keyword_statement
              ;
    
    keyword_statement : 'keyword' '(' HEX_VALUE ',' HEX_VALUE ')' in_line_comment?;
    
    in_line_comment : IN_LINE_COMMENT;
    
    comment_statement : LINE_COMMENT;
    empty_line        : NL;
    
    IN_LINE_COMMENT : '#' ~[\r\n]* ;
    LINE_COMMENT    : [#*] ~[\r\n]* -> skip;
    
    HEX_VALUE       : '0x' [0-9a-fA-F]+;
    
    NL              : '\r'? '\n' -> channel(2);
    WS              : [ \t]+ -> skip;
    

Compiling it with ANTLR4 and feeding the example text to the grammar produces:

    [@0,0:6='keyword',<'keyword'>,1:0]
    [@1,7:7='(',<'('>,1:7]
    [@2,8:11='0x12',<HEX_VALUE>,1:8]
    [@3,12:12=',',<','>,1:12]
    [@4,14:17='0x12',<HEX_VALUE>,1:14]
    [@5,18:18=')',<')'>,1:18]
    [@6,20:46='# this is an inline comment',<IN_LINE_COMMENT>,1:20]
    [@7,47:47='\n',<NL>,channel=2,1:47]
    [@8,48:54='keyword',<'keyword'>,2:0]
    [@9,55:55='(',<'('>,2:7]
    [@10,56:59='0x34',<HEX_VALUE>,2:8]
    [@11,60:60=',',<','>,2:12]
    [@12,62:65='0x34',<HEX_VALUE>,2:14]
    [@13,66:66=')',<')'>,2:18]
    [@14,68:99='# this is another inline comment',<IN_LINE_COMMENT>,2:20]
    [@15,100:100='\n',<NL>,channel=2,2:52]
    [@16,101:101='\n',<NL>,channel=2,3:0]
    [@17,102:133='# this is an "only-line" comment',<IN_LINE_COMMENT>,4:0]
    [@18,134:134='\n',<NL>,channel=2,4:32]
    [@19,172:172='\n',<NL>,channel=2,5:37]
    [@20,173:179='keyword',<'keyword'>,6:0]
    [@21,180:180='(',<'('>,6:7]
    [@22,181:184='0x55',<HEX_VALUE>,6:8]
    [@23,185:185=',',<','>,6:12]
    [@24,187:190='0x55',<HEX_VALUE>,6:14]
    [@25,191:191=')',<')'>,6:18]
    [@26,193:224='# this is the 3rd inline comment',<IN_LINE_COMMENT>,6:20]
    [@27,225:225='\n',<NL>,channel=2,6:52]
    [@28,226:225='<EOF>',<EOF>,7:0]
    line 4:0 extraneous input '# this is an "only-line" comment' expecting {<EOF>, 'keyword', LINE_COMMENT, NL}
    

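This shows that the "only-line" comment starting with # was identified as an IN_LINE_COMMENT token instead of a LINE_COMMENT token, which is wrong.

How can I instruct the grammar to handle that comment differently?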
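The cause is how the ANTLR lexer chooses between rules: at each position it takes the longest match, and when two rules match the same length, the rule defined first in the grammar wins. Both IN_LINE_COMMENT and LINE_COMMENT match the whole line # this is an "only-line" comment, and IN_LINE_COMMENT is listed first. The following Python sketch models only that selection rule with the re module; RULES and pick_rule are hypothetical names, and this is an illustration of the tie-break, not ANTLR's actual implementation.

```python
import re
from typing import Optional

# The two comment rules from the question's grammar, in grammar order.
RULES = [
    ("IN_LINE_COMMENT", re.compile(r"#[^\r\n]*")),
    ("LINE_COMMENT", re.compile(r"[#*][^\r\n]*")),
]


def pick_rule(text: str) -> Optional[str]:
    """Return the rule that would win at the start of text: longest match, ties to the earlier rule."""
    best_name, best_len = None, -1
    for name, pattern in RULES:
        m = pattern.match(text)
        # Strictly greater: an equal-length later rule never displaces an earlier one.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name


print(pick_rule('# this is an "only-line" comment'))       # IN_LINE_COMMENT
print(pick_rule('* this is another "only-line" comment'))  # LINE_COMMENT
```

Only the * line can reach LINE_COMMENT, which matches the token dump above, where offsets 135 to 171 (the * line) were silently skipped.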

1 Answer

Answer 0 (score: 0)

OK. I dug into this myself, so here it is as a community service...

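Here is my solution. I used a semantic predicate in the grammar to solve the problem. The solution is currently implemented with the Java target (just to take the complexity of the ANTLR4 Python runtime out of the picture), but I will definitely translate the grammar below to Python.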

My modified grammar:

    // Grammar name is hypothetical; ANTLR requires it to match the .g4 file name.
    grammar Lang;

    @lexer::members {
        int in_line = 0;    // start in "only-line" mode
    }

    prog      : statement+ EOF;

    statement :   empty_line
              |   comment_statement
              |   keyword_statement
              ;

    keyword_statement : KEYWORD '(' HEX_VALUE ',' HEX_VALUE ')' in_line_comment?;

    in_line_comment : IN_LINE_COMMENT;

    // Note: LINE_COMMENT is skipped and NL is on channel 2,
    // so the parser never actually matches these two rules.
    comment_statement : LINE_COMMENT;
    empty_line        : NL;

    KEYWORD         : 'keyword' {in_line = 1;};

    // Matches only if in_line == 1 at run time (semantic predicate)
    IN_LINE_COMMENT : '#' ~[\r\n]* {in_line == 1}?;
    LINE_COMMENT    : [#*] ~[\r\n]* -> skip;

    HEX_VALUE       : '0x' [0-9a-fA-F]+;

    // Reset in_line to 0 after every statement
    NL              : '\r'? '\n' {in_line = 0;} -> channel(2);
    WS              : [ \t]+ -> skip;
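To see the mechanism without generating a parser, here is a hypothetical plain-Python emulation of the lexer above. The names lex, TOKEN_RULES and PUNCT are made up (PUNCT stands in for ANTLR's implicit '(' ',' ')' tokens). It keeps the same rule order and the longest-match/first-rule tie-break, ignores IN_LINE_COMMENT whenever in_line is not 1 (the role of {in_line == 1}?), sets the flag after KEYWORD and clears it on NL. It models the idea only; it is not generated ANTLR code. In a real Python 3 target grammar the actions would normally reach the member through self, for example {self.in_line == 1}?.

```python
import re
from typing import List, Tuple

# Same order as the grammar's lexer rules; PUNCT is a stand-in for the implicit literal tokens.
TOKEN_RULES = [
    ("KEYWORD", r"keyword"),
    ("IN_LINE_COMMENT", r"#[^\r\n]*"),
    ("LINE_COMMENT", r"[#*][^\r\n]*"),
    ("HEX_VALUE", r"0x[0-9a-fA-F]+"),
    ("NL", r"\r?\n"),
    ("WS", r"[ \t]+"),
    ("PUNCT", r"[(),]"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_RULES]


def lex(text: str) -> List[Tuple[str, str]]:
    """Return the (type, text) tokens the parser would see on the default channel."""
    tokens = []
    in_line = 0  # like @lexer::members: start in "only-line" mode
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in COMPILED:
            # Semantic predicate: IN_LINE_COMMENT is only viable when in_line == 1.
            if name == "IN_LINE_COMMENT" and in_line != 1:
                continue
            m = pattern.match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"token recognition error at offset {pos}")
        name, value = best
        pos += len(value)
        if name == "KEYWORD":
            in_line = 1  # action {in_line = 1;}
        elif name == "NL":
            in_line = 0  # action {in_line = 0;}
            continue     # NL goes to channel 2, so the parser never sees it
        if name in ("LINE_COMMENT", "WS"):
            continue     # -> skip
        tokens.append((name, value))
    return tokens


sample = (
    "keyword(0x12, 0x12) # this is an inline comment\n"
    "\n"
    '# this is an "only-line" comment\n'
    '* this is another "only-line" comment\n'
    "keyword(0x55, 0x55) # this is the 3rd inline comment\n"
)
print([v for n, v in lex(sample) if n == "IN_LINE_COMMENT"])
# ['# this is an inline comment', '# this is the 3rd inline comment']
```

One consequence of the flag approach: a # comment counts as in-line only if a KEYWORD appeared earlier on the same line, so any other statement type added to the language would also need to set in_line.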