Antlr v4: What's wrong with this simple grammar for C# literals?

Time: 2013-07-26 07:58:24

Tags: c# antlr

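I decided to translate the official C# grammar to ANTLR v4. While testing it, however, I ran into the following problem. The grammar below fails to match simple input such as \n\ntrue\n\n<EOF>. It keeps reporting mismatched input '\n\ntrue\n\n' expecting Literal. Even if I reduce the definition of Literal to Literal: BooleanLiteral;, the input \n\ntrue\n\n<EOF> still does not match. I expected the grammar to skip the \n characters and match true<EOF>, but that is apparently not what happens. I tried debugging it but still cannot find the error. Any ideas?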

grammar Test;

start: Literal EOF;

/**********
 *
 * Literals
 *
 **********/

Literal
    :   BooleanLiteral 
    |   IntegerLiteral 
    |   RealLiteral 
    |   CharacterLiteral 
    |   StringLiteral 
    |   NullLiteral 
    ;

BooleanLiteral
    :   'true' 
    |   'false' 
    ;

IntegerLiteral
    :   DecimalIntegerLiteral 
    |   HexadecimalIntegerLiteral 
    ;

DecimalIntegerLiteral
    :   DecimalDigits IntegerTypeSuffix? 
    ;

DecimalDigits
    :   DecimalDigit+
    ;

DecimalDigit
    :   [0-9]
    ;

IntegerTypeSuffix
    :   'U' 
    |   'u' 
    |   'L' 
    |   'l' 
    |   'UL' 
    |   'Ul' 
    |   'uL' 
    |   'ul' 
    |   'LU' 
    |   'Lu' 
    |   'lU' 
    |   'lu' 
    ;

HexadecimalIntegerLiteral
    :   ('0x' | '0X') HexDigits IntegerTypeSuffix?
    ;

HexDigits
    :   HexDigit+
    ;

HexDigit    
    :   [0-9A-Fa-f]
    ;

RealLiteral
    :   DecimalDigits '.' DecimalDigits ExponentPart? RealTypeSuffix? 
    |   '.' DecimalDigits ExponentPart? RealTypeSuffix? 
    |   DecimalDigits ExponentPart RealTypeSuffix? 
    |   DecimalDigits RealTypeSuffix 
    ;

ExponentPart
    :   ('e' | 'E') Sign? DecimalDigits
    ;

Sign    
    :   '+'
    |   '-' 
    ;

RealTypeSuffix  
    :   'F'
    |   'f' 
    |   'D' 
    |   'd' 
    |   'M' 
    |   'm' 
    ;

CharacterLiteral
    :   '\'' Character '\'' 
    ;

Character
    :   SingleCharacter 
    |   SimpleEscapeSequence 
    |   HexadecimalEscapeSequence 
    |   UnicodeEscapeSequence 
    ;

UnicodeEscapeSequence
    :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit 
    |   '\\' 'U' HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit HexDigit 
    ;

SingleCharacter
    :   ~[\\\\\\\u000D\u000A\u0085\u2028\u2029]
    ;

SimpleEscapeSequence    
    : '\\\''
    | '\\"'
    | '\\\\'
    | '\\0'
    | '\\a'
    | '\\b'
    | '\\f'
    | '\\n'
    | '\\r'
    | '\\t'
    | '\\v'
    ;

HexadecimalEscapeSequence
    :   '\\x' HexDigit HexDigit? HexDigit? HexDigit? 
    ;

StringLiteral
    :   RegularStringLiteral 
    |   VerbatimStringLiteral 
    ;

RegularStringLiteral
    :   '"' RegularStringLiteralCharacters? '"' 
    ;

RegularStringLiteralCharacters
    :   RegularStringLiteralCharacter+
    ;

RegularStringLiteralCharacter
    :   SingleRegularStringLiteralCharacter 
    |   SimpleEscapeSequence 
    |   HexadecimalEscapeSequence 
    |   UnicodeEscapeSequence 
    ;

SingleRegularStringLiteralCharacter
    :   ~["\\\u000D\u000A\u0085\u2028\u2029]
    ;

VerbatimStringLiteral
    :   '@"' VerbatimStringLiteralCharacters? '"' 
    ;

VerbatimStringLiteralCharacters
    :   VerbatimStringLiteralCharacter+
    ;

VerbatimStringLiteralCharacter
    :   SingleVerbatimStringLiteralCharacter 
    |   QuoteEscapeSequence 
    ;

SingleVerbatimStringLiteralCharacter
    :   ~["]
    ;

QuoteEscapeSequence
    :   '""' 
    ;

NullLiteral
    :   'null'
    ;


/**********
 *
 * Whitespaces and comments
 *
 **********/    

WS  : [ \t\r\n]+ -> skip
    ;

COMMENT
    :   '/*' .*? '*/' -> skip
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;

Edit: OK, I have managed to isolate the problem to this piece of code:

grammar Test;

start : VerbatimStringLiteral EOF ;

VerbatimStringLiteral
    :   '@"' VerbatimStringLiteralCharacter* '"' 
    ;

VerbatimStringLiteralCharacter
    :   SingleVerbatimStringLiteralCharacter 
    |   QuoteEscapeSequence 
    ;

SingleVerbatimStringLiteralCharacter
    :   ~["]
    ;

QuoteEscapeSequence
    :   '""' 
    ;

WS  :  [ \t\r\n]+ -> skip
    ;

1 answer:

Answer 0 (score: 1)

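Lexer rules that do not produce tokens on their own should be marked with the fragment modifier. For example, QuoteEscapeSequence is not a standalone token; it is only one part of a VerbatimStringLiteral token, so you should mark it as a fragment. Here are some other rules that should be fragment rules: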

  • VerbatimStringLiteralCharacter
  • SingleVerbatimStringLiteralCharacter
  • SingleRegularStringLiteralCharacter
  • RegularStringLiteralCharacter
  • RegularStringLiteralCharacters ← this one is the source of the error for this particular input
  • SimpleEscapeSequence

There may be more, but this should give you an idea of what the problem is and how to fix it.
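
As an illustration of the advice above, here is a minimal sketch of the isolated grammar from the question's edit with its helper rules marked as fragment. It assumes that VerbatimStringLiteral and WS are the only rules meant to produce tokens, and it has not been checked against the full C# grammar. The background: every lexer rule without the fragment modifier is a candidate token, the ANTLR 4 lexer prefers the rule that matches the longest input, and ties go to the rule defined first. A helper built on ~["] matches letters and newlines alike, so it can compete with WS and the literal rules.

```antlr
grammar Test;

start : VerbatimStringLiteral EOF ;

// A real token: the whole verbatim string, including the @" and " delimiters.
VerbatimStringLiteral
    :   '@"' VerbatimStringLiteralCharacter* '"'
    ;

// Helper rules. 'fragment' means they may only be referenced from other
// lexer rules and never produce a token of their own.
fragment VerbatimStringLiteralCharacter
    :   SingleVerbatimStringLiteralCharacter
    |   QuoteEscapeSequence
    ;

// Any character except a double quote.
fragment SingleVerbatimStringLiteralCharacter
    :   ~["]
    ;

// A doubled quote stands for one literal quote inside a verbatim string.
fragment QuoteEscapeSequence
    :   '""'
    ;

// Whitespace is matched and discarded.
WS  :   [ \t\r\n]+ -> skip
    ;
```

The same treatment would presumably apply to the other helper rules in the full grammar, such as DecimalDigits, HexDigit, ExponentPart and the escape-sequence rules, leaving only the rules that the parser actually references as token-producing rules.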