Simplified Smalltalk grammar using ANTLR - unary minus and message chaining

Asked: 2012-05-10 14:28:05

Tags: antlr grammar smalltalk backtracking unary-operator

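I am writing a simple Smalltalk-like grammar using ANTLR. It is a simplified version of Smalltalk, but the basic ideas are the same (message passing, for example).

Here is my grammar so far: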

grammar GAL;

options {
    //k=2;
    backtrack=true;
}

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '"' ( options {greedy=false;} : . )* '"' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        ) {$channel=HIDDEN;}
    ;

NEW_LINE
    :   ('\r'?'\n')
    ;

STRING
    :  '\'' ( ESC_SEQ | ~('\\'|'\'') )* '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

BINARY_MESSAGE_CHAR
    :   ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')
        ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')?
    ;

// parser

program
    :   NEW_LINE* (statement (NEW_LINE+ | EOF))*
    ;

statement

    :   message_sending
    |   return_statement
    |   assignment
    |   temp_variables
    ;

return_statement
    :   '^' statement
    ;

assignment
    :   identifier ':=' statement
    ;

temp_variables
    :   '|' identifier+ '|'
    ;

object
    :   raw_object
    ;

raw_object
    :   number
    |   string
    |   identifier
    |   literal
    |   block
    |   '(' message_sending ')'
    ;

message_sending
    :   keyword_message_sending
    ;

keyword_message_sending
    :   binary_message_sending keyword_message?
    ;

binary_message_sending
    :   unary_message_sending binary_message*
    ;

unary_message_sending
    :   object (unary_message)*
    ;

unary_message
    :   unary_message_selector
    ;

binary_message
    :   binary_message_selector unary_message_sending
    ;

keyword_message
    :   (NEW_LINE? single_keyword_message_selector NEW_LINE? binary_message_sending)+
    ;

block 
    : 
      '[' (block_signiture

      )? NEW_LINE* 
      block_body

      NEW_LINE* ']'
    ;

block_body 
    :  (statement 

      )?
      (NEW_LINE+ statement 

      )*
    ;


block_signiture 
    : 
      (':' identifier

      )+ '|'
    ;

unary_message_selector
    :   identifier
    ;

binary_message_selector
    :   BINARY_MESSAGE_CHAR
    ;

single_keyword_message_selector
    :   identifier ':'
    ;

keyword_message_selector
    :   single_keyword_message_selector+
    ;

symbol
    :   '#' (string | identifier | binary_message_selector | keyword_message_selector)
    ; 

literal
    :   symbol block? // if there is block then this is method
    ;

number
    : /*'-'?*/
    ( INT | FLOAT )
    ;

string
    :   STRING
    ;

identifier
    :   ID
    ;

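1. Unary minus

I have a problem with unary minus for numbers (the commented-out part of the number rule). The problem is that a minus sign is a valid binary message. To make things worse, two minus signs are also a valid binary message. What I need is a unary minus whenever there is no object to send a binary message to (for example, -3 + 4 should use unary minus, because there is nothing in front of -3). Likewise, (-3) should be a unary minus. It would be nice if 1 - -2 were parsed as the binary message '-' with the argument -2, but I can live without that. How can I do this?

If I uncomment the unary minus, I get the error MismatchedSetException(0!=null) when parsing something like 1-2.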
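To make the rule above concrete, here is a minimal sketch written in Python rather than ANTLR. It assumes one possible approach: decide at the token level, treating a '-' that is directly followed by digits as a sign whenever the previous token cannot end an operand (start of input, an operator, or an opening parenthesis). The names `tokenize`, `OPERAND_END` and the token kinds are hypothetical and are not part of the grammar; only integers are handled.

```python
import re

# Hypothetical token kinds, loosely modelled on INT, ID and BINARY_MESSAGE_CHAR.
TOKEN_RE = re.compile(r"""
    (?P<NUM>\d+)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<OP>[-+*/<>=~!@%&|\\,?]{1,2})
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<WS>\s+)
""", re.VERBOSE)

DIGITS_RE = re.compile(r"\d+")

# Token kinds that can end an operand; a '-' right after one of them is binary.
OPERAND_END = {"NUM", "ID", "RPAREN"}


def tokenize(text):
    """Return (kind, value) pairs, folding a leading '-' into the number."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value, pos = m.lastgroup, m.group(), m.end()
        if kind == "WS":
            continue
        no_receiver = not tokens or tokens[-1][0] not in OPERAND_END
        if kind == "OP" and value == "-" and no_receiver:
            digits = DIGITS_RE.match(text, pos)
            if digits:  # '-' directly followed by digits: a signed literal
                tokens.append(("NUM", "-" + digits.group()))
                pos = digits.end()
                continue
        tokens.append((kind, value))
    return tokens


print(tokenize("-3 + 4"))
print(tokenize("1-2"))
print(tokenize("2+(-1)"))
```

In this sketch 1-2 stays a binary send because the '-' follows a number, while 1--2 is still read as the two-character selector '--' because the operator pattern is greedy. Whether the same decision is better made in the ANTLR lexer or in the parser is left open here.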

2. Message chaining

What would be the best way to implement message chaining as in Smalltalk? What I mean is something like this:

obj message1 + 3; 
    message2; 
    + 3; 
    keyword: 2+3

where every message is sent to the same object, in this case obj. Message precedence should be preserved (unary > binary > keyword).
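The intended meaning can be sketched as a desugaring step, shown here in Python instead of ANTLR. The sketch assumes the receiver is a single identifier at the start of the first segment (as obj is above) and that strings contain no escaped quotes. `split_cascade` and `desugar_cascade` are hypothetical helper names, and each resulting message would still have to be parsed with the usual unary, binary, keyword precedence.

```python
def split_cascade(source):
    """Split on ';' that is not nested in (), [] or a '...' string."""
    parts, current = [], []
    depth, in_string = 0, False
    for ch in source:
        if ch == "'":
            in_string = not in_string  # escaped quotes are not handled
        elif not in_string:
            if ch in "([":
                depth += 1
            elif ch in ")]":
                depth -= 1
            elif ch == ";" and depth == 0:
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


def desugar_cascade(source):
    """Return (receiver, [message, ...]); every message goes to receiver."""
    first, *rest = split_cascade(source)
    # Assumption: the receiver is the first whitespace-delimited word.
    receiver, _, first_message = first.partition(" ")
    return receiver, [first_message.strip(), *rest]


receiver, messages = desugar_cascade(
    "obj message1 + 3;\n    message2;\n    + 3;\n    keyword: 2+3"
)
for message in messages:
    print(receiver, message)
```

Keeping the cascade as a receiver plus a flat list of messages is only one possible shape; it is used here because it leaves the per-message structure untouched.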

3. Backtracking

Most of this grammar can be parsed with k=2, but when the input looks like this:

1 + 2
Obj message: 
    1 + 2
    message2: 'string'

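the parser tries to match Obj as a single_keyword_message_selector and raises UnwantedTokenException on the token message. If I remove k=2 and set backtrack=true (as I did), everything works as it should. How can I remove backtracking and still get the desired behavior?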
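One possible reading of this failure, offered as an assumption rather than a verified diagnosis: after 1 + 2 the next two tokens are NEW_LINE and an identifier, which looks the same whether a keyword message continues or a new statement starts, so the colon would only be visible as a third token. A hedged way around that is to have the lexer emit an identifier together with its colon as one token. The Python sketch below illustrates the idea; `KEYWORD`, `tokenize` and `continues_keyword_message` are hypothetical names, and only the tokens needed for this input are covered.

```python
import re

# Hypothetical token names; KEYWORD is not a token in the grammar above.
# KEYWORD is an identifier immediately followed by ':' but not by ':='.
TOKEN_RE = re.compile(r"""
    (?P<KEYWORD>[A-Za-z_]\w*:(?!=))
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<ASSIGN>:=)
  | (?P<INT>\d+)
  | (?P<STRING>'[^']*')
  | (?P<OP>[-+*/<>=~!@%&|\\,?]{1,2})
  | (?P<NEW_LINE>\r?\n)
  | (?P<WS>[ \t]+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, value) pairs, dropping spaces and tabs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def continues_keyword_message(tokens, i):
    """Decide with at most two tokens of lookahead starting at index i."""
    if i < len(tokens) and tokens[i][0] == "NEW_LINE":
        i += 1
    return i < len(tokens) and tokens[i][0] == "KEYWORD"


source = "1 + 2\nObj message:\n    1 + 2\n    message2: 'string'"
tokens = tokenize(source)
print(tokens)
print(continues_keyword_message(tokens, 3))
print(continues_keyword_message(tokens, 10))
```

Index 3 is the line break after the first 1 + 2 and index 10 is the line break before message2:, so the two calls show the lookahead telling a new statement apart from a continued keyword message.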

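Also, most of the grammar can be parsed with k=1, so I tried to set k=2 only for the rules that need it, but that setting is ignored. I did something like this: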

rule
    options { k = 2; }
    : // rule definition
    ;

but it does not work until I set k in the global options. What am I missing here?


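Update

Writing the grammar from scratch is not an ideal solution, because I have a lot of code that depends on it. Also, some Smalltalk features are missing by design. This is not meant to be another Smalltalk implementation; Smalltalk was just an inspiration.

I would be more than happy to have unary minus working in cases like -1+2 or 2+(-1). Cases like 2 -- -1 are not that important.

Also, message chaining should be as simple as possible. That means I do not like the idea of changing the AST I am generating.

About backtracking: I can live with it; I only asked out of personal curiosity.

Here is a slightly modified grammar that generates the AST. Maybe it will help to better understand what I do not want to change. (temp_variables will probably be deleted; I have not made that decision yet.)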

grammar GAL;

options {
    //k=2;
    backtrack=true;
    language=CSharp3;
    output=AST;
}

tokens {
    HASH     = '#';
    COLON    = ':';
    DOT      = '.';
    CARET    = '^';
    PIPE     = '|';
    LBRACKET = '[';
    RBRACKET = ']';
    LPAREN   = '(';
    RPAREN   = ')';
    ASSIGN   = ':=';
}

// generated files options
@namespace { GAL.Compiler }
@lexer::namespace { GAL.Compiler}

// this will disable CLSCompliant warning in ANTLR generated code
@parser::header { 
// Do not bug me about [System.CLSCompliant(false)]
#pragma warning disable 3021 
}

@lexer::header { 
// Do not bug me about [System.CLSCompliant(false)]
#pragma warning disable 3021 
}

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

COMMENT
    :   '"' ( options {greedy=false;} : . )* '"' {$channel=Hidden;}
    ;

WS  :   ( ' '
        | '\t'
        ) {$channel=Hidden;}
    ;

NEW_LINE
    :   ('\r'?'\n')
    ;

STRING
    :  '\'' ( ESC_SEQ | ~('\\'|'\'') )* '\''
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

BINARY_MESSAGE_CHAR
    :   ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')
        ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')?
    ;

// parser

public program returns [ AstProgram program ]
    : { $program = new AstProgram(); }
    NEW_LINE* 
    ( statement (NEW_LINE+ | EOF)
        { $program.AddStatement($statement.stmt); }
    )*
    ;

statement returns [ AstNode stmt ]
    : message_sending
        { $stmt = $message_sending.messageSending; } 
    | return_statement
        { $stmt = $return_statement.ret; }
    | assignment
        { $stmt = $assignment.assignment; }
    | temp_variables
        { $stmt = $temp_variables.tempVars; }
    ;

return_statement returns [ AstReturn ret ]
    : CARET statement
        { $ret = new AstReturn($CARET, $statement.stmt); }
    ;

assignment returns [ AstAssignment assignment ]
    : dotted_expression ASSIGN statement
        { $assignment = new AstAssignment($dotted_expression.dottedExpression, $ASSIGN, $statement.stmt); }
    ;

temp_variables returns [ AstTempVariables tempVars ]
    : p1=PIPE 
        { $tempVars = new AstTempVariables($p1); }
    ( identifier
        { $tempVars.AddVar($identifier.identifier); }
    )+ 
    p2=PIPE
        { $tempVars.EndToken = $p2; }
    ;

object returns [ AstNode obj ]
    : number
        { $obj = $number.number; }
    | string
        { $obj = $string.str; }
    | dotted_expression
        { $obj = $dotted_expression.dottedExpression; }
    | literal
        { $obj = $literal.literal; }
    | block
        { $obj = $block.block; }
    | LPAREN message_sending RPAREN
        { $obj = $message_sending.messageSending; }
    ;

message_sending returns [ AstKeywordMessageSending messageSending ]
    : keyword_message_sending
        { $messageSending = $keyword_message_sending.keywordMessageSending; }
    ;

keyword_message_sending returns [ AstKeywordMessageSending keywordMessageSending ]
    : binary_message_sending 
        { $keywordMessageSending = new AstKeywordMessageSending($binary_message_sending.binaryMessageSending); }
    ( keyword_message
        { $keywordMessageSending = $keywordMessageSending.NewMessage($keyword_message.keywordMessage); }
    )?
    ;

binary_message_sending returns [ AstBinaryMessageSending binaryMessageSending ]
    : unary_message_sending
        { $binaryMessageSending = new AstBinaryMessageSending($unary_message_sending.unaryMessageSending); }
    ( binary_message
        { $binaryMessageSending = $binaryMessageSending.NewMessage($binary_message.binaryMessage); }
    )*
    ;

unary_message_sending returns [ AstUnaryMessageSending unaryMessageSending ]
    : object 
        { $unaryMessageSending = new AstUnaryMessageSending($object.obj); }
    (
      unary_message
        { $unaryMessageSending = $unaryMessageSending.NewMessage($unary_message.unaryMessage); }
    )*
    ;

unary_message returns [ AstUnaryMessage unaryMessage ]
    : unary_message_selector
        { $unaryMessage = new AstUnaryMessage($unary_message_selector.unarySelector); }
    ;

binary_message returns [ AstBinaryMessage binaryMessage ]
    : binary_message_selector unary_message_sending
        { $binaryMessage = new AstBinaryMessage($binary_message_selector.binarySelector, $unary_message_sending.unaryMessageSending); }
    ;

keyword_message returns [ AstKeywordMessage keywordMessage ]
    : 
    { $keywordMessage = new AstKeywordMessage(); }
    (
      NEW_LINE? 
      single_keyword_message_selector 
      NEW_LINE? 
      binary_message_sending
        { $keywordMessage.AddMessagePart($single_keyword_message_selector.singleKwSelector, $binary_message_sending.binaryMessageSending); }
    )+
    ;

block returns [ AstBlock block ]
    : LBRACKET 
        { $block = new AstBlock($LBRACKET); }
    (
      block_signiture
        { $block.Signiture = $block_signiture.blkSigniture; }
    )? NEW_LINE* 
      block_body
        { $block.Body = $block_body.blkBody; }
      NEW_LINE* 
      RBRACKET
        { $block.SetEndToken($RBRACKET); }
    ;

block_body returns [ IList<AstNode> blkBody ]
    @init { $blkBody = new List<AstNode>(); }
    : 
    ( s1=statement 
        { $blkBody.Add($s1.stmt); }
    )?
    ( NEW_LINE+ s2=statement 
        { $blkBody.Add($s2.stmt); }
    )*
    ;


block_signiture returns [ AstBlockSigniture blkSigniture ]
    @init { $blkSigniture = new AstBlockSigniture(); }
    : 
    ( COLON identifier
        { $blkSigniture.AddIdentifier($COLON, $identifier.identifier); }
    )+ PIPE
        { $blkSigniture.SetEndToken($PIPE); }
    ;

unary_message_selector returns [ AstUnaryMessageSelector unarySelector ]
    : identifier
        { $unarySelector = new AstUnaryMessageSelector($identifier.identifier); }
    ;

binary_message_selector returns [ AstBinaryMessageSelector binarySelector ]
    : BINARY_MESSAGE_CHAR
        { $binarySelector = new AstBinaryMessageSelector($BINARY_MESSAGE_CHAR); }
    ;

single_keyword_message_selector returns [ AstIdentifier singleKwSelector ]
    : identifier COLON
        { $singleKwSelector = $identifier.identifier; }
    ;

keyword_message_selector returns [ AstKeywordMessageSelector keywordSelector ]
    @init { $keywordSelector = new AstKeywordMessageSelector(); }
    : 
    ( single_keyword_message_selector
        { $keywordSelector.AddIdentifier($single_keyword_message_selector.singleKwSelector); }
    )+
    ;

symbol returns [ AstSymbol symbol ]
    : HASH 
    ( string 
        { $symbol = new AstSymbol($HASH, $string.str); }
    | identifier 
        { $symbol = new AstSymbol($HASH, $identifier.identifier); }
    | binary_message_selector 
        { $symbol = new AstSymbol($HASH, $binary_message_selector.binarySelector); }
    | keyword_message_selector
        { $symbol = new AstSymbol($HASH, $keyword_message_selector.keywordSelector); }
    )
    ; 

literal returns [ AstNode literal ]
    : symbol
        { $literal = $symbol.symbol; }
    ( block
        { $literal = new AstMethod($symbol.symbol, $block.block); }
    )? // if there is block then this is method
    ;

number returns [ AstNode number ]
    : /*'-'?*/
    ( INT
        { $number = new AstInt($INT); }
    | FLOAT 
        { $number = new AstInt($FLOAT); }
    )
    ;

string returns [ AstString str ]
    : STRING
        { $str = new AstString($STRING); }
    ;

dotted_expression returns [ AstDottedExpression dottedExpression ]
    : i1=identifier 
        { $dottedExpression = new AstDottedExpression($i1.identifier); }
    (DOT i2=identifier
        { $dottedExpression.AddIdentifier($i2.identifier); }
    )*
    ;

identifier returns [ AstIdentifier identifier ]
    : ID
        { $identifier = new AstIdentifier($ID); }
    ;

1 Answer:

Answer 0 (score: 1)

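Hi Smalltalk grammar writer,

First, to get a Smalltalk grammar to parse (1 - -2) properly, and to support the optional '.' on the last statement and so on, you should treat whitespace as significant. Do not put it on the hidden channel.

The grammar so far does not break the rules down into small enough fragments. That will cause problems like the ones you have seen with k=2 and backtracking.

I suggest you look at a working Smalltalk grammar in ANTLR, as defined by the Redline Smalltalk project: http://redline.st and https://github.com/redline-smalltalk/redline-smalltalk

Rgs, James.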