Question

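I am migrating my custom DSL from GoldParser to ANTLR4, but I am still stuck at the parsing step because it takes far too long to complete: a 1000-line source takes 34 seconds to parse, compared with the millisecond range in GoldParser.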

This is the C# code I use for parsing:

    var input = new AntlrInputStream(prg);
    var lexer = new PCLexer(input);
    var tokens = new CommonTokenStream(lexer);
    var parser = new PCParser(tokens);
    var tree = parser.programma(); // root rule is "programma"

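I suspect the problem is that the grammar has a lot of ambiguities. In fact, that is why I decided to migrate it away from GoldParser: I could not improve it any further there, and I realized it would be much easier to rewrite it in ANTLR4, which does not care about ambiguities.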

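My question is: is there anything I can do to get parsing into the millisecond range, or is it normal for ANTLR4 to be inherently slow? I am new to ANTLR and do not know what to expect.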

Back to the grammar, it is a kind of pseudo-C:

    grammar PC;                                    

    fragment Number : [0-9] ;

    fragment DoubleStringCharacter  : ~["\r\n] ;
    fragment SingleStringCharacter  : ~['\r\n] ;
    fragment DoubleStringCharacterM : ~["] ;  
    fragment SingleStringCharacterM : ~['] ;

    BlockComment : '/*' .*? '*/' -> skip ;
    LineComment  : '//' ~[\r\n]* -> skip ;
    WhiteSpaces  : [\t\u000B\u000C\u0020\u00A0]+ -> skip ; 

    Identifier   : [a-zA-Z_][a-zA-Z0-9_]* ; 
    Quote        : '\''  ;
    DoubleQuote  : '"'   ;
    NullLiteral  : 'null' ;
    BoolLiteral  : 'true' | 'false' ;
    IntLiteral   : (Number)+ ;
    FloatLiteral : (Number)* '.' (Number)+ ;

    StringLiteral    :     DoubleQuote DoubleStringCharacter*   DoubleQuote ;
    StringLiteralJs  :     Quote       SingleStringCharacter*   Quote       ;
    StringLiteralM   : '@' DoubleQuote DoubleStringCharacter*   DoubleQuote ;
    StringLiteralJsM : '@' Quote       SingleStringCharacter*   Quote       ;

    Or_op        : 'or' | '||'  ;
    And_op       : 'and' | '&&' ;
    Not_op       : 'not' | '!'  ;
    Not_eq       : '!=' | '<>'  ;

    programma : interfaccia? dichiarazione* ;

    interfaccia : 'interfaccia' '{' oggettoInterfaccia* '}' ;

    oggettoInterfaccia : Identifier Identifier '{' definizioneProprieta* '}' ;

    definizioneProprieta : Identifier '=' valoreProprieta ';' 
                         | oggettoInterfaccia; 

    valoreProprieta : BoolLiteral | IntLiteral  | FloatLiteral | StringLiteral | StringLiteralM | Identifier ;

    dichiarazione : dichiarazioneReference 
                  | dichiarazioneUsing 
                  | dichiarazioneClass
                  | dichiarazioneFunzione 
                  | dichiarazioneVariabile 
                  ;             

    dichiarazioneReference : 'reference' StringLiteral ';' ;
    dichiarazioneUsing     : 'using' Identifier '=' StringLiteral ';' ;
    dichiarazioneClass     : 'class' Identifier ';' ;

    dichiarazioneFunzione : Identifier Identifier '(' parametri ')' '{' stmList '}' ;

    parametri : parametro (',' parametro)* ;

    parametro : Identifier 
              | Identifier Identifier
              ;       

    dichiarazioneVariabile : Identifier listaVariabili ';' ;                            

    listaVariabili : variabile (',' variabile)* ;

    variabile : Identifier 
              | Identifier '=' exprOrArray
              ;

    stmList : stm* ;

    stm  : blocco
         | dichiarazioneVariabile
         | etichetta
         | istruzioneIf
         | istruzioneWhile
         | istruzioneFor
         | istruzioneDo                              
         | istruzioneGoto
         | istruzioneBreak
         | istruzioneContinue
         | istruzioneReturn
         | expr ';'              
         | assegnamento ';'               
         | ';'          
         | 'ConnectEvent' '(' Identifier ',' Identifier ',' Identifier ')' ';'
         | istruzioneTry
         ;

    blocco : '{' stmList '}' ;

    istruzioneIf : 'if' '(' expr ')' stm ( 'else' stm )? ;

    istruzioneFor : 'for' '(' stm condizioneFor ';' incrementoFor? ')' stm ;
    condizioneFor : expr? ; 
    incrementoFor : expr 
                  | assegnamento 
                  ;

    istruzioneWhile : 'while' '(' expr ')' stm ; 

    istruzioneDo : 'do' stm 'while' '(' expr ')' ;    // TODO should a ';' be added?

    etichetta          : Identifier ':' ;    
    istruzioneGoto     : 'goto' Identifier ';' ;
    istruzioneBreak    : 'break' ';' ;
    istruzioneContinue : 'continue' ';' ;
    istruzioneReturn   : 'return' exprOrArray ';' | 'return' ';' ;
    istruzioneTry      : 'try' blocco 'catch' '(' Identifier ')' blocco ;

    assegnamento : Identifier '=' exprOrArray
                 | Identifier '[' expr ']' '=' exprOrArray
                 | Identifier '.' Identifier '=' exprOrArray
                 ;

    exprOrArray : expr 
                | '{' exprList '}'
                ;

    exprList : exprOrArray ',' exprList
             | exprOrArray
             ;

    expr : expr '+=' expr
         | expr '-=' expr
         | expr '?' expr ':' expr
         | expr Or_op  expr
         | expr And_op expr         
         | expr '==' expr
         | expr Not_eq expr
         | expr '<' expr
         | expr '>' expr
         | expr '<=' expr
         | expr '>=' expr
         | expr 'as' Identifier
         | expr '+' expr
         | expr '-' expr
         | expr '*' expr
         | expr '/' expr
         | expr '%' expr
         | expr Not_op expr
         | '-' expr
         | '+' expr
         | '--' expr
         | '++' expr
         | expr '--'
         | expr '++'
         | expr '[' expr ']' 
         | callFun
         | Identifier '.' Identifier '(' methodParams ')'      
         | Identifier '.' Identifier          
         | Identifier
         | literal
         | '(' expr ')'
         ;

    methodParams : methodParam (',' methodParam)* ;
    methodParam  : exprOrArray ;

    callFun : Identifier '(' methodParams ')'               
            | 'new' Identifier '(' methodParams ')'
            ;        

    literal : NullLiteral
            | BoolLiteral
            | IntLiteral  
            | FloatLiteral 
            | StringLiteral
            | StringLiteralJs
            | StringLiteralM
            | StringLiteralJsM        
            ;

Answer 1

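If your grammar is ambiguous, Gold Parser (my understanding: LALR(1)) will not parse your source text correctly. [I assume you ignored the complaints it should have produced about shift-reduce and reduce-reduce conflicts?] It will pick one of the parses. And, being LALR(1), it will do so in linear time, so, as expected, it will be fast; that is the key utility of LALR(1) parsers.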

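Ambiguities in a grammar often (not always) mean that there are parses you should have eliminated but did not. If Gold is choosing among parses, some of which are wrong, there is no reason to believe you are getting the right parse.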
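To make "more than one parse" concrete, here is a small sketch that counts how many distinct parse trees a plain context-free rule of the form `expr : expr OP expr | ATOM ;` admits for a chain of operands. It deliberately ignores any precedence or associativity rules a particular tool applies on top of the grammar, and the class and method names (`AmbiguityDemo`, `CountParses`) are made up for illustration.

```csharp
using System;

public static class AmbiguityDemo
{
    // Number of distinct parse trees for a chain of atoms joined by
    // `operators` binary operators under:  expr : expr OP expr | ATOM ;
    public static long CountParses(int operators)
    {
        var count = new long[operators + 1];
        count[0] = 1; // a single atom has exactly one tree

        for (int n = 1; n <= operators; n++)
        {
            // Pick the operator at the root: k operators end up in the
            // left subtree and n - 1 - k in the right subtree.
            for (int k = 0; k < n; k++)
                count[n] += count[k] * count[n - 1 - k];
        }
        return count[operators];
    }

    public static void Main()
    {
        for (int n = 1; n <= 10; n++)
            Console.WriteLine($"{n} operators: {CountParses(n)} parse trees");
    }
}
```

The counts grow quickly (they are the Catalan numbers), which is why a parser that silently picks one of them gives little assurance that it picked the intended one.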

So, in fact, if you can get a wrong answer from Gold in milliseconds, why is it surprising that ANTLR gets a wrong answer more slowly?

I suggest you remove the ambiguities. (As a starting point, your expression subgrammar looks pretty ambiguous to me.) I think ANTLR will speed up.
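One possible way to find the ambiguities is to let ANTLR report them while parsing a sample input. The sketch below is not from the original answer. It assumes the official `Antlr4.Runtime.Standard` C# runtime, where the prediction mode is spelled `PredictionMode.LL_EXACT_AMBIG_DETECTION`; other C# ports may name it differently. `PCLexer` and `PCParser` are the classes generated from the grammar in the question, and `AmbiguityReport` is a made-up name.

```csharp
using Antlr4.Runtime;
using Antlr4.Runtime.Atn;

public static class AmbiguityReport
{
    // Parses `prg` once and lets ANTLR report ambiguous decisions.
    public static void Run(string prg)
    {
        var input = new AntlrInputStream(prg);
        var lexer = new PCLexer(input);
        var tokens = new CommonTokenStream(lexer);
        var parser = new PCParser(tokens);

        // Keep the default console listener: DiagnosticErrorListener forwards
        // its messages through the parser's other error listeners.
        parser.AddErrorListener(new DiagnosticErrorListener());

        // Assumption: spelling used by the official C# runtime.
        // Exact ambiguity detection is slower than normal parsing, so use it
        // for diagnosis on a small input, not in production.
        parser.Interpreter.PredictionMode = PredictionMode.LL_EXACT_AMBIG_DETECTION;

        parser.programma(); // root rule is "programma"
    }
}
```

Each reported ambiguity names the rule and the competing alternatives, which gives a list of places in the grammar to rewrite.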
