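How do I parse a language like Apache Velocity with ANTLR4?

Time: 2019-02-24 00:44:18

Tags: apache parsing antlr antlr4 velocity

I am writing my own grammar for Apache Velocity and have run into a problem: the lexer recognizes neither the plain text nor the tokens I expect.

I get the following message for the first line of the input.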

line 1:0 extraneous input '// ${Name}.java' expecting {BREAK, FOREACH, IF, INCLUDE, PARSE, SET, STOP, '#[[', RAW_TEXT, '$'}

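The input '// ${Name}.java' should be tokenized as RAW_TEXT '$' '{' IDENTIFIER '}' RAW_TEXT. The matching parser rules should be rawText reference rawText, each of which is an alternative of the statement rule.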
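
A possible explanation for this message, offered as a guess rather than a confirmed diagnosis: an ANTLR4 lexer runs independently of the parser and, at each position, picks the rule that matches the longest stretch of input (rules listed earlier win ties). TEXT accepts '$' and '{', so it can cover the whole first line, while RAW_TEXT has to stop at '$'. The Python sketch below is a simplified model of that selection, not ANTLR itself. The regular expressions are hand-written approximations of RAW_TEXT and TEXT, and `first_token` is a hypothetical helper.

```python
import re

# Approximations of two lexer rules from the grammar in this question:
#   RAW_TEXT : ~[*#$]+
#   TEXT     : (ESC | SAFE_CODE_POINT)+  with SAFE_CODE_POINT : ~["\\\u0000-\u001F]
# ESC is left out because the sample input contains no backslashes.
RULES = [
    ("RAW_TEXT", re.compile(r"[^*#$]+")),
    ("TEXT", re.compile(r'[^"\\\x00-\x1f]+')),
]


def first_token(source):
    """Return (rule name, text) for the longest match at position 0.

    On a tie the rule listed first is kept, which mirrors how an ANTLR4
    lexer chooses between competing rules.
    """
    best_name, best_text = None, ""
    for name, pattern in RULES:
        match = pattern.match(source)
        if match and len(match.group()) > len(best_text):
            best_name, best_text = name, match.group()
    return best_name, best_text


print(first_token("// ${Name}.java\r\n"))  # prints ('TEXT', '// ${Name}.java')
```

In this model TEXT wins with 15 characters against 3 for RAW_TEXT, which is consistent with the single token '// ${Name}.java' in the error message. The parser never gets to see a separate '$' or '{'.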

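This is my source file. In this case it is a Java template, but the source file could just as well be an HTML template, as described in the Apache Velocity User Guide.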

// ${Name}.java
#foreach ( $vertice in $Vertices )
#if ( $vertice.Type == "Class" )
public class $vertice.Name {
    #foreach ( $edge in $Edges )
    #if ( $edge.from == $vertice.Name)
    // From $edge.from to $edge.to
    private $edge.to $edge.to.toLowerCase();

    public $edge.to get{$edge.to}() {
        return this.${edge.to.toLowerCase()};
    }

    public void set${edge.to}(${edge.to} new${edge.to}) {
        $edge.to old${edge.to} = this.${edge.to.toLowerCase()};
        if (old${edge.to} != new${edge.to}) {
            if (old${edge.to} != null) {
                this.${edge.to.toLowerCase()} = null;
                old${edge.to}.set${edge.from}(null);
            }
            this.${edge.to.toLowerCase()} = new${edge.to};
            if (new${edge.to} != null) {
                new${edge.to}.set${edge.from}(this);
            }
        }
    }

    public $edge.from with${edge.to}(${edge.to} new${edge.to}) {
        this.set${edge.to}(new${edge.to});
        return this;
    }
    #end
    #end
}
#end
#end

This is my grammar.

grammar Velocity;

/* -- Parser Rules --- */

/*
 * Start Rule
 */

template
    : statementSet EOF?
    ;

/*
 * Statements
 */

statementSet
    : statement+
    ;

statement
    : rawText # RawTextStatement
    | unparsed # UnparsedStatement
    | reference # ReferenceStatement
    | setDirective # SetStatement
    | ifDirective # IfStatement
    | foreachDirective # ForeachStatement
    | includeDirective # IncludeStatement
    | parseDirective # ParseStatement
    | breakDirective # BreakStatement
    | stopDirective # StopStatement
    ;

rawText
    : RAW_TEXT
    ;

unparsed
    : UNPARSED UnparsedText=(TEXT | NL)* UNPARSED_END
    ;

setDirective
    : SET '(' assignment ')'
    ;

ifDirective
    : ifPart (elseifPart)* (elsePart)? END
    ;

foreachDirective
    : FOREACH '(' variableReference 'in' enumerable ')' statementSet END
    ;

includeDirective
    : INCLUDE '(' stringValue (',' stringValue)* ')'
    ;

parseDirective
    : PARSE '(' stringValue ')'
    ;

breakDirective
    : BREAK
    ;

stopDirective
    : STOP
    ;

/*
 * Expressions
 */

assignment
    : assignableReference '=' expression
    ;

expression
    : reference # ReferenceExpression
    | string # StringLiteralExpression
    | NUMBER # NumberLiteralExpression
    | array # ArrayExpression
    | map # MapExpression
    | range # RangeExpression
    | arithmeticOperation # ArithmeticOperationExpression
    | booleanOperation # BooleanOperationExpression
    ;

enumerable
    : array
    | map
    | range
    | reference
    ;

stringValue
    : string # StringValue_String
    | reference # StringValue_Reference
    ;

/*
 * References
 */

reference
    : DOLLAR Quiet='!'? (referenceType | '{' referenceType '}')
    ;

assignableReference
    : DOLLAR Quiet='!'? (assignableReferenceType | '{' assignableReferenceType '}')
    ;

referenceType
    : assignableReferenceType # ReferenceType_AssignableReferenceType
    | methodReference # ReferenceType_MethodReference
    ;

assignableReferenceType
    : variableReference # AssignableReferenceType_VariableReference
    | propertyReference # AssignableReferenceType_PropertyReference
    ;

variableReference
    : IDENTIFIER indexNotation?
    ;

propertyReference
    : IDENTIFIER ('.' IDENTIFIER)+ indexNotation?
    ;

methodReference
    : IDENTIFIER ('.' IDENTIFIER)* '.' IDENTIFIER '(' (expression (',' expression)*)? ')' indexNotation?
    ;

indexNotation
    : '[' NUMBER ']' # IndexNotation_Number
    | '[' reference ']' # IndexNotation_Reference
    | '[' string ']' # IndexNotation_String
    ;

/*
 * Parsed Types
 */

string
    : '"' stringText* '"' # DoubleQuotedString
    | '\'' TEXT? '\'' # SingleQuotedString
    ;

stringText
    : TEXT # StringText_Text
    | reference # StringText_Reference
    ;

/*
 * Container Types
 */

array
    : '[' (expression (',' expression)*)? ']'
    ;

map
    : '{' (expression ':' expression (',' expression ':' expression))? '}'
    ;

range
    : '[' n=NUMBER '..' m=NUMBER ']'
    ;

/*
 * Arithmetic Operators
 */

arithmeticOperation
    : sum
    ;

sum
    : term (followingTerm)*
    ;

followingTerm
    : Operator=('+' | '-') term
    ;

term
    : factor (followingFactor)*
    ;

followingFactor
    : Operator=('*' | '/' | '%') factor
    ;

factor
    : NUMBER # Factor_Number
    | reference # Factor_Reference
    | '(' arithmeticOperation ')' # Factor_InnerArithmeticOperation
    ;

/*
 * Boolean Operators
 */

booleanOperation
    : disjunction
    ;

disjunction
    : conjunction (followingConjunction)*
    ;

followingConjunction
    : Operator=OR conjunction
    ;

conjunction
    : booleanComparison (followingBooleanComparison)*
    ;

followingBooleanComparison
    : Operator=AND booleanComparison
    ;

booleanComparison
    : booleanFactor (followingBooleanFactor)*
    ;

followingBooleanFactor
    : Operator=(EQUALS | NOT_EQUALS) booleanFactor
    ;

booleanFactor
    : BOOLEAN # BooleanFactor_Boolean
    | reference # BooleanFactor_Reference
    | negation # BooleanFactor_Negation
    | arithmeticComparison # BooleanFactor_ArithmeticComparison
    | '(' booleanOperation ')' # BooleanFactor_InnerBooleanOperation
    ;

arithmeticComparison
    : LeftHandSide=arithmeticOperation Operator=(EQUALS | NOT_EQUALS | GREATER_THAN | GREATER_THAN_OR_EQUAL_TO | LESS_THAN | LESS_THAN_OR_EQUAL_TO) RightHandSide=arithmeticOperation
    ;

negation
    : NOT booleanFactor
    ;

/*
 * Conditionals
 */

ifPart
    : IF '(' booleanOperation ')' statementSet
    ;

elseifPart
    : ELSEIF '(' booleanOperation ')' statementSet
    ;

elsePart
    : ELSE statementSet
    ;

/* --- Lexer Rules --- */

/*
 * Comments
 */

SINGLE_LINE_COMMENT
    : '##' TEXT? NL -> skip
    ;

MULTI_LINE_COMMENT
    : '#*' (TEXT | NL)* '*#' -> skip
    ;

COMMENT_BLOCK
    : '#**' (TEXT | NL)* '*#' -> skip
    ;

/*
 * Directives
 */

BREAK
    : '#break'
    | '#{break}'
    ;

DEFINE
    : '#define'
    | '#{define}'
    ;

ELSE
    : '#else'
    | '#{else}'
    ;

ELSEIF
    : '#elseif'
    | '#{elseif}'
    ;

END
    : '#end'
    | '#{end}'
    ;

EVALUATE
    : '#evaluate'
    | '#{evaluate}'
    ;

FOREACH
    : '#foreach'
    | '#{foreach}'
    ;

IF
    : '#if'
    | '#{if}'
    ;

INCLUDE
    : '#include'
    | '#{include}'
    ;

MACRO
    : '#macro'
    | '#{macro}'
    ;

PARSE
    : '#parse'
    | '#{parse}'
    ;

SET
    : '#set'
    | '#{set}'
    ;

STOP
    : '#stop'
    | '#{stop}'
    ;

UNPARSED
    : '#[['
    ;

UNPARSED_END
    : ']]#'
    ;

/*
 * Identifier
 */

DOLLAR
    : '$' -> more
    ;

IDENTIFIER
    : CHARACTER+ (CHARACTER | INTEGER | HYPHEN | UNDERSCORE)*
    ;

/*
 * Boolean Values
 */

TRUE
    : 'true'
    ;

FALSE
    : 'false'
    ;

/*
 * Boolean Operators
 */

EQUALS
    : '=='
    | 'eq'
    ;

NOT_EQUALS
    : '!='
    | 'ne'
    ;

GREATER_THAN
    : '>'
    | 'gt'
    ;

GREATER_THAN_OR_EQUAL_TO
    : '>='
    | 'ge'
    ;

LESS_THAN
    : '<'
    | 'lt'
    ;

LESS_THAN_OR_EQUAL_TO
    : '<='
    | 'le'
    ;

OR
    : '||'
    ;

AND
    : '&&'
    ;

NOT
    : '!'
    | 'not'
    ;

/*
 * Literals
 */

BOOLEAN
    : TRUE
    | FALSE
    ;

NUMBER
    : '-'? INTEGER
    | '-'? INTEGER '.' INTEGER
    ;

/*
 * Content
 */

RAW_TEXT
    : ~[*#$]+
    ;

TEXT
    : (ESC | SAFE_CODE_POINT)+
    ;

fragment ESC
    : '\\' (["\\/#$!bftrn] | UNICODE)
    ;

fragment UNICODE
    : 'u' HEX HEX HEX HEX
    ;

fragment HEX
    : [0-9a-fA-F]
    ;

fragment SAFE_CODE_POINT
    : ~["\\\u0000-\u001F]
    ;

/*
 * Atomic elements
 */

CHARACTER
    : [a-zA-Z]+
    ;

INTEGER
    : [0-9]+
    ;

HYPHEN
    : '-'
    ;

UNDERSCORE
    : '_'
    ;

NL
    : '\r'
    | '\n'
    | '\r\n'
    ;

WS
    : ('\t' | ' ' | '\r' | '\n' | '\r\n')+ -> skip
    ;

What detail am I missing here? What do I need to do to actually parse Velocity code?
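
One approach often used for template languages, mentioned here as an assumption about what could help and not as a tested fix for this grammar, is to tokenize in modes: a text mode that consumes everything up to '$', and a reference mode that only knows '{', identifiers and '}'. The Python sketch below models that idea for the first input line. The token names DOLLAR, LBRACE and RBRACE and the function `tokenize` are hypothetical, and directives starting with '#', property access such as $edge.to, and method calls are deliberately left out.

```python
import re

IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")
RAW_TEXT = re.compile(r"[^$]+")  # '#' directives are treated as text here


def tokenize(source):
    """Split source into (type, text) pairs using a text and a reference mode."""
    tokens = []
    pos = 0
    in_reference = False
    braced = False
    while pos < len(source):
        char = source[pos]
        if not in_reference:
            if char == "$":
                # '$' switches from text mode to reference mode.
                tokens.append(("DOLLAR", char))
                pos += 1
                in_reference, braced = True, False
            else:
                match = RAW_TEXT.match(source, pos)
                tokens.append(("RAW_TEXT", match.group()))
                pos = match.end()
        elif char == "{" and tokens[-1][0] == "DOLLAR":
            tokens.append(("LBRACE", char))
            pos += 1
            braced = True
        elif char == "}" and braced:
            # The closing brace ends the reference.
            tokens.append(("RBRACE", char))
            pos += 1
            in_reference = False
        else:
            match = IDENTIFIER.match(source, pos)
            if match and tokens[-1][0] in ("DOLLAR", "LBRACE"):
                tokens.append(("IDENTIFIER", match.group()))
                pos = match.end()
                # An unbraced reference ends with its identifier.
                in_reference = braced
            else:
                # Anything else ends the reference and is re-read as text.
                in_reference = False
    return tokens


for token in tokenize("// ${Name}.java"):
    print(token)
```

For '// ${Name}.java' this yields RAW_TEXT, DOLLAR, LBRACE, IDENTIFIER, RBRACE, RAW_TEXT, which is the token sequence the question asks for. In ANTLR4 itself, lexical modes (the mode, pushMode and popMode lexer commands) are only available in a separate lexer grammar, not in a combined grammar like the one above.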

Best regards

Update

I changed these lexer rules.

DOLLAR
    : '$'
    ;

RAW_TEXT
    : ~[*#$]*
    ;

TEXT
    : (ESC | SAFE_CODE_POINT)*?
    ;

fragment SAFE_CODE_POINT
    : ~[$"\\\u0000-\u001F]
    ;

Now I get these messages.

[0] line 1:4 mismatched input '{Name}.java\r\n' expecting {'!', '{', IDENTIFIER}
[0] line 2:8 mismatched input ' ( ' expecting '('
[0] line 2:12 mismatched input 'vertice in ' expecting {'!', '{', IDENTIFIER}
[0] line 2:24 mismatched input 'Vertices )\r\n' expecting {'!', '{', IDENTIFIER}
[0] line 3:3 mismatched input ' ( ' expecting '('
[0] line 3:7 mismatched input 'vertice.Type == "Class" )\r\npublic class ' expecting {'!', '{', IDENTIFIER}
[0] line 4:14 mismatched input 'vertice.Name {\r\n\t' expecting {'!', '{', IDENTIFIER}
[0] line 5:9 mismatched input ' ( ' expecting '('
[0] line 5:13 mismatched input 'edge in ' expecting {'!', '{', IDENTIFIER}
[0] line 5:22 mismatched input 'Edges )\r\n\t' expecting {'!', '{', IDENTIFIER}
[0] line 6:4 mismatched input ' ( ' expecting '('
[0] line 6:8 mismatched input 'edge.from == ' expecting {'!', '{', IDENTIFIER}
[0] line 6:22 mismatched input 'vertice.Name)\r\n\t' expecting {'!', '{', IDENTIFIER}

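That helped, but the lexer is still stealing the $ sign. Why does the parser expect a '{' character when the input already starts with a '{' character? I will look into this.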
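
One plausible reading of these messages, stated as an assumption and not verified against the generated lexer: the parser compares token types, not characters. Right after '$' the lexer again takes the longest match, and the updated RAW_TEXT rule can swallow '{Name}.java\r\n' as a single token. The parser then receives a RAW_TEXT token whose text merely begins with '{' where it wanted the '{' token. The Python sketch below models this with hand-written approximations of three rules; `longest_match` and the name LBRACE are hypothetical, and the non-greedy TEXT rule is left out.

```python
import re

# Approximations of the rules that compete right after '$' in the update:
#   '{'        : implicit token created by the literal in the parser rules
#   IDENTIFIER : CHARACTER+ (CHARACTER | INTEGER | HYPHEN | UNDERSCORE)*
#   RAW_TEXT   : ~[*#$]*
RULES = [
    ("LBRACE", re.compile(r"\{")),
    ("IDENTIFIER", re.compile(r"[a-zA-Z]+[a-zA-Z0-9_-]*")),
    ("RAW_TEXT", re.compile(r"[^*#$]*")),
]

SOURCE = "// ${Name}.java\r\n#foreach ( $vertice in $Vertices )\r\n"


def longest_match(source, pos):
    """Return (rule name, text) for the longest non-empty match at pos.

    Earlier rules win ties, mirroring the ANTLR4 lexer's rule selection.
    """
    best_name, best_text = None, ""
    for name, pattern in RULES:
        match = pattern.match(source, pos)
        if match and len(match.group()) > len(best_text):
            best_name, best_text = name, match.group()
    return best_name, best_text


# Position 4 is the character right after '$' (line 1:4 in the message).
print(longest_match(SOURCE, 4))  # prints ('RAW_TEXT', '{Name}.java\r\n')
```

The 13-character RAW_TEXT match beats the 1-character '{' match, which agrees with the token text '{Name}.java\r\n' reported at line 1:4. The same effect would explain ' ( ' at line 2:8, where '(' was expected.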

0 answers:

No answers