Once the grammar is complete, what is the best way to walk an ANTLR v4 tree?

Time: 2013-02-24 08:52:27

Tags: coldfusion antlr antlr4

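Goal

I'm working on a project to create a Varscoper for Coldfusion CFscript. Basically, this means checking source code files to ensure that developers have properly var'd their variables.

After a couple of days of working with ANTLR V4, I have a grammar that generates a very nice parse tree in the GUI view. Now, using that tree, I need a way to crawl the nodes programmatically, looking for variable declarations and ensuring that, if they are inside functions, they have the proper scoping. If possible, I would rather NOT do this in the grammar file, as that would mix the definition of the language with this specific task.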

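What I've tried

My latest attempt was to use the ParserRuleContext and go through its children via getPayload(). After checking the class of what getPayload() returned, I would have either a ParserRuleContext object or a Token object. Unfortunately, using that, I was never able to find a way to get the actual rule type for a specific node, only the text it contains. The rule type of each node is necessary because it matters whether that text node is an ignored right-hand expression, a variable assignment, or a function declaration.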

Questions

  1. I am very new to ANTLR. Is this even the right approach, or is there a better way to traverse the tree?
  2. Here is my sample Java code:

    Cfscript.java

    import org.antlr.v4.runtime.*;
    import org.antlr.v4.runtime.tree.Trees;
    
    public class Cfscript {
        public static void main(String[] args) throws Exception {
            ANTLRInputStream input = new ANTLRFileStream(args[0]);
            CfscriptLexer lexer = new CfscriptLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            CfscriptParser parser = new CfscriptParser(tokens);
            parser.setBuildParseTree(true);
            ParserRuleContext tree = parser.component();
            tree.inspect(parser); // show in gui
            /*
                Recursively go through the tree, finding function declarations and
                ensuring all variable declarations are var'd. But how?
            */
        }
    }
    

    Cfscript.g4

    grammar Cfscript;
    
    component
        : 'component' keyValue* '{' componentBody '}'
        ;
    
    componentBody
        : (componentElement)*
        ;
    
    componentElement
        : statement
        | functionDeclaration
        ;
    
    functionDeclaration
        : Identifier? Identifier? 'function' Identifier argumentsDefinition '{' functionBody '}'
        ;
    
    argumentsDefinition
        : '(' argumentDefinition (',' argumentDefinition)* ')'
        | '()'
        ;
    
    argumentDefinition
        : Identifier? Identifier? argumentName ('=' expression)?
        ; 
    
    argumentName
        : Identifier
        ;
    
    functionBody
        : (statement)*
        ;
    
    statement
        : variableStatement
        | nonVarVariableStatement
        | expressionStatement
        ;
    
    variableStatement
        : 'var' variableName '=' expression ';'
        ;
    
    nonVarVariableStatement
        : variableName '=' expression ';'
        ;
    
    expressionStatement
        : expression ';'
        ;
    
    expression
        : assignmentExpression
        | arrayLiteral
        | objectLiteral
        | StringLiteral
        | incrementExpression
        | decrementExpression
        | 'true' 
        | 'false'
        | Identifier
        ;
    
    incrementExpression
        : variableName '++'
        ;
    
    decrementExpression
        : variableName '--'
        ;
    
    assignmentExpression
        : Identifier (assignmentExpressionSuffix)*
        | assignmentExpression (('+'|'-'|'/'|'*') assignmentExpression)+
        ;
    
    assignmentExpressionSuffix
        : '.' assignmentExpression
        | ArrayIndex
        | ('()' | '(' expression (',' expression)* ')' )
        ;
    
    methodCall
        : Identifier ('()' | '(' expression (',' expression)* ')' )
        ;
    
    variableName
        : Identifier (variableSuffix)*
        ;
    
    variableSuffix
        : ArrayIndex
        | '.' variableName
        ;
    
    arrayLiteral
        : '[' expression (',' expression)* ']'
        ;
    
    objectLiteral
        : '{' (Identifier '=' expression (',' Identifier '=' expression)*)? '}'
        ;
    
    keyValue
        : Identifier '=' StringLiteral
        ;
    
    StringLiteral
        :  '"' (~('\\'|'"'))* '"'
        ;
    
    ArrayIndex
        : '[' [1-9] [0-9]* ']'
        | '[' StringLiteral ']'
        ;
    
    Identifier
        : [a-zA-Z0-9]+
        ;
    
    WS
        : [ \t\r\n]+ -> skip 
        ;
    
    COMMENT 
        : '/*' .*? '*/'  -> skip
        ;
    

    Test.cfc (test code file)

    component something = "foo" another = "more" persistent = "true" datasource = "#application.env.dsn#" {
        var method = something.foo.test1;
        testing = something.foo[10];
        testingagain = something.foo["this is a test"];
        nuts["testing"]++;
        blah.test().test3["test"]();
    
        var math = 1 + 2 - blah.test().test4["test"];
    
        var test = something;
        var testing = somethingelse;
        var testing = { 
            test = more, 
            mystuff = { 
                interior = test 
            },
            third = "third key"
        };
        other = "Idunno homie";
        methodCall(interiorMethod());
    
        public function bar() {
            var new = "somebody i used to know";
            something = [1, 2, 3];
        }
    
        function nuts(required string test1 = "first", string test = "second", test3 = "third") {
    
        }
    
        private boolean function baz() {
            var this = "something else";
        }
    }
    

1 Answer:

Answer 0: (score: 38)

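I wouldn't walk this manually if I were you. After generating a lexer and parser, ANTLR also generates a file called CfscriptBaseListener that has empty methods for all of your parser rules. You can let ANTLR walk your tree and attach a custom tree listener in which you override only the methods/rules you are interested in.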
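To make the enter/exit mechanics concrete, here is a toy model of a listener-driven walk in plain Java. It is only a sketch, not ANTLR's implementation: `ToyNode`, `ToyListener`, `ToyWalker` and `TraceListener` are hypothetical names invented for this illustration, and the tree is built by hand rather than by a parser. The point is the callback order: a node's enter callback fires before its children are visited, and its exit callback fires after.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parse-tree node: a rule name plus children.
final class ToyNode {
    final String rule;
    final List<ToyNode> children;

    ToyNode(String rule, ToyNode... children) {
        this.rule = rule;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical listener: one callback on the way down, one on the way up.
interface ToyListener {
    void enter(ToyNode node);
    void exit(ToyNode node);
}

final class ToyWalker {
    // Depth-first walk: enter the node, walk its children in order, then exit it.
    static void walk(ToyListener listener, ToyNode node) {
        listener.enter(node);
        for (ToyNode child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }
}

// Records the order in which the callbacks fire.
final class TraceListener implements ToyListener {
    final List<String> events = new ArrayList<>();

    @Override public void enter(ToyNode node) { events.add("enter " + node.rule); }
    @Override public void exit(ToyNode node)  { events.add("exit " + node.rule); }
}

class ToyWalkDemo {
    public static void main(String[] args) {
        // Hand-built tree: component > functionDeclaration > variableStatement
        ToyNode tree = new ToyNode("component",
                new ToyNode("functionDeclaration",
                        new ToyNode("variableStatement")));
        TraceListener trace = new TraceListener();
        ToyWalker.walk(trace, tree);
        trace.events.forEach(System.out::println);
    }
}
```

Because the exit callback for functionDeclaration fires only after everything inside the function has been visited, pushing a scope on enter and popping it on exit keeps a scope stack in step with the tree.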

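In your case, you probably want to be notified when a new function is created (to create a new scope), and you are probably interested in variable assignments (variableStatement and nonVarVariableStatement). Your listener, let's call it VarListener, will keep track of all scopes as ANTLR walks the tree.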

I did change one rule slightly (I added objectLiteralEntry):

objectLiteral
    : '{' (objectLiteralEntry (',' objectLiteralEntry)*)? '}'
    ;

objectLiteralEntry
    : Identifier '=' expression
    ;
    

This makes life easier in the following demo:

VarListener.java

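import java.util.Stack;

// Tracks a stack of scopes while ANTLR walks the tree and reports assignments to names that were never var'd.
public class VarListener extends CfscriptBaseListener {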

    private Stack<Scope> scopes;

    public VarListener() {
        scopes = new Stack<Scope>();
        scopes.push(new Scope(null));
    } 

    @Override
    public void enterVariableStatement(CfscriptParser.VariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        Scope scope = scopes.peek();
        scope.add(varName);
    }

    @Override
    public void enterNonVarVariableStatement(CfscriptParser.NonVarVariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        checkVarName(varName);
    }

    @Override
    public void enterObjectLiteralEntry(CfscriptParser.ObjectLiteralEntryContext ctx) {
        String varName = ctx.Identifier().getText();
        checkVarName(varName);
    }

    @Override
    public void enterFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.push(new Scope(scopes.peek()));
    }

    @Override
    public void exitFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.pop();        
    }

    private void checkVarName(String varName) {
        Scope scope = scopes.peek();
        if(scope.inScope(varName)) {
            System.out.println("OK   : " + varName);
        }
        else {
            System.out.println("Oops : " + varName);
        }
    }
}

The Scope object can be as simple as this:

Scope.java

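import java.util.HashSet;

// A set of declared names plus a link to the enclosing scope (null for the outermost one).
class Scope extends HashSet<String> {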

    final Scope parent;

    public Scope(Scope parent) {
        this.parent = parent;
    }

    boolean inScope(String varName) {
        if(super.contains(varName)) {
            return true;
        }
        return parent != null && parent.inScope(varName);
    }
}
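To see how the parent chain behaves in isolation, here is a small standalone sketch. It uses a hypothetical variant, `ChainedScope`, that holds a set rather than extending HashSet; that class, its `declare` method and the demo class are my own names for illustration, not part of the code above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of Scope: composition instead of inheritance.
final class ChainedScope {
    private final ChainedScope parent;              // null for the outermost scope
    private final Set<String> names = new HashSet<>();

    ChainedScope(ChainedScope parent) {
        this.parent = parent;
    }

    void declare(String name) {
        names.add(name);
    }

    // True if the name is declared here or in any enclosing scope.
    boolean inScope(String name) {
        for (ChainedScope s = this; s != null; s = s.parent) {
            if (s.names.contains(name)) {
                return true;
            }
        }
        return false;
    }
}

class ChainedScopeDemo {
    public static void main(String[] args) {
        ChainedScope component = new ChainedScope(null);
        component.declare("method");

        ChainedScope function = new ChainedScope(component);
        function.declare("local");

        System.out.println(function.inScope("method"));  // true: found via the parent
        System.out.println(component.inScope("local"));  // false: lookups never go downward
    }
}
```

Lookups only ever travel outward, so a name declared inside a function never leaks into the enclosing scope.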

Now, to test all of this, here's a small main class:

Main.java

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Main {

    public static void main(String[] args) throws Exception {

        CfscriptLexer lexer = new CfscriptLexer(new ANTLRFileStream("Test.cfc"));
        CfscriptParser parser = new CfscriptParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.component();
        ParseTreeWalker.DEFAULT.walk(new VarListener(), tree);
    }
}

If you run this Main class, the following will be printed:

Oops : testing
Oops : testingagain
OK   : test
Oops : mystuff
Oops : interior
Oops : third
Oops : other
Oops : something

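Without a doubt, this is not exactly what you want, and I probably goofed up some of Coldfusion's scoping rules. But I think this will give you some insight into how to solve your problem properly. I think the code is pretty self-explanatory, but if that's not the case, don't hesitate to ask for clarification.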
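One possible refinement, sketched under an assumption taken from the question rather than from any Coldfusion reference: only assignments inside a function body need to be checked, and problems are collected in a list rather than printed. `VarTracker` and its method names are hypothetical. In a real listener, `enterFunction`/`exitFunction` would be called from enterFunctionDeclaration/exitFunctionDeclaration, `declare` from enterVariableStatement, and `assign` from enterNonVarVariableStatement.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: remembers var'd names per function and collects un-var'd assignments.
final class VarTracker {
    private final Deque<Set<String>> functionScopes = new ArrayDeque<>();
    private final List<String> unscoped = new ArrayList<>();

    void enterFunction() { functionScopes.push(new HashSet<>()); }

    void exitFunction() { functionScopes.pop(); }

    // A "var x = ..." statement; ignored outside functions in this sketch.
    void declare(String name) {
        if (!functionScopes.isEmpty()) {
            functionScopes.peek().add(name);
        }
    }

    // An "x = ..." statement without var.
    void assign(String name) {
        if (functionScopes.isEmpty()) {
            return; // assumption: code outside a function is not checked
        }
        // The grammar above does not nest functions, so only the innermost set is consulted.
        if (!functionScopes.peek().contains(name)) {
            unscoped.add(name);
        }
    }

    List<String> unscoped() { return unscoped; }
}

class VarTrackerDemo {
    public static void main(String[] args) {
        VarTracker tracker = new VarTracker();
        tracker.assign("testing");      // top-level assignment in Test.cfc: not checked here
        tracker.enterFunction();        // public function bar() {
        tracker.declare("new");         //     var new = "...";
        tracker.assign("something");    //     something = [1, 2, 3];
        tracker.exitFunction();         // }
        System.out.println(tracker.unscoped()); // [something]
    }
}
```

Keeping the bookkeeping in a class with no ANTLR dependency also makes it straightforward to unit test the scoping rules separately from the grammar.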

HTH