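I currently have the JavaLexer.py and JavaParser.py files generated by ANTLR. My goal is, first, to generate abstract syntax trees (ASTs) for Java code. Once I have those ASTs, I want to use an edit distance metric to detect similarity between different pieces of Java code. For this I decided to use ANTLR 4.7.2 together with Python 3.6.
Right now I can parse with JavaParser.py and get the result of compilationUnit(). The code is as follows: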
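from antlr4 import InputStream, CommonTokenStream
from JavaLexer import JavaLexer
from JavaParser import JavaParser

# path is the Java source file to parse
with open(path, "r", encoding="utf-8") as source:
    input_stream = InputStream(source.read())
lexer = JavaLexer(input_stream)
token_stream = CommonTokenStream(lexer)
parser = JavaParser(token_stream)
tree = parser.compilationUnit()  # parse from the grammar's start rule
# Print the parse tree in LISP-style text form
print(tree.toStringTree(recog=parser))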
The output is as follows:
(compilationUnit (typeDeclaration (classOrInterfaceModifier public) (classDeclaration class HelloWorld (classBody { (classBodyDeclaration (modifier (classOrInterfaceModifier public)) (modifier (classOrInterfaceModifier static)) (memberDeclaration (methodDeclaration (typeTypeOrVoid void) main (formalParameters ( (formalParameterList (formalParameter (typeType (classOrInterfaceType String) [ ]) (variableDeclaratorId args))) )) (methodBody (block { (blockStatement (statement (expression (expression (expression (primary System)) . out) . (methodCall println ( (expressionList (expression (primary (literal "HelloWorld")))) ))) ;)) }))))) }))) <EOF>)
My question is: with the setup I am currently using, is it possible to parse this output into a tree structure?
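For context, here is a rough sketch of the kind of structure I am after. It walks the object returned by compilationUnit() instead of re-parsing the printed string, since literal ( and ) tokens in the printed form would be hard to tell apart from the nesting parentheses. The helper names to_nested and node_labels are made up for illustration, and the check for getSymbol() to recognise token nodes is an assumption about the Python runtime on my part, not something I have confirmed.

```python
def to_nested(node, rule_names):
    """Convert a parse-tree node into nested Python data.

    Rule nodes become (rule_name, [children...]); tokens become plain strings.
    """
    # Assumption: terminal (token) nodes expose getSymbol(), rule contexts do not.
    if hasattr(node, "getSymbol"):
        return node.getText()
    name = rule_names[node.getRuleIndex()]
    children = [to_nested(node.getChild(i), rule_names)
                for i in range(node.getChildCount())]
    return (name, children)


def node_labels(nested):
    """Flatten the nested form into a pre-order list of labels."""
    if isinstance(nested, str):
        return [nested]
    name, children = nested
    labels = [name]
    for child in children:
        labels.extend(node_labels(child))
    return labels
```

With the snippet above this would presumably be called as to_nested(tree, parser.ruleNames). The flattened label list from node_labels is only one possible input for an edit distance. A proper tree edit distance would work on the nested form instead.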