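After some work, I managed to use ANTLR to extract a unique ID from a .java file (the sequence of its token types). I then split that ID into 4-grams with an n-gram analyzer. Here is my code: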
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import org.antlr.runtime.ANTLRReaderStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.Token;

public void runAlgoritma(File mainFile, List<String> fileJlist) {
    final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
    lexer.preserveWhitespacesAndComments = false;

    // Read the whole source file into the lexer's character stream.
    try (BufferedReader in = new BufferedReader(new FileReader(mainFile))) {
        lexer.setCharStream(new ANTLRReaderStream(in));
    } catch (IOException e) {
        e.printStackTrace();
        return; // nothing to tokenize if the file could not be read
    }

    final CommonTokenStream tokens = new CommonTokenStream();
    tokens.setTokenSource(lexer);
    tokens.LT(10); // force the token buffer to load
    Antlr3JavaParser parser = new Antlr3JavaParser(tokens); // not used below

    // Build the "unique ID": the token type numbers, concatenated in order.
    // Caveat: with no separator, types 1 then 23 look the same as 12 then 3.
    StringBuilder sbr = new StringBuilder();
    List<?> tokenList = tokens.getTokens();
    for (int i = 0; i < tokenList.size(); i++) {
        Token token = (Token) tokenList.get(i);
        int type = token.getType();
        sbr.append(type);
    }
    String tokenTypeString = sbr.toString();

    // Split the ID into 4-grams and print one per line.
    StringBuilder ngrams = new StringBuilder();
    for (String term : new NgramAnalyzer(4).analyzer(tokenTypeString)) {
        ngrams.append(term).append("\n");
    }
    System.out.println(ngrams);
}
My question: how can I use Jaccard similarity on the n-grams I have generated to compare two Java source files?
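
For reference, the Jaccard similarity of two n-gram sets A and B is |A ∩ B| / |A ∪ B|. Below is a minimal sketch of that comparison, assuming the 4-grams of each file have already been collected into a list of strings. The class name `JaccardSketch`, the method `jaccard`, and the sample 4-grams are hypothetical and only illustrate the idea; they are not part of ANTLR or of `NgramAnalyzer`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JaccardSketch {

    /** Jaccard similarity |A ∩ B| / |A ∪ B| over the distinct n-grams of two files. */
    static double jaccard(List<String> gramsA, List<String> gramsB) {
        // Sets drop duplicate n-grams, so each distinct n-gram counts once.
        Set<String> a = new HashSet<>(gramsA);
        Set<String> b = new HashSet<>(gramsB);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // convention chosen here: two empty sets count as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Made-up 4-grams standing in for the output of two analyzed files.
        List<String> first = Arrays.asList("4125", "1253", "2537");
        List<String> second = Arrays.asList("4125", "1253", "2538");
        // 2 shared n-grams out of 4 distinct ones
        System.out.println(jaccard(first, second)); // prints 0.5
    }
}
```

In the code above, the two lists would presumably be filled by running the tokenizing and `NgramAnalyzer(4).analyzer(...)` steps once per file and collecting the terms instead of printing them. A result of 1.0 means the two files share every distinct 4-gram, and 0.0 means they share none.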