Implementing Jaccard distance with ANTLR to find the similarity of Java code

Time: 2014-06-24 12:11:41

Tags: java

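After a while, I managed to use ANTLR to get a unique ID (the sequence of token types) from a .java file. Then I split that unique ID into 4-grams with an n-gram analyzer. Here is my code: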

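import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import org.antlr.runtime.ANTLRReaderStream;
import org.antlr.runtime.CommonTokenStream;

public void runAlgoritma(File mainFile, List<String> fileJlist) {
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(mainFile.getAbsolutePath()));
    } catch (FileNotFoundException e1) {
        e1.printStackTrace();
        return; // nothing to tokenize if the file cannot be opened
    }
    final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
    lexer.preserveWhitespacesAndComments = false;

    try {
        lexer.setCharStream(new ANTLRReaderStream(in));
    } catch (IOException e) {
        e.printStackTrace();
    }

    final CommonTokenStream tokens = new CommonTokenStream();
    tokens.setTokenSource(lexer);
    tokens.LT(10); // force the token stream to load

    Antlr3JavaParser parser = new Antlr3JavaParser(tokens);

    // Concatenate the type of every token into one string (the "unique ID").
    StringBuilder sbr = new StringBuilder();
    List tokenList = tokens.getTokens();
    for (int i = 0; i < tokenList.size(); i++) {
        org.antlr.runtime.Token token = (org.antlr.runtime.Token) tokenList.get(i);
        int text = token.getType();
        sbr.append(text);
    }

    // Split the token-type string into 4-grams and print one per line.
    // Renamed from mainFile, which clashed with the method parameter.
    String mainFileTokens = sbr.toString();
    StringBuffer stringBuffer = new StringBuffer();
    for (String term : new NgramAnalyzer(4).analyzer(mainFileTokens)) {
        stringBuffer.append(term + "\n");
    }
    System.out.println(stringBuffer);
}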

I would like to know how to compare two Java source files using Jaccard similarity on the n-grams I have produced.
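To make the goal concrete, here is a rough sketch of what I think the comparison would look like, assuming the 4-grams of each file are collected into a set and the similarity is |A ∩ B| / |A ∪ B|. The class `JaccardSketch` and its method names are hypothetical, and I am assuming `NgramAnalyzer.analyzer(...)` returns something iterable over `String` (an array would need wrapping with `Arrays.asList`). The sample 4-grams in `main` are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class JaccardSketch {

    // Copies the n-grams into a set, so repeated n-grams count only once.
    private static Set<String> toSet(Iterable<String> grams) {
        Set<String> set = new HashSet<>();
        for (String gram : grams) {
            set.add(gram);
        }
        return set;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range [0, 1].
    public static double jaccardSimilarity(Iterable<String> gramsA, Iterable<String> gramsB) {
        Set<String> a = toSet(gramsA);
        Set<String> b = toSet(gramsB);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumed convention: two empty sets are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Jaccard distance is the complement of the similarity.
    public static double jaccardDistance(Iterable<String> gramsA, Iterable<String> gramsB) {
        return 1.0 - jaccardSimilarity(gramsA, gramsB);
    }

    public static void main(String[] args) {
        // Made-up 4-grams standing in for the output of NgramAnalyzer(4) on two files.
        List<String> first = Arrays.asList("1234", "2345", "3456");
        List<String> second = Arrays.asList("2345", "3456", "4567");
        System.out.println("similarity = " + jaccardSimilarity(first, second));
        System.out.println("distance   = " + jaccardDistance(first, second));
    }
}
```

With this approach, `runAlgoritma` would return the 4-grams of each file instead of printing them, and every file in `fileJlist` would be compared against the main file. One thing I am unsure about: because the token types are appended without a separator, types 1 and 23 produce the same string as types 12 and 3, so it may be better to build the n-grams over the list of token types rather than over the concatenated string.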

0 answers:

No answers yet.