Implementing Jaccard similarity with ANTLR3 to compare two Java files

Time: 2014-06-19 09:42:59

Tags: java similarity

Please see this code. It is what I have so far using the JCCD API. ^_^

    // Read the source file and feed it to the JCCD ANTLR3 Java lexer.
    BufferedReader in = new BufferedReader(new FileReader(f.getFile()));
    String filePath = f.getNama(); // name of the file
    final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
    lexer.preserveWhitespacesAndComments = false; // skip whitespace and comments
    try {
        lexer.setCharStream(new ANTLRReaderStream(in));
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }

    // Concatenate the numeric type of every token until end of input.
    // Note: the types are appended without a separator.
    StringBuilder sbu = new StringBuilder();
    while (true) {
        org.antlr.runtime.Token token = lexer.nextToken();
        if (token.getType() == org.antlr.runtime.Token.EOF) {
            break;
        }
        sbu.append(token.getType());
        System.out.println(token.getType());
    }

For TestFileOne.java it produces this output:
876116423877916429791644323742916418167432388167444266238816449164291643016743444242877916429791641179164432310329164351674323742916420164432316461643016444426623164616430164444242881644442879010116429164164224143234242[]

and this for TestFileTwo.java:

876116423877916429791644323742916418167432388167444266238816449164291643016743444242877916429791641179164432310329164351674323742916420164432316461643016444426623164616430164444242881644442879010116429164164224143234242[]

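Now my question is: can anyone give me a clue or suggestion on how to implement Jaccard similarity to get the expected result, for example output such as a similarity percentage? Thank you very much...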
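
For reference, the Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. Here is a minimal sketch of how it could be computed over token types. It makes two assumptions: the token types are collected into a `List<Integer>` instead of one concatenated string, and the files are compared on sets of n-grams of consecutive token types. The class name `JaccardSketch`, the helper `shingles`, and the sample token values are hypothetical and not part of JCCD or ANTLR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JaccardSketch {

    // Builds the set of n-grams (shingles) of consecutive token types.
    // A list shorter than n yields an empty set.
    public static Set<List<Integer>> shingles(List<Integer> tokenTypes, int n) {
        Set<List<Integer>> result = new HashSet<>();
        for (int i = 0; i + n <= tokenTypes.size(); i++) {
            result.add(new ArrayList<>(tokenTypes.subList(i, i + n)));
        }
        return result;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|, in the range 0.0 to 1.0.
    // Assumption: two empty sets are treated as identical (1.0).
    public static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Made-up token type sequences standing in for two lexed files.
        List<Integer> one = Arrays.asList(87, 61, 164, 23, 87, 79);
        List<Integer> two = Arrays.asList(87, 61, 164, 23, 88, 79);

        // 3 shared bigrams out of 7 distinct bigrams in total.
        double similarity = jaccard(shingles(one, 2), shingles(two, 2));
        System.out.printf("Similarity: %.2f%%%n", similarity * 100);
    }
}
```

In the lexer loop, this would mean adding each `token.getType()` to a list rather than appending it to the `StringBuilder`, because the concatenated digits no longer show where one token type ends and the next begins. The n-gram size is a tuning choice: with n = 1 only the vocabulary of token types is compared, while larger values also take token order into account.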

0 answers:

No answers