Cosine similarity of 2 ArrayLists in Java

Date: 2016-10-27 13:18:38

Tags: java casting double information-retrieval dot-product

I have 2 files containing term weights, and my goal is to compute their cosine similarity: cos = dotProduct(weights1, weights2) / (euclideanNorm(weights1) * euclideanNorm(weights2)).
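A minimal sketch of that formula, separate from the asker's code. It assumes both weight vectors have already been parsed into `double` arrays of equal length. The dot product and both sums of squares are accumulated in one loop. The class name `CosineSketch` is hypothetical.

```java
// Hypothetical sketch: cosine similarity of two equal-length weight vectors,
// cos = dot(a, b) / (||a|| * ||b||), where ||x|| = sqrt(sum of x_i^2).
class CosineSketch {

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];      // accumulate the dot product
            normA += a[i] * a[i];    // accumulate the sum of squares of a
            normB += b[i] * b[i];    // accumulate the sum of squares of b
        }
        // Returns NaN if either vector is all zeros (0.0 / 0.0).
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] w1 = {0.75, 0.0, 0.5};
        double[] w2 = {0.75, 0.5, 0.0};
        System.out.println(cosine(w1, w2));
    }
}
```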

Here is my code:

import java.io.*;
import java.util.*;

public class tp5
{
    private static BufferedReader br1;
    private static BufferedReader br2;

    public static double getSimilarity(File file1, File file2)
        throws IOException
    {
        br1 = new BufferedReader(new FileReader(file1));
        String line1;
        line1 = br1.readLine();
        ArrayList<String> words1 = new ArrayList<String>();
        for (String word : line1.split(" ")) {
            words1.add(word);
        }

        br2 = new BufferedReader(new FileReader(file2));
        String line2;
        line2 = br2.readLine();
        ArrayList<String> words2 = new ArrayList<String>();
        for (String word : line2.split(" ")) {
            words2.add(word);
        }

        int i;
        int j;
        int k;

        double DotProduct = 0.0;
        double euclid1 = 0.0;
        double euclid2 = 0.0;

        for (j = 0; j < words1.size(); j++) {
            DotProduct += Double.parseDouble(words1.get(j)) * Double.parseDouble(words2.get(j));
        }

        // Euclidean norm: square root of the SUM of squares (x^2, not x^x)
        for (i = 0; i < words1.size(); i++) {
            euclid1 += Math.pow(Double.parseDouble(words1.get(i)), 2);
        }

        euclid1 = Math.sqrt(euclid1);

        for (k = 0; k < words2.size(); k++) {
            euclid2 += Math.pow(Double.parseDouble(words2.get(k)), 2);
        }

        euclid2 = Math.sqrt(euclid2);

        return DotProduct / (euclid1 * euclid2);
    }

    public static void main(String[] args)
        throws IOException
    {
        File file1 = new File("texte.95-1.poids");
        File file2 = new File("texte.95-2.poids");

        System.out.println(getSimilarity(file1, file2));
    }
}

The problem may be the format of my weights; for example, one weight is 0.750305594399894.

I get an error in Double.parseDouble:
Exception in thread "main" java.lang.NumberFormatException: For input string: ""    0.750305594399894"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
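Judging by the message, the rejected token seems to begin with a double-quote character followed by spaces. The translation may have altered the exact text, so treat this as an assumption. The small, hypothetical `ParseDemo` class below illustrates the relevant behavior. `Double.parseDouble` ignores leading and trailing whitespace, but it rejects a stray quote and the empty string.

```java
// Hypothetical demo: which tokens Double.parseDouble accepts.
// Leading/trailing whitespace is ignored (as if by String.trim()),
// but any other stray character, such as a double quote, makes it throw.
class ParseDemo {

    // Returns the parsed value, or null if parseDouble rejects the token.
    static Double tryParse(String token) {
        try {
            return Double.parseDouble(token);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] tokens = {
            "0.750305594399894",
            "    0.750305594399894",
            "\"    0.750305594399894",
            ""
        };
        for (String t : tokens) {
            System.out.println("[" + t + "] -> " + tryParse(t));
        }
    }
}
```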

What is the solution?

2 answers:

Answer 0 (score: 0)

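This exception is thrown when you try to parse a String into a number but the string is not a well-formed number. That can happen because it uses a comma (try a dot), because it is an empty string, or because it contains a letter.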
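One way to act on this is to clean each token before parsing it. The sketch below uses a hypothetical `parseWeight` helper. It assumes that quote characters in the weight files are stray, and that a comma is only ever a decimal separator (as in `0,75`), never a thousands separator. Letters and other garbage still raise `NumberFormatException`.

```java
// Hypothetical helper: cleans a raw token before parsing it as a double.
// Assumptions: double quotes are stray characters, and a comma is a decimal
// separator ("0,75"), never a thousands separator.
class WeightParser {

    static double parseWeight(String raw) {
        String cleaned = raw.replace("\"", "")   // drop stray double quotes
                            .trim()              // drop surrounding whitespace
                            .replace(',', '.');  // "0,75" -> "0.75"
        if (cleaned.isEmpty()) {
            throw new NumberFormatException("empty token: [" + raw + "]");
        }
        return Double.parseDouble(cleaned);      // still throws on letters, etc.
    }

    public static void main(String[] args) {
        System.out.println(parseWeight("\"    0.750305594399894"));
        System.out.println(parseWeight("0,5"));
    }
}
```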

I hope this helps.

Have a nice day. :)

Answer 1 (score: 0)

I just used Double.valueOf(String), and your test case worked without any problems.
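Note that `Double.valueOf(String)` applies the same parsing rules as `Double.parseDouble`. The Javadoc for `parseDouble` defines it in terms of `valueOf`, and the only difference is that `valueOf` returns a boxed `Double`. On the same input it should therefore throw the same exception.

A likelier source of bad tokens with code like the question's is `line.split(" ")`. Consecutive spaces or a leading space produce empty tokens, and `Double.parseDouble("")` throws. The sketch below uses a hypothetical class `TokenizeDemo` and assumes each file holds one line of whitespace-separated weights.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: tokenizing a line of whitespace-separated weights.
class TokenizeDemo {

    // trim() removes leading/trailing whitespace so split() does not yield a
    // leading empty token; "\\s+" treats any run of spaces/tabs as one separator.
    static List<Double> parseLine(String line) {
        List<Double> weights = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return weights;               // blank line: no weights
        }
        for (String token : trimmed.split("\\s+")) {
            weights.add(Double.parseDouble(token));
        }
        return weights;
    }

    public static void main(String[] args) {
        String line = "  0.75   0.5\t0.25 ";
        // Splitting on a single space keeps empty tokens between repeated spaces:
        System.out.println(line.split(" ").length);
        System.out.println(parseLine(line));   // prints [0.75, 0.5, 0.25]
    }
}
```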