Cosine similarity in Java

Time: 2015-03-01 09:13:05

Tags: java matrix cosine-similarity

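I want to compute the similarity between the rows of a matrix such as D, but the result is not correct! What is wrong with my code? To compute the similarity of the rows in matrix U, I did the following. As the output shows, every row similarity comes out as either 1.0 or -1.0, which I think is wrong!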

    {

public void run(String[] args) throws Exception {

        Matrix A = new Matrix(array);

        for(int i = 0; i < A.getRowDimension(); i++)
            System.out.println("similar is : " + cosineSimilarity(i, A));

    }


private ArrayList cosineSimilarity(int rowIndex, Matrix D) {

        double dotProduct = 0.0, firstNorm = 0.0, secondNorm = 0.0;
        double cosinSimilarity;
        ArrayList<Double> similarRows = new ArrayList<>();

        for(int row = 0; row < D.getRowDimension(); row++){
            for (int column = 0; column < D.getColumnDimension(); column++) {
            dotProduct = + (D.get(rowIndex, column) * D.get(row, column));
            firstNorm =  + pow(D.get(rowIndex, column),2);
            secondNorm = + pow(D.get(row, column), 2);
           // Matrix f = D.getMatrix(row, column);
            }
            cosinSimilarity = (dotProduct / (sqrt(firstNorm) * sqrt(secondNorm)));
            similarRows.add(row, cosinSimilarity);
        }
return similarRows;
    }

}

The result is:

A is :    
 0.067174 -0.862994 -0.435024 0.123151 -0.214891 0.011754
 0.502582 -0.205973 0.093513 0.031561 0.821020 0.145506
 0.406919 -0.032555 0.413105 0.623333 -0.246395 -0.462002
 0.394209 0.218539 -0.497640 -0.386091 -0.002859 -0.632551
 0.571882 0.300883 -0.279673 0.132980 -0.354327 0.600810
 0.308004 -0.271047 0.552712 -0.654632 -0.305748 0.064427

similar is : [1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
similar is : [1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
similar is : [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
similar is : [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
similar is : [1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
similar is : [1.0, 1.0, -1.0, -1.0, 1.0, 1.0]

1 answer:

Answer 0: (score: 2)

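You want to compute the similarity between a given row and every row of the Matrix. The inner product and the two norms therefore have to be computed getRowDimension times, once per row.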
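For reference, the quantity being computed for each pair of rows can be written out as follows. The symbols are assumptions introduced only for illustration: a stands for the row at rowIndex, b for the row currently being compared, and j runs over the columns.

```latex
\cos(a, b)
  = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
  = \frac{\sum_{j} a_j b_j}{\sqrt{\sum_{j} a_j^{2}} \, \sqrt{\sum_{j} b_j^{2}}}
```

All three sums belong to one pair of rows, so each of them has to start from zero for every new value of row. The value always lies in [-1, 1] and is 1 when a row is compared with itself, but for rows that are not parallel it should fall strictly between the two extremes.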

But the initialization is in the wrong place: move it inside the loop over all rows, so that the sums are reset for each row.

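And you want to use += instead of = +! The statement x = +y just assigns +y to x, so each variable ends up holding only the term from the last column, and the quotient collapses to exactly 1.0 or -1.0.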
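The difference between the two operators can be reproduced in isolation. This is a minimal sketch that uses plain double[] arrays instead of the Matrix class from the question; the class name CosineOperatorDemo and both method names are hypothetical and exist only for this illustration.

```java
public class CosineOperatorDemo {

    // Buggy variant: "= +" is a plain assignment of a unary-plus expression,
    // so every iteration overwrites the previous value instead of adding to it.
    static double cosineWithAssignPlus(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot = +(a[i] * b[i]);
            normA = +(a[i] * a[i]);
            normB = +(b[i] * b[i]);
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Fixed variant: "+=" accumulates the sums over all components.
    static double cosineWithPlusAssign(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0, 2.0};
        double[] b = {0.0, 1.0, -3.0};

        // Only the last components (2.0 and -3.0) survive: -6 / (2 * 3) = -1.0
        System.out.println("with = + : " + cosineWithAssignPlus(a, b));

        // All components contribute: -6 / (sqrt(5) * sqrt(10)), roughly -0.85
        System.out.println("with +=  : " + cosineWithPlusAssign(a, b));
    }
}
```

With = + the result depends only on the signs of the last column's entries, which matches the pattern of 1.0 and -1.0 values in the question's output.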

private ArrayList<Double> cosineSimilarity(int rowIndex, Matrix D) {
    ArrayList<Double> similarRows = new ArrayList<>();

    for(int row = 0; row < D.getRowDimension(); row++){
        double dotProduct = 0.0, firstNorm = 0.0, secondNorm = 0.0;
        for (int column = 0; column < D.getColumnDimension(); column++) {
            dotProduct += (D.get(rowIndex, column) * D.get(row, column));
            firstNorm += pow(D.get(rowIndex, column), 2);
            secondNorm += pow(D.get(row, column), 2);
            // Matrix f = D.getMatrix(row, column);
        }
        double cosinSimilarity = (dotProduct / (sqrt(firstNorm) * sqrt(secondNorm)));
        similarRows.add(row, cosinSimilarity);
    }