I'm trying to use the cosine distance class from Apache Commons, but it always returns 1.0. What am I missing? Here is my code:
import org.apache.commons.text.similarity.CosineDistance; // Apache Commons Text

public class ComputeDistance {
    public static void main(String[] args) throws Exception {
        CosineDistance dist = new CosineDistance();
        CharSequence c1 = "example text1";
        CharSequence c2 = "another file";
        System.out.println(dist.apply(c1, c2));
    }
}
Answer 0 (score: 1)
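CosineDistance returns 1 - cosineSimilarity(leftVector, rightVector). leftVector and rightVector are maps from each word in the char sequence to its number of occurrences. Your two strings have no word in common, so cosineSimilarity(leftVector, rightVector) = 0 and the distance is 1.0. You can change your code to work on the characters of your char sequences instead of the words: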
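import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.commons.text.similarity.CosineSimilarity;

public class ComputeDistance {
    public static void main(String[] args) throws Exception {
        CosineSimilarity dist = new CosineSimilarity();
        String c1 = "example text1";
        String c2 = "another file";
        // split("") yields one-character strings (the space included);
        // toMap with Integer::sum counts how often each character occurs.
        Map<CharSequence, Integer> leftVector =
            Arrays.stream(c1.split(""))
                .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));
        Map<CharSequence, Integer> rightVector =
            Arrays.stream(c2.split(""))
                .collect(Collectors.toMap(c -> c, c -> 1, Integer::sum));
        // Distance is 1 minus the similarity of the character-count vectors.
        System.out.println(1 - dist.cosineSimilarity(leftVector, rightVector));
    }
}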
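
To see why the word-level vectors give exactly 1.0, here is a minimal sketch that uses only the JDK. It is not the library's implementation: the class WordCosineSketch and its helpers wordCounts, cosineSimilarity and distance are hypothetical names, and splitting on non-word characters is an assumption standing in for the library's tokenizer.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCosineSketch {

    // Hypothetical helper: count word tokens, splitting on non-word characters
    // (an assumed stand-in for the library's own tokenizer).
    public static Map<String, Integer> wordCounts(String text) {
        return Arrays.stream(text.split("\\W+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    // Cosine similarity: dot product divided by the product of Euclidean norms.
    public static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            // Only keys present in both maps contribute to the dot product.
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v * v).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v * v).sum());
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty vector has no direction; treat as dissimilar
        }
        return dot / (normA * normB);
    }

    public static double distance(String left, String right) {
        return 1.0 - cosineSimilarity(wordCounts(left), wordCounts(right));
    }

    public static void main(String[] args) {
        // No shared words: the dot product is 0, so this prints 1.0.
        System.out.println(distance("example text1", "another file"));
        // One shared word ("example"): the distance drops to roughly 0.5.
        System.out.println(distance("example text1", "example file"));
    }
}
```

With word tokens, the distance only falls below 1.0 once the two inputs share at least one whole word, which is why counting characters, as in the code above, gives a more useful result for short strings.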