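I want to use the word2vec Google News corpus to find the cosine similarity between two sentences of different lengths. The method below gives me the cosine similarity for two sentences of the same length, but it throws an error when the lengths differ.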
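import java.util.Collection;

import com.google.common.base.Splitter;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.ops.transforms.Transforms;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class word2vecsentence {
    // Wrapper bean (not shown) whose getModel() returns the loaded WordVectors.
    @Autowired
    Word2VecModel wordVector;

    @RequestMapping(value = "/sentsimilarity", method = RequestMethod.POST)
    public double cosineSimForSentence(@RequestParam("sent1") String sentence1,
                                       @RequestParam("sent2") String sentence2) {
        // Tokenize each sentence on single spaces.
        Collection<String> label1 = Splitter.on(' ').splitToList(sentence1);
        Collection<String> label2 = Splitter.on(' ').splitToList(sentence2);
        WordVectors vector = wordVector.getModel();
        double consin = 0;
        try {
            // Average the word vectors of each sentence, then compare the two means.
            INDArray array1 = vector.getWordVectorsMean(label1);
            System.out.println(array1);
            INDArray array2 = vector.getWordVectorsMean(label2);
            System.out.println(array2);
            consin = Transforms.cosineSim(array1, array2);
            return consin;
        } catch (Exception e) {
            e.printStackTrace();
            return consin; // still 0 if anything above failed
        }
    }
}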
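To make clear what I expect to happen, here is a minimal, library-free sketch of the same idea: average the word vectors of each sentence into one fixed-size vector, then take the cosine of the two means. It does not use DL4J at all. The class name `MeanVectorCosine`, the helper `toyEmbeddings`, and the 3-dimensional vectors are made up for illustration. Skipping out-of-vocabulary tokens is an assumption on my part, not something I have confirmed the library does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the word2vec lookup; not DL4J code.
class MeanVectorCosine {

    // Made-up 3-dimensional embeddings, only for illustration.
    static Map<String, double[]> toyEmbeddings() {
        Map<String, double[]> m = new HashMap<>();
        m.put("the", new double[] {0.0, 0.0, 1.0});
        m.put("cat", new double[] {1.0, 0.0, 0.0});
        m.put("dog", new double[] {0.9, 0.1, 0.0});
        m.put("sat", new double[] {0.0, 1.0, 0.0});
        m.put("ran", new double[] {0.0, 0.9, 0.1});
        return m;
    }

    // Average the vectors of all known tokens; returns null if no token is known.
    static double[] meanVector(Map<String, double[]> embeddings, List<String> tokens, int dim) {
        double[] sum = new double[dim];
        int known = 0;
        for (String token : tokens) {
            double[] v = embeddings.get(token);
            if (v == null) {
                continue; // assumption: out-of-vocabulary words are skipped
            }
            for (int i = 0; i < dim; i++) {
                sum[i] += v[i];
            }
            known++;
        }
        if (known == 0) {
            return null;
        }
        for (int i = 0; i < dim; i++) {
            sum[i] /= known;
        }
        return sum;
    }

    // Cosine similarity of two equal-length vectors; 0 if either has zero norm.
    static double cosineSim(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, double[]> emb = toyEmbeddings();
        // Three tokens versus five tokens: both means still have 3 components.
        double[] m1 = meanVector(emb, Arrays.asList("the cat sat".split(" ")), 3);
        double[] m2 = meanVector(emb, Arrays.asList("the dog ran away quickly".split(" ")), 3);
        System.out.println(cosineSim(m1, m2));
    }
}
```

Since both means have the embedding dimension regardless of how many tokens went in, I would not expect sentence length by itself to matter here, which is why the error confuses me.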
Can anyone help me solve this problem?