Question

I have the following problem:

Let's say I have this given string:

A = "Today, the weather is fine. The sun is shining and we'd love to go swimming. Since the water is cold, we only walk around. We love weekends."

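The set of words in A defines the basis of my word vectors.

Let's say I have another set of words for which I want to compute the cosine similarity: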
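A minimal sketch of how such a basis could be extracted from A. The helper name `build_basis` and the regex tokenization (lowercased runs of letters, with an optional apostrophe part so that "we'd" stays one word) are assumptions for illustration, not something the question specifies:

```python
import re

A = ("Today, the weather is fine. The sun is shining and we'd love to go "
     "swimming. Since the water is cold, we only walk around. We love weekends.")

def build_basis(text):
    """Return the sorted list of distinct lowercase words in `text`."""
    # Assumption: a word is a run of letters, optionally followed by
    # an apostrophe and more letters (e.g. "we'd").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return sorted(set(tokens))

basis = build_basis(A)
print(len(basis), basis)
```

Each position in `basis` then corresponds to one dimension of the word vectors. A different tokenizer (stemming, stop-word removal) would give a different basis.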

B = "Friday, morning, tomorrow, today, sun, moon, fun, swim"

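Then I use pointwise mutual information (PMI) to compute the weights of the vector components. (Let's assume they are given.)

How can I compute the similarity between the words in B? The result should be a B x B matrix (8 x 8 here, one row and one column per word in B).

To compute the cosine similarity, I have already written this: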
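For reference, PMI is defined as log(p(x, y) / (p(x) p(y))). A minimal sketch of computing it from raw counts follows. The function name `pmi`, the use of base-2 logarithms, the single shared `total` for all probabilities, and the choice to return 0 for pairs that never co-occur are all assumptions for illustration:

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI = log2(p(x, y) / (p(x) * p(y))), estimated from raw counts."""
    if count_xy == 0:
        # Assumption: unseen pairs get weight 0 instead of -infinity.
        return 0.0
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# x and y always occur together: positive association
print(pmi(2, 2, 2, 4))
# x and y co-occur exactly as often as chance predicts: no association
print(pmi(1, 2, 2, 4))
```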

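import math

def counter_cosine_similarity(c1, c2):
    # Union of all terms that occur in either counter
    terms = set(c1).union(c2)
    dotprod = sum(c1.get(k, 0) * c2.get(k, 0) for k in terms)
    magA = math.sqrt(sum(c1.get(k, 0)**2 for k in terms))
    magB = math.sqrt(sum(c2.get(k, 0)**2 for k in terms))
    # Guard against division by zero when either vector is empty
    if magA == 0 or magB == 0:
        return 0.0
    return dotprod / (magA * magB)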

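where c1 and c2 are Counter objects.

As far as the cosine computation itself goes, my function is correct. But how do I compute the similarity for each pair of words in B? With this solution I can only compute the similarity of one whole string/list with another string/list.

Thanks a lot for your help!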
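One possible way to get a per-word result is to keep one weight vector per word in B (a mapping from basis word to PMI weight) and apply the cosine function to every pair. The sketch below assumes that layout. The names `cosine` and `similarity_matrix` are hypothetical, and the weights in `vectors` are made up purely for illustration, not real PMI values:

```python
import math
from collections import Counter

def cosine(v1, v2):
    """Cosine similarity of two sparse vectors stored as mappings."""
    dot = sum(w * v2.get(k, 0.0) for k, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    # Assumption: a word without any weights is similar to nothing.
    return dot / (n1 * n2) if n1 and n2 else 0.0

def similarity_matrix(words, vectors):
    """Return a len(words) x len(words) list of lists of cosine similarities."""
    empty = {}
    return [[cosine(vectors.get(a, empty), vectors.get(b, empty))
             for b in words]
            for a in words]

B = ["Friday", "morning", "tomorrow", "today", "sun", "moon", "fun", "swim"]

# Hypothetical weights over the basis from A; invented for this example.
vectors = {
    "today": Counter({"weather": 1.0, "fine": 1.0}),
    "sun": Counter({"shining": 2.0, "weather": 1.0}),
    "swim": Counter({"water": 1.5, "cold": 0.5}),
}

matrix = similarity_matrix(B, vectors)
for word, row in zip(B, matrix):
    print(f"{word:>8}", " ".join(f"{x:.2f}" for x in row))
```

Words from B that have no vector (here "Friday", "moon", and others) get all-zero rows and columns under this assumption. For a large B, stacking the vectors into a NumPy array and normalizing the rows would avoid the double loop.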

words

0 answers: