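How to compute the distance between every pair of sentences in a text

Time: 2019-01-02 12:58:50

Tags: python nlp

I am computing the Levenshtein distance between sentences, and I now have a text containing several sentences. I don't know how to write the for loop that produces the distance between every pair of sentences.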

sent = ['motrin 400-600 mg every 8 hour as need for pai . ', 'the depression : continue escitalopram ; assess need to change medication as an outpatient . ', 'Blood cltures from 11-30 grow KLEBSIELLA PNEUMONIAE and 12-01 grow KLEBSIELLA PNEUMONIAE and PROTEUS MIRABILIS both sensitive to the Meropenam which she have already be receive . ']

def similarity(sent):
    feature_sim = []
    for a,b in sent:
        feature_sim[a,b] = pylev.levenshtein(a,b)
    print (feature_sim)

1 Answer:

Answer 0 (score: 1)

  

Use a pair of nested for loops.

The simplest version:

for a in sent:
    for b in sent:
        ...

Skip identical pairs (their Levenshtein distance is trivially 0):

for a in sent:
    for b in sent:
        if a != b:
            ...

Avoid processing duplicate pairs ((a, b) is the same as (b, a)):

for i in range(0, len(sent)):
    for j in range(i+1, len(sent)):
        # a = sent[i], b = sent[j]
        ...
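
The index-based loop above can also be written with itertools.combinations from the standard library, which yields each unordered pair exactly once. This is a minimal sketch rather than part of the original answer, and the short list below is placeholder data standing in for the real sentences:

```python
from itertools import combinations

# Placeholder data for illustration; substitute the real sentence list.
sent = ["first sentence", "second sentence", "third sentence"]

# combinations(sent, 2) yields (sent[i], sent[j]) for every i < j,
# so each unordered pair appears exactly once and no item is paired with itself.
pairs = list(combinations(sent, 2))
for a, b in pairs:
    print(repr(a), "<->", repr(b))
```

For n sentences this visits n * (n - 1) / 2 pairs, the same set as the two range loops.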
  

The problem: feature_sim is a list, and a list can only be indexed with integers, not with strings or any other type.
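
To see the failure in isolation, the following small sketch assigns to a list using a pair of strings as the index, which is what feature_sim[a,b] does. The strings "a" and "b" are placeholders, not data from the question:

```python
feature_sim = []

try:
    # "a", "b" is packed into a tuple, and a list cannot be indexed by a tuple.
    feature_sim["a", "b"] = 1
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)
```
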

Use a dictionary instead:

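import pylev  # third-party package providing levenshtein()

# Map each unordered sentence pair to its edit distance.
feature_sim = {}
for i in range(len(sent)):
    for j in range(i + 1, len(sent)):
        feature_sim[(sent[i], sent[j])] = pylev.levenshtein(sent[i], sent[j])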
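
For trying the approach without installing anything, here is a self-contained sketch. The helper names levenshtein and pairwise_distances are hypothetical, introduced only for this illustration; levenshtein is a plain dynamic-programming stand-in assumed to compute the same edit distance as pylev.levenshtein, and the three short words are placeholder data. Unlike the snippet above, this version keys the dictionary by index pairs, an assumption made so that two identical sentences do not overwrite each other's entry:

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance via a two-row dynamic program (stand-in for pylev.levenshtein)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def pairwise_distances(sentences):
    """Map each index pair (i, j), i < j, to the distance between those sentences."""
    return {
        (i, j): levenshtein(sentences[i], sentences[j])
        for i, j in combinations(range(len(sentences)), 2)
    }


# Placeholder data for illustration; substitute the real sentence list.
example = ["kitten", "sitting", "mitten"]
distances = pairwise_distances(example)
print(distances)
```

Looking up a pair afterwards is distances[(i, j)] with i < j; the sentences themselves remain available as example[i] and example[j].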