Question

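I have two lists. The phrase list contains words and phrases, while the check list contains only words. I want to determine whether any member of the check list is an entry in the phrase list, or is part of one. Each string in the phrase list then gets a score based on how many of its words appear in the check list.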

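In the example below, 'information retrieval' yields 0.5 because only one of its two words ("information") is in the check list. On the other hand, 'wave transformation' yields 1 because both "wave" and "transformation" appear in check.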

I have been looking for a way to do this, but without success.

score = []
phrase = ['information retrieval', 'wave transformation', 'information', 'services', 'gold coast village']
check = ['information', 'wave', 'transformation', 'village', 'services']

I want the score list to contain a score for each member of the phrase list.

phrase = ['information retrieval', 'wave transformation', 'information', 'services', 'gold coast village']
score = [0.5, 1, 1, 1, 0.33]

Answer 1

Try this:

phrase = list(map(str.split, phrase))
score = [len(set(check).intersection(k))/len(k) for k in phrase]

Output:

[0.5, 1.0, 1.0, 1.0, 0.3333333333333333]
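
Two things to be aware of with the snippet above: it rebinds `phrase` to a list of token lists, and because it uses a set, a word repeated within a phrase is counted once in the numerator but every time in the denominator. A minimal sketch that leaves `phrase` intact might look like this (the names `check_set` and `tokenized` are illustrative, not from the question):

```python
phrase = ['information retrieval', 'wave transformation', 'information', 'services', 'gold coast village']
check = ['information', 'wave', 'transformation', 'village', 'services']

check_set = set(check)                   # built once, reused for every phrase
tokenized = [p.split() for p in phrase]  # separate name, so `phrase` keeps its strings
score = [len(check_set.intersection(tokens)) / len(tokens) for tokens in tokenized]
print(score)
```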

Answer 2

[sum(word in check for word in elem.split()) / len(elem.split()) for elem in phrase]

This returns:

[0.5, 1.0, 1.0, 1.0, 0.3333333333333333]
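
The one-liner divides by `len(elem.split())`, so an empty or whitespace-only string in `phrase` would raise `ZeroDivisionError`. If that can happen in your data, a small helper is one way to guard against it. This is only a sketch: `phrase_score` is a hypothetical name, and returning 0.0 for an empty phrase is an assumption about what you want.

```python
def phrase_score(text, check_words):
    """Fraction of whitespace-separated words in `text` found in `check_words`."""
    words = text.split()  # split once and reuse the result
    if not words:
        return 0.0  # assumption: an empty phrase scores 0 instead of raising
    return sum(word in check_words for word in words) / len(words)

phrase = ['information retrieval', 'wave transformation', 'information', 'services', 'gold coast village']
check = ['information', 'wave', 'transformation', 'village', 'services']

score = [phrase_score(p, check) for p in phrase]
print(score)
```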

Answer 3

You can write a scoring function that accumulates the scores and returns them:

def scoring(phrase, check):
    scores = []
    for block in phrase:
        tokens = block.split()
        score = 0
        for word in check:
            if word in tokens:
                score += 1 / len(tokens)
        scores.append(score)
    return scores

score = []
phrase = ['information retrieval', 'wave transformation', 'information', 'services', 'gold coast village']
check = ['information', 'wave', 'transformation', 'village', 'services']

# score = [0.5, 1, 1, 1, 0.33]

scoring(phrase, check)

Output:

[0.5, 1.0, 1.0, 1.0, 0.3333333333333333]
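
Because the function adds `1 / len(tokens)` repeatedly, longer phrases can pick up floating-point rounding error (adding 0.1 ten times, for example, does not give exactly 1.0). A possible variant counts the matches first and divides once. Note that this sketch, with the hypothetical name `scoring_counted`, iterates over the phrase's words rather than over `check`, so a word repeated inside a phrase is counted each time, which differs from the loop above.

```python
def scoring_counted(phrase, check):
    """Variant of scoring() that counts matches, then divides once per phrase."""
    check_set = set(check)  # duplicates in check no longer inflate the score
    scores = []
    for block in phrase:
        tokens = block.split()
        matches = sum(1 for word in tokens if word in check_set)
        # assumption: an empty phrase scores 0.0 rather than raising
        scores.append(matches / len(tokens) if tokens else 0.0)
    return scores

phrase = ['information retrieval', 'wave transformation', 'information', 'services', 'gold coast village']
check = ['information', 'wave', 'transformation', 'village', 'services']

print(scoring_counted(phrase, check))
```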

Answer 4

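You can use the statistics module to get the scores, since each word in a phrase counts as 1 or 0 depending on whether it is present in the check list: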

import statistics as stats
score = [stats.mean(w in check for w in p.split()) for p in phrase]

To make this faster, you should define check as a set rather than a list.
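
As a rough illustration of that suggestion, the list can be converted once before scoring. Here `check_set` is an illustrative name; the rest reuses the data from the question.

```python
import statistics as stats

phrase = ['information retrieval', 'wave transformation', 'information', 'services', 'gold coast village']
check = ['information', 'wave', 'transformation', 'village', 'services']

check_set = set(check)  # set membership is O(1) on average; a list is scanned linearly
score = [stats.mean(w in check_set for w in p.split()) for p in phrase]
print(score)
```

The gain only matters when `check` is large; for a handful of words the difference is negligible.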

Is there a way to score a list based on its members' presence in another list?
