Question

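I have 2 lists: actual and predicted. I need to compare the two lists and determine the number of fuzzy matches. I say fuzzy matches because the strings will not be exactly the same. I am using SequenceMatcher from the difflib library.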

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

I can assume that strings whose match ratio is above 80% are considered the same. Example lists:

actual=[ "Appl", "Orange", "Ornge", "Peace"]
predicted=["Red", "Apple", "Green", "Peace", "Orange"]

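I need a way to determine that Apple, Peace and Orange from the predicted list have been found in the actual list. So there are only 3 matches, not 5. How can I do this efficiently?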

Answer 1

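If fuzzy matching is really what you are looking for, you can use the following set comprehension with the similar method to get the desired output.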

threshold = 0.8
result = {x for x in predicted for y in actual if similar(x, y) > threshold}
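
As a minimal, self-contained sketch of that comprehension, here it is applied to the sample lists from the question with the similar helper defined there. The 0.8 threshold is the asker's assumption, and the strict > comparison follows the answer above.

```python
from difflib import SequenceMatcher

def similar(a, b):
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return SequenceMatcher(None, a, b).ratio()

actual = ["Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

threshold = 0.8
# Keep each predicted string that is close enough to at least one actual string.
result = {x for x in predicted for y in actual if similar(x, y) > threshold}

print(result)
print(len(result))
```

With these lists, result holds Apple, Peace and Orange, so len(result) gives the count of 3. The comprehension compares every pair, so the work grows with len(predicted) * len(actual).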

Answer 2

You can convert both lists to sets and apply an intersection on them.

This will give you the three items that appear in both lists.

Then you can calculate the ratio of the length of the result set to the length of the actual list.

{'Peace', 'Apple', 'Orange'}

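Edit

To use the ratio, you need to implement a nested loop. Since a set is implemented as a hash table, a lookup is O(1), so I would rather use a set for actual.

If a predicted item is in actual (an exact match), just add it to the result set. (The best case is that all of them are, and the final complexity is O(n).)

If a predicted item is not in actual, loop over actual and check whether any ratio exceeds 0.8. (The worst case is that all of them are like this, and the complexity is O(n^2).)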

actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
predicted=["Red", "Apple", "Green", "Peace", "Orange"]
res = set(actual).intersection(predicted)
print(res)
print((len(res) / len(actual)) * 100)
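
The two-step approach described in the edit above (exact set lookup first, fuzzy fallback second) could be sketched roughly as follows. The function name fuzzy_matches and its threshold parameter are hypothetical names introduced for illustration; they do not come from the answer.

```python
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def fuzzy_matches(actual, predicted, threshold=0.8):
    """Return the predicted strings that match some actual string.

    Hypothetical helper: exact matches are found with an O(1) set lookup,
    and only the remaining strings pay for the fuzzy comparison loop.
    """
    actual_set = set(actual)
    matched = set()
    for p in predicted:
        if p in actual_set:
            # Exact match, no ratio computation needed.
            matched.add(p)
        elif any(similar(p, a) > threshold for a in actual_set):
            # No exact match, but at least one actual string is close enough.
            matched.add(p)
    return matched

actual = ["Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

matched = fuzzy_matches(actual, predicted)
print(matched)
print(len(matched))
```

On the question's lists, Peace and Orange are resolved by the exact lookup, Apple is resolved by the fuzzy fallback against Appl, and Red and Green match nothing, which gives 3 matches.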

Answer 3

import itertools
{x[1] for x in itertools.product(actual, predicted) if similar(*x) > 0.80}

Answer 4

>>> actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
>>> predicted=["Red", "Apple", "Green", "Peace", "Orange"]
>>> set(actual) & set(predicted)
set(['Orange', 'Peace', 'Apple'])

Answer 5

In this case, you only need to check whether the i-th element of the predicted list exists in the actual list. If it does, add it to a new list.

In [2]: actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
...: predicted=["Red", "Apple", "Green", "Peace", "Orange"]


In [3]: [i for i in predicted if i in actual]
Out[3]: ['Apple', 'Peace', 'Orange']

Answer 6

A simple, though inefficient, approach would be:

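from difflib import SequenceMatcher

counter = 0
for item in b:  # b: the predicted strings
    # Count item once if any string in a (the actual list) is similar enough.
    if any(SequenceMatcher(None, x, item).ratio() > 0.8 for x in a):
        counter += 1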

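This gives you what you want: the number of fuzzily matched elements, not only the identical elements (which is what most of the other answers provide).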
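
As an alternative sketch for the same count, difflib also provides get_close_matches, which is a different technique from the explicit SequenceMatcher loop in this answer. It returns the candidates whose ratio is at least cutoff (note: >= rather than >), and it applies cheaper upper-bound checks before computing the full ratio. The cutoff of 0.8 is the asker's assumption.

```python
from difflib import get_close_matches

actual = ["Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

counter = 0
for item in predicted:
    # n=1: one close match is enough to count the item.
    if get_close_matches(item, actual, n=1, cutoff=0.8):
        counter += 1

print(counter)
```

Here Apple, Peace and Orange each find a close match in actual, while Red and Green do not, so counter ends at 3.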

Answer 7

First, take the intersection of the two sets:

actual, predicted = set(actual), set(predicted)

exact = actual.intersection(predicted)

If this contains all of your actual words, then you are done. Otherwise,

if len(exact) < len(actual):
    fuzzy = [word for word in actual-predicted for match in predicted if similar(word, match)>0.8]

Finally, your result set is exact.union(set(fuzzy)).
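
Put together, the steps of this answer might look like the sketch below, run on the question's sample lists. Initialising fuzzy to an empty list is an addition made here so that the final union also works when every word matches exactly; the rest follows the answer's code.

```python
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

actual = ["Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

actual, predicted = set(actual), set(predicted)

# Step 1: words that appear verbatim in both sets.
exact = actual.intersection(predicted)

# Step 2: for the actual words with no exact match, look for a fuzzy match.
fuzzy = []  # assumption: default to "no fuzzy matches" when step 2 is skipped
if len(exact) < len(actual):
    fuzzy = [word for word in actual - predicted
             for match in predicted if similar(word, match) > 0.8]

result = exact.union(set(fuzzy))
print(result)
```

Note that this variant collects words from actual rather than from predicted. With the sample lists, both Appl and Ornge are picked up as fuzzy matches in addition to the exact matches Orange and Peace, so the result has 4 elements rather than the 3 the question asks for.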

Answer 8

You can also try the following approach to achieve what you need:

import itertools

fuzlist = [ "Appl", "Orange", "Ornge", "Peace"]
actlist = ["Red", "Apple", "Green", "Peace", "Orange"]
foundlist = []
for fuzname in fuzlist:
    for name in actlist:
        for actname in itertools.permutations(name):
            if fuzname.lower() in ''.join(actname).lower():
                foundlist.append(name)
                break

print(set(foundlist))

Python - Matching strings in 2 lists

8 answers: