Question

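This is my first time asking something here and I'm very new to this, so I'll do my best. I have a list of phrases, and I want to remove all the phrases that are similar to one another, for example: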

array = ["A very long string saying some things", 
         "Another long string saying some things", 
         "extremely large string saying some things", 
         "something different", 
         "this is a test"]

I want this result:

array2 = ["A very long string saying some things", 
          "something different", 
          "this is a test"]

This is what I have:

for i in range(len(array)):
    swich=True
    for j in range(len(array2)):
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == True):
            swich=False
            pass
        if (fuzz.ratio(array[i],array2[j]) >= 80) and (swich == False):
            array2.pop(j)

But it gives me an IndexError on the list ...

fuzz.ratio compares two strings and returns a value between 0 and 100; the higher the value, the more similar the strings.
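To make the 0 to 100 scale concrete, here is a minimal sketch that uses the standard-library difflib as a stand-in for fuzz.ratio. The helper name similarity is hypothetical, and its scores are only an approximation of what fuzz.ratio would return for the same pair.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (a stand-in for fuzz.ratio, not the real thing)."""
    # SequenceMatcher.ratio() is a float in [0, 1]; scale and round it to an integer.
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Identical strings score 100; strings with nothing in common score 0.
print(similarity("this is a test", "this is a test"))
print(similarity("this is a test", "something different"))
```

With a helper like this, the ">= 80" check in the loop reads as "these two strings are at least 80% alike".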

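What I'm trying to do is compare the list element by element: the first time two similar strings are found, flip the switch, and from that point on pop every further similar match. I'm completely open to any suggestions.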

Answer 1

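The error you get is caused by modifying the list you are iterating over. (Never add, remove, or replace elements of a list while you are iterating over it!) range(len(array2)) is evaluated once, while the list has length N, but after array2.pop(j) the length is N-1, not N. When the loop later tries to access the Nth element, you get an IndexError because the list is now shorter.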
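The failure can be reproduced without any fuzzy matching at all. The following is a small sketch with made-up data (the list items and the values being removed are purely illustrative):

```python
items = ["a", "b", "c", "d"]
error = None

try:
    for j in range(len(items)):       # range(4) is fixed before the loop starts
        if items[j] in ("b", "c"):
            items.pop(j)              # the list shrinks, but the range does not
except IndexError as exc:
    error = exc                       # raised once j reaches an index that no longer exists

print(repr(error))
print(items)                          # "c" is still there: it slid into a slot already visited

# Safer: leave the original alone and build a new list instead.
kept = [x for x in ["a", "b", "c", "d"] if x not in ("b", "c")]
print(kept)
```

Note that popping causes two problems here: the IndexError at the end, and a silently skipped element ("c") before that.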

A quick guess at another approach:

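from fuzzywuzzy import fuzz  # assumption: the thread never names the package that provides fuzz.ratio

original = ["A very long string saying some things",
            "Another long string saying some things",
            "extremely large string saying some things",
            "something different",
            "this is a test"]

filtered = []

for original_string in original:
    include = True
    # Compare against the strings already kept, not against the list being read.
    for filtered_string in filtered:
        if fuzz.ratio(original_string, filtered_string) >= 80:
            include = False  # too similar to something already kept
            break
    if include:
        filtered.append(original_string)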

Note the for string in array style of loop, which is more "pythonic" and needs neither an integer index variable nor range.
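If this needs to be reused, the same loop can be wrapped in a function. This is only a sketch: drop_similar and default_score are hypothetical names, and the default scorer uses the standard-library difflib so the example runs without extra packages. Passing score=fuzz.ratio should work the same way, assuming fuzz is available.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List


def default_score(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale; swap in fuzz.ratio if it is installed."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def drop_similar(strings: Iterable[str],
                 threshold: float = 80,
                 score: Callable[[str, str], float] = default_score) -> List[str]:
    """Keep the first string of each group of similar strings, in input order."""
    kept: List[str] = []
    for candidate in strings:
        # any() stops at the first match, just like the break in the explicit loop.
        if not any(score(candidate, k) >= threshold for k in kept):
            kept.append(candidate)
    return kept


print(drop_similar(["abc", "abc", "xyz"]))
```

The input list is never modified, so the IndexError cannot occur; the price is that every candidate is compared against everything kept so far, which is quadratic in the worst case.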

Answer 2

How about using a different library to condense the code and reduce the number of loops?
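One dependency-free way to sketch that idea, assuming the standard-library difflib is an acceptable substitute for whatever library is meant here, is to let difflib.get_close_matches do the inner comparison. The function name dedupe_close is hypothetical, and the cutoff is on difflib's 0 to 1 scale rather than 0 to 100.

```python
from difflib import get_close_matches
from typing import Iterable, List


def dedupe_close(strings: Iterable[str], cutoff: float = 0.8) -> List[str]:
    """Keep a string only if nothing already kept is a close match for it."""
    kept: List[str] = []
    for s in strings:
        # get_close_matches returns a list of near matches from `kept`;
        # an empty list means s is sufficiently different from everything kept.
        if not get_close_matches(s, kept, n=1, cutoff=cutoff):
            kept.append(s)
    return kept


print(dedupe_close(["hello world", "hello world!", "zzz"]))
```

The inner loop disappears from the calling code, although the library still performs the pairwise comparisons internally.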


Remove elements from a list of strings
