Question

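I wrote a machine learning algorithm, and it now works perfectly. I have to iterate every item in the list against every other item to produce a similarity score between 0.01 and 1.00. Here is the code: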

    temp = [...]  # list of 351 sentences
    start_node = 0
    end_node = 0
    length = len(temp)
    for start_node in range(length):
        doc1 = nlp(temp[start_node])
        for end_node in range(++start_node, length):
            doc2 = nlp(temp[end_node])
            similar = doc1.similarity(doc2)
            exp_value = float(0.85)
            if similar == 1.0:
                print("Exact match", similar, temp[end_node], "---------||---------",  temp[start_node])
            elif 0.96 < similar < 0.99:
                print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
                temp.remove(temp[end_node])

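Here I check one item for similarity against every other item in the list, and then remove the similar item from the list, because there is no point checking that sentence against the other elements again; it would only waste computing power. However, when I try to remove the element, I get an index error: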

    <ipython-input-12-c1959947bdd1> in <module>
          4 length = len(temp)
          5 for start_node in range(length):
    ----> 6     doc1 = nlp(temp[start_node])
          7     for end_node in range(++start_node, length):
          8         doc2 = nlp(temp[end_node])

I just want to keep the original sentence and delete every sentence similar to it from the list, so those items are not checked again.

The temp list has 351 items; it is just a plain Python list.

Here is a test:

    print(temp[:1])

    ['malicious: caliche development partners "financial statement"has been shared with you']

I tried creating a duplicate list and removing the similar items from that list instead:

    final_items = temp
    start_node = 0
    end_node = 0
    length = len(temp)
    for start_node in range(length):
        doc1 = nlp(temp[start_node])
        for end_node in range(++start_node, length):
            doc2 = nlp(temp[end_node])
            similar = doc1.similarity(doc2)
            exp_value = float(0.85)
            if similar == 1.0:
                print("Exact match", similar, temp[end_node], "---------||---------",  temp[start_node])
            elif 0.96 < similar < 0.99:
                print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
                final_items.remove(temp[end_node])

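But even though I am now removing elements from another list that I am not even iterating over, I still get the same "list index out of range" error.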

Answer 1

I think your problem is here:

    temp.remove(temp[end_node])

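You are removing items from the temp list, so the list index ends up out of range.

Say temp starts with 351 items, i.e. indices 0 to 350.

Now the script removes one (or more) items from temp.
Suddenly temp has only 350 items, i.e. indices 0 to 349.

But the script still iterates using the original length of temp, 351.
So when the script reaches the last iteration, index 350 (or earlier, if more than one item was removed), it tries to fetch a list index that no longer exists: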

    doc1 = nlp(temp[350])

because at that point the indices of temp only run from 0 to 349.
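The same failure can be reproduced without spaCy. In this minimal sketch (the list contents are made up for illustration), the loop bound is computed once from the original length, so after one removal the final index no longer exists:

```python
# The loop bound is fixed at the original length (5), but the list shrinks.
items = ["a", "b", "c", "d", "e"]
length = len(items)

error = None
try:
    for i in range(length):
        current = items[i]  # fails once i reaches the new, shorter length
        if current == "b":
            items.remove("c")  # the list now has 4 items, indices 0 to 3
except IndexError as exc:
    error = exc

print(items)  # ['a', 'b', 'd', 'e']
print(error)  # list index out of range
```

Note that the removal also shifts "d" into the slot "c" used to occupy, so even before the crash the indices no longer point at the items you expect.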

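It is better to modify a separate copy of the list rather than the list you are iterating over.
If you do create another list, remember to use the copy method: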

    final_items = temp.copy()

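because a plain assignment only creates another name for the same temp list, which is why your second attempt failed in exactly the same way.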
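A small sketch of the difference between assignment and copy (the list contents are placeholders). Removing through the alias changes temp itself, while the copy is an independent list:

```python
temp = ["s1", "s2", "s3"]

alias = temp               # plain assignment: both names refer to the same list object
independent = temp.copy()  # shallow copy: a new list with the same elements

alias.remove("s2")

print(temp)                              # ['s1', 's3'] (changed through the alias)
print(independent)                       # ['s1', 's2', 's3'] (unaffected)
print(alias is temp, independent is temp)  # True False
```

temp[:] and list(temp) produce the same kind of shallow copy as temp.copy().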
Python doc - copy()
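Following the same advice, another option is to never remove anything and instead build a new list that keeps a sentence only if it is not similar to one already kept. This is a sketch under assumptions: difflib.SequenceMatcher stands in for spaCy's nlp(a).similarity(nlp(b)) so the example runs without a model, is_similar and deduplicate are hypothetical helper names, the 0.96 threshold is borrowed from the question, and the sample sentences are invented:

```python
from difflib import SequenceMatcher
from typing import List


def is_similar(a: str, b: str, threshold: float = 0.96) -> bool:
    """Stand-in similarity test (assumption); a spaCy Doc.similarity score could be used instead."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def deduplicate(sentences: List[str], threshold: float = 0.96) -> List[str]:
    """Keep each sentence only if it is not similar to any sentence already kept.

    The input list is never modified, so no index can go out of range.
    """
    kept: List[str] = []
    for sentence in sentences:
        if not any(is_similar(sentence, other, threshold) for other in kept):
            kept.append(sentence)
    return kept


temp = [
    "the financial statement has been shared with you",
    "the financial statement has been shared with you!",
    "your invoice is overdue",
]
print(deduplicate(temp))
# ['the financial statement has been shared with you', 'your invoice is overdue']
```

With spaCy, it is also worth parsing each sentence once up front (for example docs = [nlp(s) for s in temp]) instead of calling nlp again inside the inner loop. Separately, Python has no ++ operator: ++start_node evaluates to start_node, so the inner loop in the question starts at the same index and compares each sentence with itself; start_node + 1 is what was intended.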
