Question

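I'm very new to programming and don't understand why my program is slowing down.

I'm working with datasets of roughly 350,000 to 500,000 rows and would appreciate some direction.

I need to check every entry in a new list so that I can update the matching old entries and append the brand-new entries to the end of the old list.

If I add print statements to the reassignment loop and to the new-row exception handler, the first few thousand iterations are fast, but after that the program becomes very slow. (Nearly 1,000 full loops complete in the first 3 seconds. After about the 20,000th iteration the rate drops to fewer than 100 full loops per 5 seconds, and by the 60,000th iteration it is fewer than 100 full loops per 15 seconds.)

RAM usage stays below 70%, and CPU usage holds steady at 48% to 50%.

The code looks like this: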

import gc
gc.disable() #this was added to possibly improve speed

def updateOldList(oldListOfLists, newListOfLists):
    oldListIndexDict = dict()
    IDNumber = <index of ID number>
    for i in range(len(oldListOfLists)):
        oldListIndexDict[oldListOfLists[i][IDNumber]] = i
    for i in range(len(newListOfLists)):
        try:
            oldIndex = oldListIndexDict[newListOfLists[i][IDNumber]]
            oldListOfLists[oldIndex][0] = newListOfLists[i][0]
            oldListOfLists[oldIndex][3] = newListOfLists[i][3]
            del(oldListIndexDict[newListOfLists[i][IDNumber]]) #this was added to limit the number of entries in the hash table to attempt to improve speed
        except:
            oldListOfLists= oldListOfLists + newListOfLists
    return oldListOfLists

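The inner lists within each list of lists need to stay ordered, so I don't think I can use sets.

The following two questions are very similar, and I tried or considered their suggestions before asking.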

python function slowing down for no apparent reason

Python function slows down with presence of large list

Answer 1

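OK, so we're on Python 3.3. I assume that every list in oldListOfLists has a counterpart in newListOfLists and that you are mostly updating values, e.g. the 0th list of oldListOfLists is updated by the 0th list of newListOfLists, the 1st by the 1st, and so on. If the indices line up like that, you can simplify the code.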

def updateOldList(oldListOfLists, newListOfLists):
    for i in range(len(newListOfLists)):
        try:
            oldListOfLists[i][0] = newListOfLists[i][0]
            oldListOfLists[i][3] = newListOfLists[i][3]
        except IndexError:
            oldListOfLists += newListOfLists
    return oldListOfLists

If a list in oldListOfLists is not supposed to be updated by the list at the same index in newListOfLists, this won't actually work properly, as you can imagine.
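When rows are matched by an ID rather than by position, as in the question, a dictionary lookup can stand in for the shared index. The following is only a minimal sketch under assumptions: the names `update_by_id`, `old_rows`, `new_rows`, and `id_col` are hypothetical, and it appends just the single unmatched row rather than the whole new list, which may or may not be what the asker intended.

```python
def update_by_id(old_rows, new_rows, id_col):
    """Update columns 0 and 3 of matching rows in place; append unmatched rows.

    Hypothetical helper: id_col is the position of the ID inside each row.
    """
    # Map each ID to its position in old_rows (built once).
    index_by_id = {row[id_col]: i for i, row in enumerate(old_rows)}
    for row in new_rows:
        i = index_by_id.get(row[id_col])
        if i is None:
            # Brand-new ID: append only this row and remember where it landed.
            index_by_id[row[id_col]] = len(old_rows)
            old_rows.append(row)
        else:
            old_rows[i][0] = row[0]
            old_rows[i][3] = row[3]
    return old_rows


# Made-up sample data; the ID sits in column 4 here.
old = [["a", 1, "x", 10, 101], ["b", 2, "y", 20, 102]]
new = [["A", 9, "z", 11, 101], ["c", 3, "w", 30, 103]]
result = update_by_id(old, new, id_col=4)
print(result)
```

Because `append` adds one row at a time, the old list grows by at most the number of genuinely new IDs, and the order of the existing rows is preserved.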

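Edit: You probably want to catch something specific like IndexError, for the case where a new list has no corresponding old list, rather than swallowing every other common error as well.

Edit 2: For lists, += behaves like extend.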

oldListOfLists+=newListOfLists

is the same as
oldListOfLists.extend(newListOfLists)
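The distinction matters because the question's code uses `oldListOfLists = oldListOfLists + newListOfLists`, which is not the same operation: `+` builds a brand-new list by copying both operands, so its cost grows with the length of the list. A small illustration, with made-up variable names:

```python
# In-place: += extends the existing list object.
a = [1, 2]
a_alias = a
a += [3]
print(a_alias)        # the alias sees the change
print(a_alias is a)   # still the same object

# Rebinding: + creates a new list and copies every element into it.
b = [1, 2]
b_alias = b
b = b + [3]
print(b_alias)        # the original object is untouched
print(b_alias is b)   # b now names a different object
```

Inside a function, rebinding with `+` also means the caller's list is not modified, which is why the question's version has to return the result.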

Edit 3: Is the code still slowing down? Is your last list (by index) growing larger and larger? What is the total memory size of the two lists of lists?
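One rough way to answer those questions is to print the list length and an approximate byte count while the loop runs. The sketch below assumes flat inner lists of simple values; `approx_size` is a hypothetical helper, and because `sys.getsizeof` does not follow references, the figure is an estimate that overcounts shared objects.

```python
import sys


def approx_size(list_of_lists):
    """Rough byte count: the outer list, each inner list, and each item."""
    total = sys.getsizeof(list_of_lists)
    for inner in list_of_lists:
        total += sys.getsizeof(inner)
        # Shared objects (None, small ints) are counted once per occurrence.
        total += sum(sys.getsizeof(item) for item in inner)
    return total


# Made-up sample rows, only to show the call.
rows = [[i, str(i), i * 2.0, None] for i in range(1000)]
print(len(rows), approx_size(rows))
```

If the length reported for the old list jumps by hundreds of thousands of entries every time an unmatched row is found, the list itself is the thing that is growing.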

Answer 2

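When my code runs slowly, I do what is explained in the top answer at this link.

That way you can see which part of the code is slowing the program down and try to improve it.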
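The link target is not preserved here, so the exact method it describes is unknown. Assuming it refers to profiling, one option from the standard library is `cProfile`. The sketch below profiles a made-up function, `slow_append`, which mimics the repeated `list = list + ...` pattern from the question:

```python
import cProfile
import io
import pstats


def slow_append(n):
    """Deliberately slow: rebuilds the whole list on every iteration."""
    out = []
    for i in range(n):
        out = out + [i]
    return out


profiler = cProfile.Profile()
profiler.enable()
result = slow_append(2000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, running `python -m cProfile -s cumulative script.py` gives a similar report without changing the code.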

Python 3.3 slowing down during a large loop
