Question

As far as I know, a Python dict is a hash table, and it is resized when the number of entries exceeds 2/3 of the current table's maximum size (see Objects\dictnotes.txt).
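The 2/3 rule governs growth on insertion. As a CPython implementation detail (other interpreters may behave differently), deleting keys does not shrink the table, while a dict comprehension allocates a new table sized for the surviving items. A small Python 3 sketch that observes this with sys.getsizeof:

```python
import sys

# Build a dict with 10,000 integer keys.
d = {i: i for i in range(10_000)}
size_full = sys.getsizeof(d)

# Delete 90% of the keys in place. CPython only resizes a dict when an
# insertion needs more room, so the table keeps its original size.
for key in [k for k in d if k < 9_000]:
    del d[key]
size_after_delete = sys.getsizeof(d)

# A comprehension builds a fresh table sized for the 1,000 surviving items.
compact = {k: v for k, v in d.items()}
size_compact = sys.getsizeof(compact)

print(len(d), size_full, size_after_delete, size_compact)
```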

I need to delete a large number of dictionary items (thousands), e.g. once an hour, based on a simple criterion: if key <= guard_condition.

I know about using a dict comprehension to create a new dictionary, and about resizing the dict while iterating over it:

# dict comprehension
new_d = {key: value for key, value in d.iteritems() if key >= guard_condition }

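# resize while iterating (loop over a copy of the keys: deleting from a
# dict while iterating over the dict itself raises RuntimeError)
for key in list(d):
    if key < guard_condition:
        del d[key]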

Is there another way to do this? Which one is faster?
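A Python 3 sketch of the two approaches above, plus one more variant that first collects only the keys to delete and then deletes them. The function names are hypothetical, and d.iteritems() becomes d.items() in Python 3:

```python
def prune_by_comprehension(d, guard_condition):
    """Return a new dict containing only the keys >= guard_condition."""
    return {key: value for key, value in d.items() if key >= guard_condition}


def prune_in_place(d, guard_condition):
    """Delete keys < guard_condition from d in place.

    list(d) snapshots all keys first; deleting from d while iterating
    over d directly raises RuntimeError.
    """
    for key in list(d):
        if key < guard_condition:
            del d[key]


def prune_collect_then_delete(d, guard_condition):
    """Snapshot only the keys to delete, then delete them in place."""
    doomed = [key for key in d if key < guard_condition]
    for key in doomed:
        del d[key]


d = {i: i for i in range(10)}
print(prune_by_comprehension(d, 7))  # {7: 7, 8: 8, 9: 9}
prune_in_place(d, 7)
print(d)                             # {7: 7, 8: 8, 9: 9}
```

The comprehension leaves the original dict untouched and rebinds a name to a new one; the two in-place variants mutate the dict that other references may share.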

Answer 1

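It depends on the size of your dictionary and on how many elements you need to delete. If you are deleting fewer than 80% of the dict's keys, "resize while iterating" is faster than "dict comprehension"; if you are deleting more than 80% of the keys, "dict comprehension" is faster. Try it yourself with this code: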

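import cProfile, pstats, StringIO  # Python 2; in Python 3 use io.StringIO
pr = cProfile.Profile()
pr.enable()

guard_condition = int(raw_input("Enter guard_condition: "))

# Test dict with 10 million integer keys (0 .. 9999999).
d = {item: item for item in xrange(10000000)}

# "dict comprehension": build a new dict holding only the keys to keep.
new_d = {key: value for key, value in d.iteritems() if key >= guard_condition }

# "resize while iterating": delete in place. In Python 2, d.keys() returns
# a list copy, so deleting from d inside the loop is safe.
def del_iter(d, guard_condition):
    for key in d.keys():
        if key < guard_condition:
            del d[key]

del_iter(d, guard_condition)

pr.disable()
# Print the collected profile, sorted by cumulative time.
s = StringIO.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print s.getvalue()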

For guard_condition = 7000000 (deleting 70% of the keys), the output is:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    2.794    2.794    2.794    2.794 {raw_input}
     1    1.263    1.263    1.263    1.263 dictDel1.py:7(<dictcomp>)
     1    1.030    1.030    1.030    1.030 dictDel1.py:9(<dictcomp>) <-- dict comprehension
     1    0.892    0.892    0.976    0.976 dictDel1.py:11(del_iter) <-- resize while iterating
     1    0.085    0.085    0.085    0.085 {method 'keys' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

For guard_condition = 8500000 (deleting 85% of the keys), the output is:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    3.316    3.316    3.316    3.316 {raw_input}
     1    1.247    1.247    1.247    1.247 dictDel1.py:7(<dictcomp>)
     1    0.937    0.937    1.052    1.052 dictDel1.py:11(del_iter) <-- resize while iterating
     1    0.787    0.787    0.787    0.787 dictDel1.py:9(<dictcomp>) <-- dict comprehension
     1    0.115    0.115    0.115    0.115 {method 'keys' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
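A rough Python 3 port of this comparison, offered as a sketch rather than the answerer's script: it uses time.perf_counter instead of cProfile, takes the dict size and guard as parameters instead of reading raw_input, and checks that both approaches leave the same items. The helper time_both is hypothetical, and timings vary by machine and interpreter version, so no numbers are claimed here:

```python
import time


def del_iter(d, guard_condition):
    # Snapshot the keys with list(d); in Python 3, d.keys() is a live view.
    for key in list(d):
        if key < guard_condition:
            del d[key]


def time_both(n, guard_condition):
    """Return (comprehension_seconds, del_iter_seconds) for one run each."""
    d = {item: item for item in range(n)}

    start = time.perf_counter()
    new_d = {key: value for key, value in d.items() if key >= guard_condition}
    t_comp = time.perf_counter() - start

    start = time.perf_counter()
    del_iter(d, guard_condition)
    t_del = time.perf_counter() - start

    # Sanity check: both approaches must keep exactly the same items.
    assert new_d == d
    return t_comp, t_del


if __name__ == "__main__":
    n = 1_000_000
    for fraction in (0.5, 0.7, 0.85, 0.95):
        t_comp, t_del = time_both(n, int(n * fraction))
        print(f"delete {fraction:.0%}: comprehension {t_comp:.3f}s, "
              f"del_iter {t_del:.3f}s")
```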

Answer 2

I tried it in IPython; here are the results:

In [140]: d = {item: item for item in xrange(10000)};

In [142]: guard_condition = 9000;

In [144]: %timeit new_d = {key: value for key, value in d.iteritems() if key >= guard_condition }
100 loops, best of 3: 2.54 ms per loop

In [140]: d = {item: item for item in xrange(10000)};

In [149]: def del_iter(d, guard_condition):
   .....:     for key in d.keys():
   .....:         if key < guard_condition:
   .....:             del d[key]
   .....:

In [150]: %timeit del_iter(d, guard_condition)
1000 loops, best of 3: 232 us per loop

The difference is roughly 100 loops * 2.54 ms = 254000 us versus 1000 loops * 232 us = 232000 us, which is negligible in my case.

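I would use the dict comprehension for readability. As far as I can see, the difference in execution time is trivial, and I agree with @Hyperboreus about premature optimization.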
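One caveat when timing del_iter with %timeit on a single dict: the first run prunes d, so every later loop only walks the 1,000 remaining keys and deletes nothing. A Python 3 sketch of a fairer measurement with timeit.repeat, which rebuilds the full dict in setup before every timed run (number=1); the 10,000-key size and the guard of 9,000 mirror the session above:

```python
import timeit


def del_iter(d, guard_condition):
    for key in list(d):
        if key < guard_condition:
            del d[key]


d = {item: item for item in range(10_000)}
guard_condition = 9_000

# After the first call, d is already pruned, so a second call deletes nothing.
del_iter(d, guard_condition)
after_first = len(d)
del_iter(d, guard_condition)
after_second = len(d)
print(after_first, after_second)  # 1000 1000

# Fairer timing: setup runs before each repeat, and number=1 means each
# timed run starts from a freshly built 10,000-key dict.
setup = "d = {item: item for item in range(10_000)}"
comp_times = timeit.repeat(
    "{k: v for k, v in d.items() if k >= 9_000}",
    setup=setup, number=1, repeat=50,
)
del_times = timeit.repeat(
    "del_iter(d, 9_000)",
    setup=setup, number=1, repeat=50, globals=globals(),
)
print(f"comprehension: {min(comp_times) * 1e6:.0f} us per run")
print(f"del_iter:      {min(del_times) * 1e6:.0f} us per run")
```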

Answer 3

Following up on my comment above: use an LRU (least recently used) or LFU (least frequently used) cache:

http://en.wikipedia.org/wiki/Cache_algorithms

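When a new item comes in, use the appropriate policy to decide which item to evict. This spreads the cost of deleting items over the lifetime of the application, instead of paying it in occasional bursts of selective deletion.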
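A minimal sketch of LRU eviction on insert, built on collections.OrderedDict (move_to_end and popitem(last=False) are standard methods). The LRUCache class and its capacity parameter are hypothetical illustrations, not a library API:

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical minimal LRU cache: evicts one item per insert, never in bulk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark the key as most recently used by moving it to the end.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        # Over capacity: evict the least recently used item (the front).
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)

    def keys(self):
        return list(self._data)

    def __len__(self):
        return len(self._data)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes the most recently used
cache.put("c", 3)     # evicts "b", the least recently used
print(cache.keys())   # ['a', 'c']
```

If the dict is really a memo table for a function's results, functools.lru_cache(maxsize=...) applies the same policy without a hand-written class.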

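I don't think there is a faster way to remove items from a dictionary than del d[key], but there are better ways to achieve what (I'm guessing) you are trying to do. LRU and LFU are very popular, widely used solutions.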
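For comparison, a deliberately simple LFU sketch (the LFUCache class is hypothetical). It counts accesses with collections.Counter and evicts the key with the lowest count. That eviction scans every key, so it costs O(n) per eviction; production LFU implementations typically avoid the scan with per-frequency buckets:

```python
from collections import Counter


class LFUCache:
    """Hypothetical minimal LFU cache; assumes capacity >= 1."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._hits = Counter()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._hits[key] += 1
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the key with the lowest access count (O(n) scan);
            # ties go to the key that was inserted first.
            victim = min(self._data, key=lambda k: self._hits[k])
            del self._data[victim]
            del self._hits[victim]
        self._data[key] = value
        self._hits[key] += 1

    def keys(self):
        return list(self._data)


cache = LFUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" now has 2 hits, "b" has 1
cache.put("c", 3)     # evicts "b", the least frequently used
print(cache.keys())   # ['a', 'c']
```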

Answer 4

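From a pure performance standpoint, a while loop is always faster than a list comprehension, whether you are programming in Ruby or Python. But since you are programming in Python, you probably want to use a list comprehension anyway, because if speed were what mattered most, you would not be programming in Python.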
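This claim can be checked on your own interpreter rather than taken on faith. A Python 3 sketch comparing an explicit while loop with a dict comprehension for the same filtering job; the function names are hypothetical, and no timing result is asserted here:

```python
import timeit

d = {item: item for item in range(100_000)}
guard_condition = 70_000


def with_loop(d, guard_condition):
    # Explicit while loop over a snapshot of the keys, building a new dict.
    keys = list(d)
    result = {}
    i = 0
    while i < len(keys):
        key = keys[i]
        if key >= guard_condition:
            result[key] = d[key]
        i += 1
    return result


def with_comprehension(d, guard_condition):
    return {key: value for key, value in d.items() if key >= guard_condition}


# Both must produce the same dict before their speed is worth comparing.
assert with_loop(d, guard_condition) == with_comprehension(d, guard_condition)

for func in (with_loop, with_comprehension):
    best = min(timeit.repeat(lambda: func(d, guard_condition), number=10, repeat=5))
    print(f"{func.__name__}: {best / 10 * 1e3:.2f} ms per call")
```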

Python: fastest strategy for deleting a large number of keys from a dict
