In a comparison task, Python `Counter` appears to be slower than `sorted` - why?

Time: 2016-10-21 02:31:02

Tags: python timeit

Referring to this older Stack Overflow question, @Raymond Hettinger reported a result in which Counter was 4 times faster than sorted when timed with timeit from the command line, like this:

python3.6 -m timeit -s 'from collections import Counter' -s 'from random import shuffle' -s 't=list(range(100)) * 5' -s 'shuffle(t)' -s 'u=t[:]' -s 'shuffle(u)' 'Counter(t)==Counter(u)'
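
For comparison with the script further down, the same measurement can be approximated from inside Python. This is only a sketch: the setup string mirrors the `-s` arguments above, and `time_stmt` is a hypothetical helper name, not part of timeit.

```python
import timeit

# Setup mirrors the -s arguments of the command-line invocation above.
SETUP = """
from collections import Counter
from random import shuffle
t = list(range(100)) * 5
shuffle(t)
u = t[:]
shuffle(u)
"""


def time_stmt(stmt, number=1000, repeat=5):
    """Return the best total time in seconds for `number` runs of `stmt`."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))


if __name__ == "__main__":
    print("Counter:", time_stmt("Counter(t) == Counter(u)"))
    print("sorted :", time_stmt("sorted(t) == sorted(u)"))
```

Taking the minimum over several repeats reduces noise from other processes. Note that the data here is 500 shuffled items with many duplicates, which is quite different from short or nearly sorted lists.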

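My results show that sorted is significantly faster than Counter! Am I using timeit incorrectly? Am I misinterpreting the results? Does the setup data somehow produce different results?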

import timeit, functools
from collections import Counter


def sorted_lists(l1,l2):
    return sorted(l1) == sorted(l2)


def counted_lists(l1,l2):
    return Counter(l1) == Counter(l2)


short1 = [0,1,2,3,4,5,5]
short2 = [0,1,5,3,4,5,2]
long1 = list(range(0, 1000)) + [100, 10, 1000, 5]
long2 = list(range(0, 1000)) + [5, 10, 100, 1000]

number = 1000

# Long test
t = timeit.Timer(lambda: sorted_lists(long1, long2))
rl1 = t.timeit(number)
print("sorted long  :{}".format(rl1))

t = timeit.Timer(lambda: counted_lists(long1, long2))
rl2 = t.timeit(number)
print("counted long :{}".format(rl2))


# Short test
t = timeit.Timer(functools.partial(sorted_lists, short1, short2))
rs1 = t.timeit(number)
print("sorted short :{}".format(rs1))

t = timeit.Timer(functools.partial(counted_lists, short1, short2))
rs2 = t.timeit(number)
print("counted short:{}".format(rs2))

The output is quite consistent:

sorted long  :0.04470205499092117 # less time = fastest
counted long :0.1182843999704346

sorted short :0.0012896459666080773 # less time = fastest
counted short:0.009829471004195511

Both sets of tests were run in Python 3.6.

1 answer:

Answer 0 (score: 0)

Thanks to the comment from @user2357112 above, if the input data is changed as follows:

from random import shuffle

short1 = list(range(10)) * 5
short2 = list(range(10)) * 5
long1 = list(range(100)) * 5
long2 = list(range(100)) * 5

shuffle(short1)
shuffle(short2)
shuffle(long1)
shuffle(long2)

Results:

sorted long  :0.055621962004806846
counted long :0.04698559001553804

sorted short :0.008079404011368752
counted short:0.014304430980701

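These results are getting closer to those in Raymond's older question.

A few more tests show the following:

sorted appears to be faster for longer, nearly sorted lists and for any list under 50 items.

For fully random lists of more than 1000 items, Counter is clearly faster.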
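
The observations above can be checked with a small harness along these lines. It is a sketch rather than the exact tests that were run: `make_pair` and `benchmark` are hypothetical helper names, and the sizes and repeat counts are arbitrary choices.

```python
import random
import timeit
from collections import Counter


def sorted_equal(l1, l2):
    # Compare by sorting both lists.
    return sorted(l1) == sorted(l2)


def counted_equal(l1, l2):
    # Compare by counting occurrences of each item.
    return Counter(l1) == Counter(l2)


def make_pair(size, shuffled, seed=0):
    """Build two lists with the same items; optionally shuffle both."""
    rng = random.Random(seed)
    a = list(range(size))
    b = list(range(size))
    if shuffled:
        rng.shuffle(a)
        rng.shuffle(b)
    return a, b


def benchmark(size, shuffled, number=100, repeat=3):
    """Return the best total time in seconds for each approach."""
    a, b = make_pair(size, shuffled)
    results = {}
    for name, func in (("sorted", sorted_equal), ("Counter", counted_equal)):
        timer = timeit.Timer(lambda: func(a, b))
        results[name] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    for size in (10, 50, 1000, 5000):
        for shuffled in (False, True):
            label = "shuffled" if shuffled else "ordered"
            print(size, label, benchmark(size, shuffled))
```

One likely factor behind the difference: Python's built-in sort detects runs that are already in order, so nearly sorted input is sorted in close to linear time, while Counter has to hash every item regardless of order. The exact crossover point will vary with the machine and the Python version.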