Why does "".join() appear to be slower than +=?

Date: 2016-09-10 16:35:07

Tags: python

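Despite the question "Why is ''.join() faster than += in Python?", its answers, and this good explanation of the code behind the scenes: https://paolobernardi.wordpress.com/2012/11/06/python-string-concatenation-vs-list-join/
my tests suggest otherwise, and I am confused. Am I doing something simple and wrong? I admit I am fudging the creation of x, but I can't see how that would affect the result.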

import time
x="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
for i in range(10000):
    y+=x
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473524757.681939,1473524757.68521,'=',0.0032711029052734375)

import time
x="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
for i in range(10000):
    y=y+x
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473524814.544177,1473524814.547544,'=',0.0033669471740722656)

import time
x=10000*"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y=""
t1 = (time.time())
y= "".join(x)
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473524861.949515,1473524861.978755,'=',0.029239892959594727)

As you can see, "".join() is much slower, yet we are told it is supposed to be faster.
The values are very similar in Python 2.7 and Python 3.4.

Edit: Fair enough.

The "one huge string" thing is the kicker.

import time
x=[]
for i in range(10000):
    x.append("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
y=""
t1 = (time.time())
y= "".join(x)
t2 = (time.time())
#print (y)
print (t1,t2,"=",t2-t1)

(1473526344.55748,1473526344.558409,'=',0.0009288787841796875)

An order of magnitude faster. Mea culpa!

2 answers:

Answer 0 (score: 4)

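You are calling ''.join() on one huge string, not on a list (multiplying a string produces a bigger string). This forces str.join() to iterate over that huge string, joining 740,000 individual 'x' characters (10,000 × 74). In other words, your join test processes 74 times as many items as your += loops do.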
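To make the difference concrete, this minimal sketch counts how many items str.join() has to walk over in each case. It assumes the question's literal is 74 characters long (written here as 'x' * 74, matching this answer's benchmark); the names huge and parts are illustrative only.

```python
# The question's 74-character literal, written compactly (assumption: 74 chars).
x = 'x' * 74

# Question's join test: one huge string. Iterating over a str yields
# single characters, so join() has 740,000 one-character items to process.
huge = 10000 * x
print(len(list(huge)))   # 740000

# Fair version: a list of 10,000 strings, each 74 characters long.
parts = [x] * 10000
print(len(parts))        # 10000

# Both inputs join to exactly the same final string.
print(''.join(huge) == ''.join(parts))   # True
```

Same output, but one call does 74 times as many per-item steps as the other.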

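For a fair trial you need to start both from the same input, and use the timeit module to reduce the influence of garbage collection and other processes running on your system.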

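That means both approaches need to work from a list of strings (your assignment examples rely on repeatedly adding a string literal, which is stored as a constant):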

from timeit import timeit

testlist = ['x' * 74 for _ in range(100)]

def strjoin(testlist):
    return ''.join(testlist)

def inplace(testlist):
    result = ''
    for element in testlist:
        result += element
    return result

def concat(testlist):
    result = ''
    for element in testlist:
        result = result + element
    return result

for f in (strjoin, inplace, concat):
    timing = timeit('f(testlist)', 'from __main__ import f, testlist',
                    number=100000)
    print('{:>7}: {}'.format(f.__name__, timing))

On my MacBook Pro, on Python 3.5, this produces:

strjoin: 0.09923043003072962
inplace: 1.0032496969797648
 concat: 1.0027298880158924

On 2.7, I get:

strjoin: 0.118290185928
inplace: 0.85814499855
 concat: 0.867822885513

str.join() is still the winner.
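A variant of the benchmark above, sketched for Python 3.5+ under a few assumptions: it first checks that all three functions build the identical string (a fair comparison requires equal work), then uses timeit.repeat() with a callable instead of the "from __main__ import" setup string, and keeps the minimum of several runs, which is the run least disturbed by other processes. The repeat and number values are arbitrary choices for a quick run, and no timings are claimed here.

```python
import timeit

testlist = ['x' * 74 for _ in range(100)]

def strjoin(testlist):
    return ''.join(testlist)

def inplace(testlist):
    result = ''
    for element in testlist:
        result += element
    return result

def concat(testlist):
    result = ''
    for element in testlist:
        result = result + element
    return result

# Sanity check: every contender must build exactly the same string
# from exactly the same input.
expected = strjoin(testlist)
for f in (inplace, concat):
    assert f(testlist) == expected, f.__name__

# repeat() runs the whole measurement several times; report the minimum.
for f in (strjoin, inplace, concat):
    timings = timeit.repeat(lambda: f(testlist), repeat=5, number=10000)
    print('{:>7}: {:.4f}'.format(f.__name__, min(timings)))
```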

Answer 1 (score: 2)

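You are not comparing the same operations: your first test adds the long string on every iteration, while join adds each item of the string, i.e. each single character. (See also @MartijnPieters's answer.)

If I compare like with like, I get completely different timings, showing that str.join is much faster: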

x = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

def join_inplace_add(y, x, num):
    for _ in range(num):
        y += x
    return y

def join_by_join(x, num):
    return ''.join([x for _ in range(num)])

%timeit join_by_join(x, 1000)
# 10000 loops, best of 3: 91 µs per loop
%timeit join_inplace_add('', x, 1000)
# 1000 loops, best of 3: 325 µs per loop
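%timeit is an IPython/Jupyter magic, not plain Python syntax. In an ordinary Python 3 script the same comparison can be sketched with the standard timeit module, as below. It assumes the same 74-character literal and reuses the answer's two functions; the loop count n is an arbitrary choice, and results will vary by machine, so no output is claimed.

```python
import timeit

x = 'x' * 74  # same 74-character literal as in the question (assumption)

def join_inplace_add(y, x, num):
    for _ in range(num):
        y += x
    return y

def join_by_join(x, num):
    return ''.join([x for _ in range(num)])

# Both functions must produce the same string for the comparison to be fair.
assert join_by_join(x, 1000) == join_inplace_add('', x, 1000)

# timeit.timeit returns the total seconds for `number` calls,
# so divide by n to get the time per call.
n = 1000
t_join = timeit.timeit(lambda: join_by_join(x, 1000), number=n)
t_add = timeit.timeit(lambda: join_inplace_add('', x, 1000), number=n)
print('join:       {:.1f} usec per loop'.format(t_join / n * 1e6))
print('inplace +=: {:.1f} usec per loop'.format(t_add / n * 1e6))
```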