Question

Python中是否有高效的质量串连接方法（如C＃中的 StringBuilder 或Java中的 StringBuffer ）？我找到了以下方法here：

使用+
使用字符串列表和join方法
使用UserString模块中的MutableString
使用字符数组和array模块
使用cStringIO模块中的StringIO

但是你的专家使用或建议了什么？为什么？

[A related question here]

Answer 1

您可能对此感兴趣：Guido的An optimization anecdote。虽然值得记住的是，这是一篇旧文章，它早于''.join之类的事情存在（虽然我猜string.joinfields或多或少相同）

根据这一点，array模块可能是最快的，如果你可以将问题塞进去。但是''.join可能足够快并且具有惯用性的好处，因此其他python程序员更容易理解。

最后，优化的黄金法则：除非你知道你需要，否则不要进行优化，而是衡量而不是猜测。

您可以使用timeit模块测量不同的方法。这可以告诉你最快的，而不是互联网上的随机陌生人猜测。

Answer 2

''.join(sequenceofstrings)通常效果最好 - 最简单，最快。

Answer 3

Python 3.6使用Literal String Interpolation更改了已知组件的字符串连接游戏。

鉴于来自mkoistinen's answer的测试用例，有字符串

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

竞争者

f'http://{domain}/{lang}/{path}' - 0.151μs
'http://%s/%s/%s' % (domain, lang, path) - 0.321μs
'http://' + domain + '/' + lang + '/' + path - 0.356μs
''.join(('http://', domain, '/', lang, '/', path)) - 0.249 μs（请注意，构建一个恒定长度的元组比构建一个常量列表要快一些。）

因此，目前最短和最漂亮的代码也是最快的。

在Python的alpha版本中，f''字符串的实现可能是最慢的 - 实际上生成的字节代码几乎等同于''.join()的情况，不必要的调用没有参数的str.__format__只会使self保持不变。这些低效率在3.6决赛之前得到了解决。

速度可以与Python 2的最快方法形成对比，后者在我的计算机上连接+;对于8位字符串， 0.203 μs;如果字符串都是Unicode，则 0.259 μs。

Answer 4

这取决于你在做什么。

在Python 2.5之后，使用+运算符进行字符串连接非常快。如果您只是连接几个值，使用+运算符效果最佳：

>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344

>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984

但是，如果你把一个字符串放在一个循环中，你最好使用列表连接方法：

>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...   joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267

>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...   str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184

...但请注意，在差异变得明显之前，你必须将相对较多的字符串组合在一起。

Answer 5

根据John Fouhy的回答，除非你必须，否则不要进行优化，但如果你在这里问这个问题，可能正是因为你必须。就我而言，我需要从字符串变量中快速组装一些URL。我注意到没有人（到目前为止）似乎正在考虑字符串格式方法，所以我想我会尝试这一点，并且主要是为了温和的兴趣，我想我会把字符串插值算子扔在那里好已淘汰。说实话，我并不认为其中任何一个会直接与＃+;＆＃39;操作或＆＃39; .join（）。但猜猜怎么了？在我的Python 2.7.5系统上，字符串插值运算符统治它们，string.format（）是表现最差的：

# concatenate_test.py

from __future__ import print_function
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'
iterations = 1000000

def meth_plus():
    '''Using + operator'''
    return 'http://' + domain + '/' + lang + '/' + path

def meth_join():
    '''Using ''.join()'''
    return ''.join(['http://', domain, '/', lang, '/', path])

def meth_form():
    '''Using string.format'''
    return 'http://{0}/{1}/{2}'.format(domain, lang, path)

def meth_intp():
    '''Using string interpolation'''
    return 'http://%s/%s/%s' % (domain, lang, path)

plus = timeit.Timer(stmt="meth_plus()", setup="from __main__ import meth_plus")
join = timeit.Timer(stmt="meth_join()", setup="from __main__ import meth_join")
form = timeit.Timer(stmt="meth_form()", setup="from __main__ import meth_form")
intp = timeit.Timer(stmt="meth_intp()", setup="from __main__ import meth_intp")

plus.val = plus.timeit(iterations)
join.val = join.timeit(iterations)
form.val = form.timeit(iterations)
intp.val = intp.timeit(iterations)

min_val = min([plus.val, join.val, form.val, intp.val])

print('plus %0.12f (%0.2f%% as fast)' % (plus.val, (100 * min_val / plus.val), ))
print('join %0.12f (%0.2f%% as fast)' % (join.val, (100 * min_val / join.val), ))
print('form %0.12f (%0.2f%% as fast)' % (form.val, (100 * min_val / form.val), ))
print('intp %0.12f (%0.2f%% as fast)' % (intp.val, (100 * min_val / intp.val), ))

结果：

# python2.7 concatenate_test.py
plus 0.360787868500 (90.81% as fast)
join 0.452811956406 (72.36% as fast)
form 0.502608060837 (65.19% as fast)
intp 0.327636957169 (100.00% as fast)

如果我使用较短的域和较短的路径，插值仍然会胜出。但是，差异更明显，字符串更长。

现在我有一个不错的测试脚本，我也在Python 2.6,3.3和3.4下进行了测试，结果如下。在Python 2.6中，plus运算符是最快的！在Python 3上，加入胜出。注意：这些测试在我的系统上非常可重复。所以，＆＃39;加上＆＃39;在2.6，＆＃39; intp＆＃39;在2.7和＃39;加入＆＃39;总是更快在Python 3.x上总是更快。

# python2.6 concatenate_test.py
plus 0.338213920593 (100.00% as fast)
join 0.427221059799 (79.17% as fast)
form 0.515371084213 (65.63% as fast)
intp 0.378169059753 (89.43% as fast)

# python3.3 concatenate_test.py
plus 0.409130576998 (89.20% as fast)
join 0.364938726001 (100.00% as fast)
form 0.621366866995 (58.73% as fast)
intp 0.419064424001 (87.08% as fast)

# python3.4 concatenate_test.py
plus 0.481188605998 (85.14% as fast)
join 0.409673971997 (100.00% as fast)
form 0.652010936996 (62.83% as fast)
intp 0.460400978001 (88.98% as fast)

# python3.5 concatenate_test.py
plus 0.417167026084 (93.47% as fast)
join 0.389929617057 (100.00% as fast)
form 0.595661019906 (65.46% as fast)
intp 0.404455224983 (96.41% as fast)

获得的经验：

有时，我的假设是错误的。
针对系统环境进行测试你将在生产中运作。
字符串插值还没有死！

<强> TL; DR：

如果使用2.6，请使用+运算符。
如果您使用2.7，请使用＆＃39;％＆＃39;操作者。
如果您使用3.x，请使用＆＃39; .join（）。

Answer 6

它几乎取决于每次新连接后新字符串的相对大小。使用+运算符，对于每个连接，都会生成一个新字符串。如果中间字符串相对较长，则+变得越来越慢，因为存储了新的中间字符串。

考虑这种情况：

from time import time
stri=''
a='aagsdfghfhdyjddtyjdhmfghmfgsdgsdfgsdfsdfsdfsdfsdfsdfddsksarigqeirnvgsdfsdgfsdfgfg'
l=[]
#case 1
t=time()
for i in range(1000):
    stri=stri+a+repr(i)
print time()-t

#case 2
t=time()
for i in xrange(1000):
    l.append(a+repr(i))
z=''.join(l)
print time()-t

#case 3
t=time()
for i in range(1000):
    stri=stri+repr(i)
print time()-t

#case 4
t=time()
for i in xrange(1000):
    l.append(repr(i))
z=''.join(l)
print time()-t

<强>结果

1 0.00493192672729

2 0.000509023666382

3 0.00042200088501

4 0.000482797622681

在1＆amp; 2的情况下，我们添加一个大字符串，join（）执行速度提高约10倍。在案例3和4中，我们添加一个小字符串，＆＃39; +＆＃39;执行得稍快

Answer 7

我遇到了一个需要有一个未知大小的可附加字符串的情况。这些是基准测试结果（python 2.7.3）：

$ python -m timeit -s 's=""' 's+="a"'
10000000 loops, best of 3: 0.176 usec per loop
$ python -m timeit -s 's=[]' 's.append("a")'
10000000 loops, best of 3: 0.196 usec per loop
$ python -m timeit -s 's=""' 's="".join((s,"a"))'
100000 loops, best of 3: 16.9 usec per loop
$ python -m timeit -s 's=""' 's="%s%s"%(s,"a")'
100000 loops, best of 3: 19.4 usec per loop

这似乎表明'+ ='是最快的。 skymind链接的结果有点过时了。

（我意识到第二个例子没有完成，最后的列表需要加入。但是，这确实表明，只是准备列表需要比字符串连接更长的时间。）

Answer 8

一年后，让我们用python 3.4.3测试mkoistinen的答案：

加0.963564149000（快95.83％）
加入0.923408469000（快100.00％）
表格1.501130934000（快61.51％）
intp 1.019677452000（快90.56％）

没有任何改变。加入仍然是最快的方法。使用intp可以说是可读性方面的最佳选择，你可能想要使用intp。

Answer 9

对于python 3.8.6 / 3.9，我不得不做一些肮脏的技巧，因为perfplot发出了一些错误。这里假设x[0]是a，而x[1]是b：

对于大数据，该图几乎相同。对于小数据，

由perfplot拍摄，这是代码，大数据== range（8），小数据== range（4）。

import perfplot

from random import choice
from string import ascii_lowercase as letters

def generate_random(x):
    data = ''.join(choice(letters) for i in range(x))
    sata = ''.join(choice(letters) for i in range(x))
    return [data,sata]

def fstring_func(x):
    return [ord(i) for i in f'{x[0]}{x[1]}']

def format_func(x):
    return [ord(i) for i in "{}{}".format(x[0], x[1])]

def replace_func(x):
    return [ord(i) for i in "|~".replace('|', x[0]).replace('~', x[1])]

def join_func(x):
    return [ord(i) for i in "".join([x[0], x[1]])]

perfplot.show(
    setup=lambda n: generate_random(n),
    kernels=[
        fstring_func,
        format_func,
        replace_func,
        join_func,
    ],
    n_range=[int(k ** 2.5) for k in range(4)],
)

当有中等数据并且有4个字符串时，x[0]，x[1]，x[2]，x[3]而不是2个字符串：

def generate_random(x):
    a =  ''.join(choice(letters) for i in range(x))
    b =  ''.join(choice(letters) for i in range(x))
    c =  ''.join(choice(letters) for i in range(x))
    d =  ''.join(choice(letters) for i in range(x))
    return [a,b,c,d]

最好坚持用细绳

Answer 10

受@ JasonBaker基准的启发，这是一个简单的比较10 "abcdefghijklmnopqrstuvxyz"个字符串，显示.join()更快;即使变量微小增加：

连环

>>> x = timeit.Timer(stmt='"abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz" + "abcdefghijklmnopqrstuvxyz"')
>>> x.timeit()
0.9828147209324385

加入

>>> x = timeit.Timer(stmt='"".join(["abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz", "abcdefghijklmnopqrstuvxyz"])')
>>> x.timeit()
0.6114138159765048

Answer 11

对于短字符串的小集 （即2或3个不超过几个字符的字符串）， plus 仍然是快点。在Python 2和3中使用mkoistinen的精彩脚本：

plus 2.679107467004 (100.00% as fast)
join 3.653773699996 (73.32% as fast)
form 6.594011374000 (40.63% as fast)
intp 4.568015249999 (58.65% as fast)

因此，当您的代码执行大量单独的小型连接时，如果速度至关重要，那么是首选方式。

Answer 12

可能“Python 3.6中的新f字符串”是连接字符串的最有效方式。

使用％s

>>> timeit.timeit("""name = "Some"
... age = 100
... '%s is %s.' % (name, age)""", number = 10000)
0.0029734770068898797

使用.format

>>> timeit.timeit("""name = "Some"
... age = 100
... '{} is {}.'.format(name, age)""", number = 10000)
0.004015227983472869

使用f

>>> timeit.timeit("""name = "Some"
... age = 100
... f'{name} is {age}.'""", number = 10000)
0.0019175919878762215

来源：https://realpython.com/python-f-strings/

python中最有效的字符串连接方法是什么？

12 个答案:

连环

加入