Python str.translate VS str.replace

Date: 2015-06-30 16:11:50

Tags: python

Why is replace about 1.5 times faster than translate in Python?

In [188]: s = '1 a  2'

In [189]: s.replace(' ','')
Out[189]: '1a2'

In [190]: s.translate(None,' ')
Out[190]: '1a2'

In [191]: %timeit s.replace(' ','')
1000000 loops, best of 3: 399 ns per loop

In [192]: %timeit s.translate(None,' ')
1000000 loops, best of 3: 614 ns per loop
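
The same measurement can be reproduced outside IPython with the timeit module. This is only a minimal sketch, not the asker's exact setup: the helper names are made up for illustration, and byte literals are used so that the calls behave like Python 2 str while still running on Python 3. Absolute numbers will differ by machine and interpreter version.

```python
import timeit

S = b'1 a  2'


def strip_with_replace(s=S):
    # Delete every space by replacing it with the empty string.
    return s.replace(b' ', b'')


def strip_with_translate(s=S):
    # Delete every space via the "delete" argument; table=None means no mapping.
    return s.translate(None, b' ')


def best_ns_per_call(func, number=100000, repeat=3):
    # Take the best of several runs, like %timeit does, and convert to ns.
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    for func in (strip_with_replace, strip_with_translate):
        print("%s: %.0f ns per call" % (func.__name__, best_ns_per_call(func)))
```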

2 answers:

Answer 0 (score: 15)

Assuming Python 2.7 (because I had to flip a coin, since the version was not stated), we can find the source code for string.translate and string.replace in string.py:

>>> import inspect
>>> import string
>>> inspect.getsourcefile(string.translate)
>>> inspect.getsourcefile(string.replace)

Oh, we can't, as string.py starts with:

"""A collection of string operations (most are no longer used).

Warning: most of the code you see here isn't normally used nowadays.
Beginning with Python 1.6, many of these functions are implemented as
methods on the standard string object.
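
The methods on the string object itself are implemented in C on CPython, so inspect has no Python source file to report for them. A small sketch of that check follows; the helper name source_file_or_none is hypothetical, and the code is written for Python 3.

```python
import inspect


def source_file_or_none(obj):
    # inspect.getsourcefile raises TypeError for objects that have no
    # Python-level source, such as methods implemented in C.
    try:
        return inspect.getsourcefile(obj)
    except TypeError:
        return None


if __name__ == "__main__":
    for method in (str.replace, str.translate):
        print(method.__name__, source_file_or_none(method))
```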


from cProfile import run
from string import ascii_letters

s = '1 a  2'

def _replace():
    for x in range(5000000):
        s.replace(' ', '')

def _translate():
    for x in range(5000000):    
        s.translate(None, ' ')


Profile of run('_replace()'):

         5000004 function calls in 2.059 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.976    0.976    2.059    2.059 <ipython-input-3-9253b3223cde>:8(_replace)
        1    0.000    0.000    2.059    2.059 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5000000    1.033    0.000    1.033    0.000 {method 'replace' of 'str' objects}
        1    0.050    0.050    0.050    0.050 {range}



Profile of run('_translate()'):

         5000004 function calls in 1.785 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.977    0.977    1.785    1.785 <ipython-input-3-9253b3223cde>:12(_translate)
        1    0.000    0.000    1.785    1.785 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  5000000    0.756    0.000    0.756    0.000 {method 'translate' of 'str' objects}
        1    0.052    0.052    0.052    0.052 {range}

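Our function call counts are the same. More function calls do not necessarily mean a slower run, but the count is typically a good place to look. What's fun is that translate ran faster than replace on my machine! Chalk that up to the fun of not testing changes in isolation; it doesn't matter here, because we only care about being able to tell why there could be a difference.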

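In any case, we now at least know that there may be a performance difference, and that it shows up when evaluating the string object's methods (see tottime). The translate docstring (__doc__) says a translation table is used, while the replace docstring only mentions old-to-new substring replacement.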


from dis import dis


def dis_replace():
    '1 a  2'.replace(' ', '')


dis(dis_replace)

  3           0 LOAD_CONST               1 ('1 a  2')
              3 LOAD_ATTR                0 (replace)
              6 LOAD_CONST               2 (' ')
              9 LOAD_CONST               3 ('')
             12 CALL_FUNCTION            2
             15 POP_TOP             
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE        


def dis_translate():
    '1 a  2'.translate(None, ' ')


dis(dis_translate)

  2           0 LOAD_CONST               1 ('1 a  2')
              3 LOAD_ATTR                0 (translate)
              6 LOAD_CONST               0 (None)
              9 LOAD_CONST               2 (' ')
             12 CALL_FUNCTION            2
             15 POP_TOP             
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE        
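
The comparison can also be done programmatically instead of by eye. The sketch below assumes Python 3.4 or later (for dis.get_instructions) and uses byte strings to mirror Python 2 str; the opcode names differ from the Python 2.7 listing above, and the helper names are made up for illustration.

```python
import dis


def dis_replace():
    b'1 a  2'.replace(b' ', b'')


def dis_translate():
    b'1 a  2'.translate(None, b' ')


def opnames(func):
    # Sequence of opcode names, ignoring the arguments of each instruction.
    return [ins.opname for ins in dis.get_instructions(func)]


def attribute_names(func):
    # Names looked up on an object (LOAD_ATTR, or LOAD_METHOD on older 3.x).
    return [ins.argval for ins in dis.get_instructions(func)
            if ins.opname in ('LOAD_ATTR', 'LOAD_METHOD')]


if __name__ == "__main__":
    print(opnames(dis_replace) == opnames(dis_translate))
    print(attribute_names(dis_replace), attribute_names(dis_translate))
```

If the opcode sequences match and only the looked-up method name differs, the bytecode cannot explain the timing gap, which is the same conclusion the listings above lead to.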

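Unfortunately, the two look identical to dis, which means we should start looking in the C source for strings (found by going to the Python source code for the version of Python I'm using).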

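Here is the source for translate. Reading through the comments, you can see that there are multiple replace function definitions, selected according to the lengths of the inputs. Among the candidates for substring replacement: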



/* len(self)>=1, len(from)==len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring_in_place(PyStringObject *self,


/* len(self)>=1, len(from)>=2, len(to)>=2, maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_substring(PyStringObject *self,


/* Special case for deleting a single character */
/* len(self)>=1, len(from)==1, to="", maxcount>=1 */
Py_LOCAL(PyStringObject *)
replace_delete_single_character(PyStringObject *self,
                                char from_c, Py_ssize_t maxcount)

'1 a  2'.replace(' ', '') has len(self) == 6 and replaces 1 character with the empty string, which makes it a replace_delete_single_character call.
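
To make the special case concrete, here is a pure-Python model of what a "delete a single character" routine can do: count the hits, size the result once, then copy the runs between hits. It is an illustration of the idea only, not a transcription of the C function, and the name delete_single_char is hypothetical.

```python
def delete_single_char(data, ch, maxcount=-1):
    # Count how many occurrences will be removed (all of them if maxcount < 0).
    count = data.count(ch)
    if maxcount >= 0:
        count = min(count, maxcount)
    if count == 0:
        return data  # nothing to delete, hand back the input unchanged

    # The result length is known up front: one byte less per deleted character.
    result = bytearray(len(data) - count)
    src = dst = 0
    for _ in range(count):
        hit = data.find(ch, src)
        run = hit - src
        result[dst:dst + run] = data[src:hit]  # copy the run before the hit
        dst += run
        src = hit + 1  # skip the deleted character
    result[dst:] = data[src:]  # copy whatever follows the last deletion
    return bytes(result)


if __name__ == "__main__":
    print(delete_single_char(b'1 a  2', b' '))
```

The point of the model is that the work is dominated by searching for one byte and copying whole runs, with no per-character table lookup.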



Answer 1 (score: 5)



import random
import string
import timeit
import re

def do_translation(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    trans_tab = string.maketrans(*map("".join,trans_map))
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    return s.translate(trans_tab)

def do_resub(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    trans_tab = dict(zip(*trans_map))
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    return re.sub("([%s])"%("".join(trans_map[0]),),lambda m:trans_tab.get(m.group(0),m.group(0)),s)

def do_replace(N,M):
    trans_map = random.sample(string.ascii_lowercase,N),random.sample(string.ascii_lowercase,N)
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(M))
    for k,v in zip(*trans_map):
        s = s.replace(k,v)
    return s

data = {}
for i in range(2,20,2):
    for j in range(10,200,10):
        data[(i,j)] = {
            "translate":timeit.timeit("do_translation(%s,%s)"%(i,j),"from __main__ import do_translation,string,random",number=100),
            "re.sub":timeit.timeit("do_resub(%s,%s)"%(i,j),"from __main__ import do_resub,re,random",number=100),
            "replace":timeit.timeit("do_replace(%s,%s)"%(i,j),"from __main__ import do_replace,random",number=100)}

print data
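
One caveat when reading numbers from a benchmark like this: the three approaches are not always equivalent. translate and re.sub map each character once, while a chain of replace calls feeds the output of one step into the next, so the results diverge whenever the source and target alphabets overlap. The sketch below shows this on Python 3 (str.maketrans instead of the Python 2 string.maketrans used above); the function names are made up for illustration.

```python
import re


def via_translate(s, src, dst):
    # Single pass: every character is mapped at most once.
    return s.translate(str.maketrans(src, dst))


def via_resub(s, src, dst):
    # Single pass as well: each match is looked up in the mapping once.
    table = dict(zip(src, dst))
    pattern = "[%s]" % re.escape(src)
    return re.sub(pattern, lambda m: table[m.group(0)], s)


def via_replace(s, src, dst):
    # One pass per mapping: later replacements see earlier results.
    for k, v in zip(src, dst):
        s = s.replace(k, v)
    return s


if __name__ == "__main__":
    # Swap 'a' and 'b': the chained version turns both into 'a'.
    print(via_translate("abc", "ab", "ba"))
    print(via_resub("abc", "ab", "ba"))
    print(via_replace("abc", "ab", "ba"))
```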
