What is the fastest way to perform this string pattern replacement in Python?

Time: 2015-09-04 01:57:22

Tags: python

Given a string in the pattern

str="a@b = c"

I want to replace it with

str="a@'b'"

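That is, quote 'b' and remove everything after the "=", along with the "=" itself.

What is the best way to do this in Python?

Edit:

'b' above can be any unknown non-whitespace string of any length.

4 answers:

Answer 0 (score: 3)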

Updated example. Assuming the characters we want to replace always follow the '@':

str="a@b = c"
replaceChar = str.split('@')[1].split(' ')[0] 
print(str.split('=')[0].replace(replaceChar, "'{0}'".format(replaceChar) ).replace(' ', ''))

Output:

a@'b'

Running the same code on the following inputs:

str="a@e = c"
str="a@test = c"
str="a@whammy = c"

Output:

a@'e'
a@'test'
a@'whammy'

Is this what you're after?
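
One caveat with the snippet above: `replace` swaps every occurrence of the token, so an input such as "a@a = c" would come out as 'a'@'a'. Here is a minimal sketch that avoids this by building the result from the pieces instead. The function name `quote_after_at` is made up for illustration, and it assumes the input always has the "a@b = c" shape with a space right after the token.

```python
def quote_after_at(s):
    # Split once on the first '@': "a@b = c" -> "a", "b = c"
    prefix, rest = s.split('@', 1)
    # The token is everything up to the first space.
    token = rest.split(' ')[0]
    return "{0}@'{1}'".format(prefix, token)

print(quote_after_at("a@b = c"))   # a@'b'
print(quote_after_at("a@a = c"))   # a@'a'
```

Because nothing is searched and replaced, a token that also appears in the prefix is no longer a problem.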

Update

Since someone has finally provided a method using a regex, we can benchmark the two.

import re
import timeit

# Method #1 (string ops)
def stringOps():
    s="a@whammy = c"
    replaceChar = s.split('@')[1].split(' ')[0] 
    s.split('=')[0].replace(replaceChar, "'{0}'".format(replaceChar) ).replace(' ', '')

# Method #2 (regex)
def regex():
    s="a@bam = c"
    re.sub(r'(\w+)(\s*=\s*\w+$)', r"'\1'", s)

timestamp1 = timeit.Timer('from __main__ import stringOps;stringOps()')
timestamp2 = timeit.Timer('from __main__ import regex;regex()')
iterations = 1000000
time1 = timestamp1.timeit(iterations)
time2 = timestamp2.timeit(iterations)
print('Method #1 took {0}'.format(time1))
print('Method #2 took {0}'.format(time2))

Output:

Method #1 took 4.98833298683
Method #2 took 14.708286047

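So in this case the regex is still slower, by roughly a factor of three. I'll give it credit, though: it feels more readable. Unless you're doing some crazy number of iterations, I'd go with whichever method you're most comfortable with.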
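
Two things may skew the numbers above: the two functions are timed on different sample strings, and `re.sub` has to look the pattern up in the module's cache on every call. A rough sketch of a fairer comparison, assuming Python 3, is below. It uses the same pattern compiled once and the same input for both functions. The names `PATTERN`, `string_ops` and `regex_precompiled` are placeholders, and no timings are claimed here since they depend on the machine.

```python
import re
import timeit

# Same pattern as the regex method above, compiled once up front.
PATTERN = re.compile(r'(\w+)(\s*=\s*\w+$)')

def string_ops(s):
    # Method #1 from above, taking the string as an argument.
    token = s.split('@')[1].split(' ')[0]
    return s.split('=')[0].replace(token, "'{0}'".format(token)).replace(' ', '')

def regex_precompiled(s):
    # Method #2 from above, using the precompiled pattern.
    return PATTERN.sub(r"'\1'", s)

sample = "a@whammy = c"
for func in (string_ops, regex_precompiled):
    elapsed = timeit.timeit(lambda: func(sample), number=100000)
    print('{0} took {1:.3f}s'.format(func.__name__, elapsed))
```

Precompiling usually narrows the gap, but whether it closes it is something to measure rather than assume.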

Answer 1 (score: 3)

"%s@'%s'"%tuple(txt.split(' =')[0].split('@'))

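This works for any values of a or b, as long as they are separated by '@' and c is separated by ' ='.

PS. It will break if b contains ' =' or '@'.

Edit: added a speed benchmark based on Green Cell's.

edit_again: added the other examples to the benchmark.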

import re

import timeit

# Method #1 (string ops) -> Green Cell's
def stringOps():
    s="a@whammy = c"
    replaceChar = s.split('@')[1].split(' ')[0] 
    s.split('=')[0].replace(replaceChar, "'{0}'".format(replaceChar) ).replace(' ', '')
time1 = timeit.timeit('from __main__ import stringOps;stringOps()')
# Method #2 (regex)  -> Dawg's 
def regex():
    s="a@bam = c"
    re.sub(r'(\w+)(\s*=\s*\w+$)', r"'\1'", s)


time2 = timeit.timeit('from __main__ import regex;regex()')

#%method 3 split_n_dice  -> my own
def slice_dice():
    txt="a@whammy = c"
    "%s@'%s'"%tuple(txt.split(' =')[0].split('@'))

time3 = timeit.timeit('from __main__ import slice_dice;slice_dice()')    

print('Method #1 took {0}'.format(time1))
print('Method #2 took {0}'.format(time2))
print('Method #3 took {0}'.format(time3))

Output:

Method #1 took 2.01555299759
Method #2 took 4.66884493828
Method #3 took 1.44083309174

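Answer 2 (score: 1)

Since you state that 'b' above can be any unknown non-whitespace string of any length, a regex is probably the best option.

This regex performs the replacement: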

/(\w+)(\s*=\s*\w+$)/'\1'/

In Python:

>>> import re
>>> s="a@b = c"
>>> re.sub(r'(\w+)(\s*=\s*\w+$)', r"'\1'", s)
"a@'b'"

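Note that `\w+` only matches letters, digits and underscores, so a token such as b-1.x would be quoted only in part. If 'b' really can be any non-whitespace string, a variant along these lines may be closer to the requirement. This is a sketch rather than the answer's original pattern: it assumes the token directly follows the '@', and the names `QUOTE_RE` and `quote_token` are invented for the example.

```python
import re

# Assumed variant: capture any run of non-whitespace after '@',
# then drop the '=' and everything after it.
QUOTE_RE = re.compile(r'@(\S+)\s*=.*$')

def quote_token(s):
    return QUOTE_RE.sub(r"@'\1'", s)

print(quote_token("a@b = c"))        # a@'b'
print(quote_token("a@b-1.x = c d"))  # a@'b-1.x'
```

Anchoring on the '@' also means the right-hand side no longer has to be a single word.
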
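Answer 3 (score: 1)

Not sure whether this is the fastest or most efficient, but it is very simple.

It relies on the @ and = in the string being constant, with only one of each.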

s = "a@b = c"
keep, _ = s.split('=')
keep = keep.strip()
keep = keep.split('@')
keep[1] = "\'" + keep[1] + "\'"
#keep[1] = r"'" + keep[1] + r"'"
#keep[1] = "'" + keep[1] + "'"
result = '@'.join(keep)

As a function:

def f(s):
    keep, _ = s.split('=')
    keep = keep.strip()
    keep = keep.split('@')
    keep[1] = "\'" + keep[1] + "\'"
    return '@'.join(keep)