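Python replace function [replace once]

Time: 2013-03-10 15:57:28

Tags: python string replace

I need help with a program I'm writing in Python.

Suppose I want to replace every instance of the word "steak" with "ghost" (just go with it...), but at the same time I also want to replace every instance of the word "ghost" with "steak". The following code does not work: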

 s="The scary ghost ordered an expensive steak"
 print s
 s=s.replace("steak","ghost")
 s=s.replace("ghost","steak")
 print s

This prints: The scary steak ordered an expensive steak

What I want is: The scary steak ordered an expensive ghost

6 answers:

Answer 0: (score: 24)

I'd probably use a regular expression here:

>>> import re
>>> s = "The scary ghost ordered an expensive steak"
>>> sub_dict = {'ghost':'steak','steak':'ghost'}
>>> regex = '|'.join(sub_dict)
>>> re.sub(regex, lambda m: sub_dict[m.group()], s)
'The scary steak ordered an expensive ghost'

Or, as a function that you can copy/paste:

import re
def word_replace(replace_dict,s):
    regex = '|'.join(replace_dict)
    return re.sub(regex, lambda m: replace_dict[m.group()], s)

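Basically, I create a mapping of the words I want to replace to their replacements (sub_dict), and build a regular expression from that mapping. In this case the regular expression is "steak|ghost" (or "ghost|steak"; the order doesn't matter), and the regex engine does the rest of the work, finding non-overlapping matches and replacing them accordingly.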


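Some possibly useful modifications

  • regex = '|'.join(map(re.escape,replace_dict)) - Allows the words to contain special regular expression syntax (like parentheses). This escapes the special characters so that the regular expression matches the literal text.
  • regex = '|'.join(r'\b{0}\b'.format(x) for x in replace_dict) - Makes sure we don't match when one of our words is a substring of another word. In other words, change he to she, but not the to tshe.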
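The two modifications above can be combined. What follows is only a minimal sketch, not part of the original answer: the name word_swap is made up for illustration, it assumes every key starts and ends with a word character (otherwise \b behaves differently), and trying the longest keys first is an extra assumption so that a key such as "ice cream" wins over "ice".

```python
import re

def word_swap(replace_dict, s):
    # Longest keys first, so "ice cream" is tried before "ice".
    keys = sorted(replace_dict, key=len, reverse=True)
    # re.escape makes each key match literally; \b restricts matches to whole words.
    pattern = "|".join(r"\b{0}\b".format(re.escape(k)) for k in keys)
    return re.sub(pattern, lambda m: replace_dict[m.group()], s)

swaps = {"he": "she", "she": "he"}
print(word_swap(swaps, "he said the theme suits she"))
# prints: she said the theme suits he
```

Because of the word boundaries, "the" and "theme" are left alone even though both contain "he".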

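Answer 1: (score: 12)

Split the string on one of the targets, do the replacement in each piece, and join the pieces back together with the other target.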

pieces = s.split('steak')
s = 'ghost'.join(piece.replace('ghost', 'steak') for piece in pieces)

This works exactly like .replace() would, including ignoring word boundaries. So it will turn "steak ghosts" into "ghost steaks".
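Wrapped in a function, the same split-and-join idea might look like the sketch below. The name swap_substrings and its parameters are hypothetical, and it assumes a is a non-empty string (str.split raises ValueError on an empty separator).

```python
def swap_substrings(s, a, b):
    # Splitting on `a` guarantees that no piece contains `a`, so replacing
    # b -> a inside each piece cannot be undone by the next step.
    pieces = s.split(a)
    # Rejoining with `b` puts `b` exactly where `a` used to be.
    return b.join(piece.replace(b, a) for piece in pieces)

print(swap_substrings("The scary ghost ordered an expensive steak", "steak", "ghost"))
# prints: The scary steak ordered an expensive ghost
```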

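Answer 2: (score: 4)

Rename one of the words to a temporary value that doesn't occur in the text. Note that this isn't the most efficient approach for a very large text; for that, re.sub may be more appropriate.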

 s="The scary ghost ordered an expensive steak"
 print s
 s=s.replace("steak","temp")
 s=s.replace("ghost","steak")
 s=s.replace("temp","ghost")
 print s
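The approach relies on the temporary value not occurring in the text. One way to make that requirement explicit is sketched below; the function name swap_with_placeholder and the choice of a NUL character as the default placeholder are assumptions made for illustration, and the placeholder is assumed to be a single character.

```python
def swap_with_placeholder(text, a, b, placeholder="\0"):
    # Refuse to run if the placeholder could be confused with real content.
    if placeholder in text or placeholder in a or placeholder in b:
        raise ValueError("placeholder already occurs in the input")
    # a -> placeholder, then b -> a, then placeholder -> b.
    return text.replace(a, placeholder).replace(b, a).replace(placeholder, b)

print(swap_with_placeholder("The scary ghost ordered an expensive steak", "steak", "ghost"))
# prints: The scary steak ordered an expensive ghost
```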

Answer 3: (score: 1)

Use the count argument of the string.replace() method. So, using your code, you would have:

s="The scary ghost ordered an expensive steak"
print s
s=s.replace("steak","ghost", 1)
s=s.replace("ghost","steak", 1)
print s

http://docs.python.org/2/library/stdtypes.html
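For reference, the optional third argument of str.replace caps the number of replacements, counting from the left. The sketch below (variable names are made up) shows what that means for the swap: the two-step version gives the wanted result for the sentence in the question, where "ghost" comes before "steak", but leaves the text unchanged when the order is reversed.

```python
s = "The scary ghost ordered an expensive steak"
# Only the first "steak" becomes "ghost", then only the first "ghost" becomes "steak".
result_a = s.replace("steak", "ghost", 1).replace("ghost", "steak", 1)
print(result_a)
# prints: The scary steak ordered an expensive ghost

t = "An expensive steak scared the ghost"
# Here the first "ghost" after step one is the word that was just inserted,
# so step two turns it straight back into "steak".
result_b = t.replace("steak", "ghost", 1).replace("ghost", "steak", 1)
print(result_b)
# prints: An expensive steak scared the ghost
```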

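Answer 4: (score: 1)

How about something like this? Store the original in a split list, then translate each word through a dict. This keeps the core code short, and you only adjust the dict when the translation needs to change. It's also easy to port to a function: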

 def translate_line(s, translation_dict):
    line = []
    for i in s.split():
       # To take account for punctuation, strip all non-alnum from the
       # word before looking up the translation.
       i = ''.join(ch for ch in i if ch.isalnum())
       line.append(translation_dict.get(i, i))
    return ' '.join(line)


 >>> translate_line("The scary ghost ordered an expensive steak", {'steak': 'ghost', 'ghost': 'steak'})
 'The scary steak ordered an expensive ghost'
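Since translate_line strips non-alphanumeric characters and rejoins with single spaces, punctuation and original spacing are not preserved in the result. If that matters, one possible variation, sketched here under the assumption that a "word" is any run of \w characters, is to let re.sub visit each word and leave everything else in place. The name translate_words is hypothetical.

```python
import re

def translate_words(s, translation_dict):
    # \w+ matches each run of letters, digits and underscores; spaces and
    # punctuation between the matches are left untouched by re.sub.
    return re.sub(r"\w+", lambda m: translation_dict.get(m.group(), m.group()), s)

swaps = {"steak": "ghost", "ghost": "steak"}
print(translate_words("The ghost, hungry, ordered steak!", swaps))
# prints: The steak, hungry, ordered ghost!
```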

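Answer 5: (score: 1)

Note: considering the number of views on this question, I undeleted this answer and rewrote it with different types of test cases.

I have considered four competing implementations from the answers.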

>>> import re
>>> from timeit import Timer
>>> from uuid import uuid4
>>> def sub_noregex(hay):
    """
    The join-and-replace routine, which outperforms the regex implementation. This
    version uses a generator expression
    """
    return 'steak'.join(e.replace('steak','ghost') for e in hay.split('ghost'))

>>> sub_dict = {'ghost':'steak','steak':'ghost'}
>>> regex = '|'.join(sub_dict)
>>> def sub_regex(hay):
    """
    This is a straightforward regex implementation as suggested by @mgilson.
    Note: so that the overhead doesn't add to the cumulative sum, I have placed
    the regex creation routine outside the function
    """
    return re.sub(regex,lambda m:sub_dict[m.group()],hay)

>>> def sub_temp(hay, _uuid = str(uuid4())):
    """
    Similar to Mark Tolonen's implementation, but uses a uuid as the temporary string
    value to reduce the chance of a collision
    """
    hay = hay.replace("steak",_uuid).replace("ghost","steak").replace(_uuid,"ghost")
    return hay

>>> def sub_noregex_LC(hay):
    """
    The join-and-replace routine, which outperforms the regex implementation. This
    version uses a list comprehension
    """
    return 'steak'.join([e.replace('steak','ghost') for e in hay.split('ghost')])

A generalized timing function

>>> def compare(n, hay):
    foo = {"sub_regex": "re",
           "sub_noregex":"",
           "sub_noregex_LC":"",
           "sub_temp":"",
           }
    stmt = "{}(hay)"
    setup = "from __main__ import hay,"
    for k, v in foo.items():
        t = Timer(stmt = stmt.format(k), setup = setup+ ','.join([k, v] if v else [k]))
        yield t.timeit(n)

A generalized test routine

>>> def test(*args, **kwargs):
    n = kwargs['repeat']
    print "{:50}{:^15}{:^15}{:^15}{:^15}".format("Test Case", "sub_temp",
                             "sub_noregex ", "sub_regex",
                             "sub_noregex_LC ")
    for hay in args:
        hay, hay_str = hay
        print "{:50}{:15.10}{:15.10}{:15.10}{:15.10}".format(hay_str, *compare(n, hay))

The test results are as follows

>>> test((' '.join(['steak', 'ghost']*1000), "Multiple repeatation of search key"),
         ('garbage '*998 + 'steak ghost', "Single repeatation of search key at the end"),
         ('steak ' + 'garbage '*998 + 'ghost', "Single repeatation of at either end"),
         ("The scary ghost ordered an expensive steak", "Single repeatation for smaller string"),
         repeat = 100000)
Test Case                                            sub_temp     sub_noregex      sub_regex   sub_noregex_LC 
Multiple repeatation of search key                   0.2022748797   0.3517142003   0.4518992298   0.1812594258
Single repeatation of search key at the end          0.2026047957   0.3508259952   0.4399926194   0.1915298898
Single repeatation of at either end                  0.1877455356   0.3561734007   0.4228843986   0.2164233388
Single repeatation for smaller string                0.2061019057   0.3145984487   0.4252060592   0.1989413449
>>> 

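Based on the test results

  1. The non-regex LC version and the temporary-variable substitution perform best, although the temporary-variable version's performance is not consistent

  2. The LC version performs better than the generator version (confirmed)

  3. The regex version is about two times slower (so if this piece of code is a bottleneck, the implementation choice can be reconsidered)

  4. The regex and non-regex versions are equivalent and can scale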
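To re-run a comparison like this on Python 3, a small harness along the following lines could be used. It is only a sketch: the names time_candidates, SUB_DICT and PATTERN are made up here, it covers just two of the four candidates, it precompiles the regex (which the answer above does not), and it passes the input to timeit through a callable instead of importing it from __main__. No timings are claimed for it; results will vary by machine and Python version.

```python
import re
from timeit import timeit

SUB_DICT = {"ghost": "steak", "steak": "ghost"}
# Compiled once, outside the timed functions, so setup cost is not measured.
PATTERN = re.compile("|".join(map(re.escape, SUB_DICT)))

def sub_regex(hay):
    return PATTERN.sub(lambda m: SUB_DICT[m.group()], hay)

def sub_noregex_lc(hay):
    return "steak".join([e.replace("steak", "ghost") for e in hay.split("ghost")])

def time_candidates(hay, number=1000):
    """Return {name: total seconds for `number` calls} for each candidate."""
    candidates = {"sub_regex": sub_regex, "sub_noregex_lc": sub_noregex_lc}
    # f=func binds the current function to the lambda for each candidate.
    return {name: timeit(lambda f=func: f(hay), number=number)
            for name, func in candidates.items()}

if __name__ == "__main__":
    for name, seconds in time_candidates(" ".join(["steak", "ghost"] * 1000)).items():
        print(name, seconds)
```

Checking that the candidates return the same string for each input before timing them is a cheap way to make sure the comparison is like for like.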