替换多个(特殊)字符-最有效的方法?

时间:2019-05-30 13:01:58

标签: python python-3.x string

在我收到的文本中,我想用一个空格替换以下特殊字符:

symbols = ["`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "<", ",", ">", ".", "?", "/"]

(在代码执行时间方面)最有效的方法是什么?

例如,我想要这个:

(Hello World)] *!

成为这个:

Hello World

候选方法似乎如下:

  1. 列表理解
  2. .replace()
  3. .translate()
  4. 正则表达式

5 个答案:

答案 0 :(得分:4)

对于有效的解决方案,您可以为此使用str.maketrans。请注意,一旦定义了转换表,就只需要映射字符串中的字符即可。这样做的方法如下:

symbols = ["`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+",
           "=", "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "<", ",", ">", ".", "?", "/"]

首先使用dict.fromkeys从符号创建字典,然后为每个条目设置一个空格作为值,然后从字典创建翻译表:

d = dict.fromkeys(''.join(symbols), ' ')
# {'`': ' ', ',': ' ', '~': ' ', '!': ' ', '@': ' '...
t = str.maketrans(d)

然后调用字符串translate方法以将上述字典中的字符映射为空白:

s = '~this@is!a^test@'
s.translate(t)
# ' this is a test '

答案 1 :(得分:3)

启动一些测试后,我可以说str.translate()是最好的选择。

输入数据:

symbols = {"`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "<", ",", ">", ".", "?", "/"}
translate_table = {126: None, 93: None, 91: None, 125: None, 92: None, 42: None, 45: None, 94: None, 62: None, 47: None, 35: None, 59: None, 44: None, 58: None, 60: None, 124: None, 61: None, 36: None, 95: None, 43: None, 96: None, 123: None, 64: None, 33: None, 38: None, 63: None, 46: None, 34: None, 41: None, 37: None, 40: None}
regular_expression = "[`~!@#$%^&*()_\-+={[\]}|\\:;\"<,>.?/]"
small_document = "Some**r@an]]\"dom t##xt"
normal_document = "TbsX^Kt$FZ%haZe+sLxu:Al\"xNAL\\Kix[mHp_gn]PrG`DqGd~GdNc;BoEq.SYD?Rp>ukq,UfO<XdTc=RUH}oifc&oP!CB*me@Qv{Qf-Li)gmXL/IQH#mne(Khaj|"
big_document = "QOfY+dymyoGBAxTAoIeM+jEWlaECUZEUXuMvprJOqFtQR*OiHtTFZkUNbYipSTTDPOVkIdGTcjWrQmbmthKBHBSEOZ)lQAIJOrVgmGGFdtqbuFfj<Dls<JWtKczAFMPYMemiJBJHdPeeul\\x>lGIBvUsxBokagvVovrrdxdKMtAKx>MEexYv>DGqPUXYaBQKwiSIUobrPQYjilhHMQunE;RiqOZPTnyOEgRrpxcuobvvmGkFpTqgMxYYhrmRRnauiqgvCmZ\"UauceaXsgAMSakxewzPrlIrYkVCVZaEGh]qiizYyzbkcHPF@qQsQMfHPDEbEnWtrCFoARUYAloOcctqmL@hegZbfhsHaJOxOxzQhZAVjVDgokosATfhKMT!WYyPWKcKAHKCzQGGJOCglYGZbftsuyntXZUKNqgGlsLJqgN,pUcOoA/tStXFXgpoSErgvw/OUMPWjJwt=bhMAIDayOZXJm=ifYYUuAvSIZjwnBfktNvEvZmvQso%HiNZEVqoDR%nQBtCkhjSfVfDuRSRsvp-sCunjDDUYSEVLICQdisxhEfqkUTkiPlLiUNNwrvO#WTDmweZyMeIbgNXkIsvaJeHYXV(HvRcGNZM(PPRIAyyLWivGiqMVBtwObqLfEEISyyjGNEdUU:ys`dXcVawkIEAjFXky`RUXNTm`LDM}mwTOcmsSo}haJXPnkwOhKLYwve}SWifzKq}grw}fMSQXXWguUQtlWpPZQymR^wBKEyolFlZnzEEmehSNenOqDOHWRit[Npm?R?DIPXAmQYYBbmJofxUzzWBsVCoPI?VmpXhoMxCfXyHEHowXzIJvExThiffLhBTtma_jk_NrbkPCGGypXvOuBqBxDYfC{bwIHoaqnJSKytxwWXBNnKG~PKuQklGblEwH~rJoGpKZmm~tTEFnPLdmzfrqJibMYIykzL$RZLPmsZjB$AAbZwFnByOydEOIfFvTaEQaSjbpeBZuUGY&ZfPQgLihmPYrhZxSwMzLrNF.WjFiDCLyXksdkLeMHVCfrdgCAotElQ|"
no_match_document = "XOtasggWqhtSLJpHEGoCmMRepFBlRfAGKTLPcEtKonFVsPgvWgAbvJVeMWILPgLapwAmTgXWVbxOJtUFmMygzIqYPqyAxzwElTFyYcGdtnNa"

代码:

def func1(doc):
    for c in symbols:
        doc = doc.replace(c, "")
    return doc


def func2(doc):
    return doc.translate(translate_table)


def func3(doc):
    return re.sub(regular_expression, "", doc)


def func4(doc):
    return "".join(c for c in doc if c not in symbols)

测试结果:

func1(small_document):      0.701037002
func1(normal_document):     1.1260866900000002
func1(big_document):        3.4234831459999997
func1(no_match_document):   0.7740780450000004

func2(small_document):      0.14135037500000003
func2(normal_document):     0.5368806810000004
func2(big_document):        0.8128472860000002
func2(no_match_document):   0.394245089

func3(small_document):      0.3157141610000007
func3(normal_document):     0.927359323000001
func3(big_document):        1.9310377590000005
func3(no_match_document):   0.18656399199999996

func4(small_document):      0.3034549070000008
func4(normal_document):     1.3695875739999988
func4(big_document):        10.115730064
func4(no_match_document):   1.2086623230000022

UPD。

我提供的输入数据是为纯方法测试专门“准备”的。

要生成translate_table,我使用了下一个dict理解:

translate_table = {ord(s): None for s in symbols}

Here是用于正则表达式验证的网站链接(可能会有帮助)。


如果您想自己重新计算测试,请使用以下代码:

    if __name__ == '__main__':
    import timeit
    print("func1(small_document)", timeit.timeit("func1(small_document)", setup="from __main__ import func1, small_document", number=100000))
    print("func1(normal_document): ", timeit.timeit("func1(normal_document)", setup="from __main__ import func1, normal_document", number=100000))
    print("func1(big_document): ", timeit.timeit("func1(big_document)", setup="from __main__ import func1, big_document", number=100000))
    print("func1(no_match_document): ", timeit.timeit("func1(no_match_document)", setup="from __main__ import func1, no_match_document", number=100000))

    print("func2(small_document): ", timeit.timeit("func2(small_document)", setup="from __main__ import func2, small_document", number=100000))
    print("func2(normal_document): ", timeit.timeit("func2(normal_document)", setup="from __main__ import func2, normal_document", number=100000))
    print("func2(big_document): ", timeit.timeit("func2(big_document)", setup="from __main__ import func2, big_document", number=100000))
    print("func2(no_match_document): ", timeit.timeit("func2(no_match_document)", setup="from __main__ import func2, no_match_document", number=100000))

    print("func3(small_document): ", timeit.timeit("func3(small_document)", setup="from __main__ import func3, small_document", number=100000))
    print("func3(normal_document): ", timeit.timeit("func3(normal_document)", setup="from __main__ import func3, normal_document", number=100000))
    print("func3(big_document): ", timeit.timeit("func3(big_document)", setup="from __main__ import func3, big_document", number=100000))
    print("func3(no_match_document): ", timeit.timeit("func3(no_match_document)", setup="from __main__ import func3, no_match_document", number=100000))

    print("func4(small_document): ", timeit.timeit("func4(small_document)", setup="from __main__ import func4, small_document", number=100000))
    print("func4(normal_document): ", timeit.timeit("func4(normal_document)", setup="from __main__ import func4, normal_document", number=100000))
    print("func4(big_document): ", timeit.timeit("func4(big_document)", setup="from __main__ import func4, big_document", number=100000))
    print("func4(no_match_document): ", timeit.timeit("func4(no_match_document)", setup="from __main__ import func4, no_match_document", number=100000))

答案 2 :(得分:1)

s = '''
def translate_():
    symbols = '`,~,!,@,#,$,%,^,&,*,(,),_,-,+,=,{,[,],},|,\,:,;,",<,,,>,.,?,/'
    s = '~this@is!a^test @'
    t = str.maketrans(dict.fromkeys(symbols, ' '))
    s.translate(t)
    return s

def replace_():
    symbols = '`,~,!,@,#,$,%,^,&,*,(,),_,-,+,=,{,[,],},|,\,:,;,",<,,,>,.,?,/'
    s = '~this@is!a^test @'
    for symbol in symbols:
        s = s.replace(symbol, ' ')
    return s
'''

print(timeit.timeit('replace_()', setup=s, number=100000))
print(timeit.timeit('translate_()', setup=s, number=100000))

将打印:

  

0.7663131961598992

     

0.4139239452779293

因此,用translate进行替换比使用几个replace快近2倍。

答案 3 :(得分:1)

我的代码用空格替换符号,并且不删除那些空格。

对于较短的字符串,.join()速度很快,但是对于较大的字符串,.translate()则要替换很多,速度更快。令人惊讶的是,.replace()仍然非常快,如果要进行的替换很少。

text: '(Hello World)] *!'
using_replace                     0.046
using_join                        0.016
using_translate                   0.031

text: '~this@is!a^test@'
using_replace                     0.046
using_join                        0.017
using_translate                   0.029

text: '~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@~/()&this@isasd!&=)(/as/dw&%#a^test@'
using_replace                     0.195
using_join                        2.327
using_translate                   0.061

text: 'a long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replacea long text without chars to replace'
using_replace                     0.051
using_join                        2.100
using_translate                   0.064

比较一些策略:

def using_replace(text, symbols_to_replace, replacement=' '):
    for char in symbols_to_replace:
        text = text.replace(char, replacement)

    return text

def using_join(text, symbols_to_replace, replacement=' '):
    return ''.join(
        replacement if char in symbols_to_replace else char
        for char in text)

def using_translate(text, symbols_to_replace, replacement=' '):
    translation_dict = str.maketrans(
        dict.fromkeys(symbols_to_replace, replacement))

    return text.translate(translation_dict)

使用此timeit代码表示不同的文本:

    # a 'set' for faster lookup
    symbols = {
        '`', '~', '!', '@', '#', '$', '%', '^', '&', '*',
        '(', ')', '_', '-', '+', '=', '{', '[', ']', '}',
        '|', '/', ':', ';', '"', '<', ',', '>', '.', '?',
        '\\',
    }

    text_list = [
        '(Hello World)] *!',
        '~this@is!a^test@',
        '~/()&this@isasd!&=)(/as/dw&%#a^test@' * 1000,
        'a long text without chars to replace' * 1000,
    ]
    for s in text_list:
        assert (
                using_replace(s, symbols)
                == using_join(s, symbols)
                == using_translate(s, symbols))

    for s in text_list:
        print()
        print('text:', repr(s))
        for func in [using_replace, using_join, using_translate]:
            t = timeit.timeit(
                'func(s, symbols)',
                'from __main__ import func, s, symbols',
                number=10000)
            print('{:30s} {:8.3f}'.format(func.__name__, t))

答案 4 :(得分:0)

str.translate()确实是最快的方法。这是一种构建转换表以排除字符的简洁方法:

symbols = ["`", "~", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "{", "[", "]", "}", "|", "\\", ":", ";", "\"", "<", ",", ">", ".", "?", "/"]
removeSymbols = str.maketrans("","","".join(symbols))

cleanText = "[Hello World] *!".translate(removeSymbols)
print(cleanText) # "Hello World "

maketrans()函数可以使用3个参数,第一个是带有替换字符的字符串,第二个是它们的替换字符,第三个是应删除的字符列表。要直截了当地删除所有字符,我们只需要向第3个参数提供一个包含要删除符号的字符串即可。

然后,转换表removeSymbols完全删除符号列表中的字符。

要用空格替换,请按以下方式构建翻译表:

removeSymbols = str.maketrans("".join(symbols)," "*len(symbols))