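What is a simple way to clean up a string of user input? This is the code I currently rely on to clean up the mess. A simpler, smarter version would be great.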
invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
for c in invalid:
    if len(line)>0: line=line.replace(c,'')
PS: How would I put this (the for loop with the nested if) on one line?
Answer 0 (score: 5)
import re
# '-' is escaped so it is a literal hyphen, not the range ')' to '+'
line = re.sub(r'[#@$%^&*()\-+!]', '', line)
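re is the regular expression module. Square brackets mean "match any one of the characters inside the brackets". So the call says: "find any character in line that appears inside the brackets and replace it with nothing ('')". Note that re.sub returns a new string, so the result has to be assigned back to line.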
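One detail worth checking with character classes: a hyphen placed between two characters defines a range. The following is a minimal sketch (Python 3; the sample string is made up for illustration) suggesting that in a pattern such as `[#@$%^&*()-+!]` the `)-+` part is read as the range `)` through `+`, so a literal `-` is left in place unless it is escaped or moved to the end of the class.

```python
import re

sample = "a-b+c*d(e)f!"  # hypothetical input, for illustration only

# ')-+' is parsed as a range covering ')', '*' and '+', so '-' itself survives.
range_pattern = re.compile(r'[#@$%^&*()-+!]')
with_range = range_pattern.sub('', sample)

# Escaping the hyphen makes it a literal member of the class.
literal_pattern = re.compile(r'[#@$%^&*()\-+!]')
with_literal = literal_pattern.sub('', sample)

print(with_range)    # a-bcdef
print(with_literal)  # abcdef
```

Building the class with `re.escape(''.join(invalid))` is another way to avoid the problem when the characters come from a list.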
Answer 1 (score: 5)
The fastest way is to use str.translate (the two-argument form shown here is Python 2 only):
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'
>>> s.translate(None, ''.join(invalid))
'fdsfsFGHGJ'
Timing comparison (invalid_re is the pattern built with '|'.join(map(re.escape, invalid))):
>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'*100
>>> %timeit re.sub('[#@$%^&*()-+!]', '', s)
1000 loops, best of 3: 766 µs per loop
>>> %timeit re.sub('[#@$%^&*()-+!]+', '', s)
1000 loops, best of 3: 215 µs per loop
>>> %timeit "".join(c for c in s if c not in invalid)
100 loops, best of 3: 1.29 ms per loop
>>> %timeit re.sub(invalid_re, '', s)
1000 loops, best of 3: 718 µs per loop
>>> %timeit s.translate(None, ''.join(invalid)) #Winner
10000 loops, best of 3: 17 µs per loop
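The figures above come from an IPython session on Python 2. To repeat the comparison on Python 3, something along these lines could be used. It is only a sketch: the `candidates` dictionary and the call count of 1000 are choices made here for illustration, and no timings are claimed because they depend on the machine and interpreter version.

```python
import re
import timeit

invalid = ['#', '@', '$', '%', '^', '&', '*', '(', ')', '-', '+', '!', ' ']
s = '@#$%^&*fdsfs#$%^&*FGHGJ' * 100

# Build each tool once, outside the timed call.
pattern = re.compile('[' + re.escape(''.join(invalid)) + ']+')
table = {ord(ch): None for ch in invalid}  # Python 3 form of translate()
invalid_set = set(invalid)

# Hypothetical names for the three approaches being compared.
candidates = {
    're.sub': lambda: pattern.sub('', s),
    'join': lambda: ''.join(ch for ch in s if ch not in invalid_set),
    'translate': lambda: s.translate(table),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=1000)
    print('{:<10} {:.4f} s per 1000 calls'.format(name, seconds))
```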
On Python 3 you need to do something like this:
>>> trans_tab = {ord(x):None for x in invalid}
>>> s.translate(trans_tab)
'fdsfsFGHGJ'
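On Python 3 the same table can also be built with `str.maketrans`, whose optional third argument lists the characters to delete. A short sketch, reusing the `invalid` list and sample string from this answer:

```python
invalid = ['#', '@', '$', '%', '^', '&', '*', '(', ')', '-', '+', '!', ' ']
s = '@#$%^&*fdsfs#$%^&*FGHGJ'

# The third argument lists characters to delete (each is mapped to None).
trans_tab = str.maketrans('', '', ''.join(invalid))
cleaned = s.translate(trans_tab)
print(cleaned)  # fdsfsFGHGJ
```

The resulting table is equivalent to the dictionary comprehension, so the choice between the two is mostly a matter of taste.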
Answer 2 (score: 4)
You can do it like this:
from string import punctuation # !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
line = "".join(c for c in line if c not in punctuation)
For example:
'hello, I @m pleased to meet you! How *about (you) try something > new?'
becomes (note the double space left where " > " stood)
'hello I m pleased to meet you How about you try something  new'
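Removing punctuation can leave runs of spaces where a symbol stood between two spaces. If that matters, one possible follow-up step (an addition here, not part of the original answer) is to split on whitespace and rejoin with single spaces:

```python
from string import punctuation

line = 'hello, I @m pleased to meet you! How *about (you) try something > new?'

# Drop every ASCII punctuation character.
stripped = ''.join(c for c in line if c not in punctuation)

# split() with no arguments splits on any run of whitespace,
# so joining with a single space normalises the gaps.
normalised = ' '.join(stripped.split())
print(normalised)
```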
Answer 3 (score: 1)
This is one case where regular expressions are actually useful.
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> import re
>>> invalid_re = '|'.join(map(re.escape, invalid))
>>> re.sub(invalid_re, '', 'foo * bar')
'foobar'
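Answer 4 (score: 1)
This is a snippet I use in my own code. You essentially use a regular expression to specify the allowed characters, match them, and then join the matches together.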
import re

def clean(string_to_clean, valid='ACDEFGHIKLMNPQRSTVWY'):
    """Remove unwanted characters from string.

    Args:
        string_to_clean: (str) The string from which to remove
            unwanted characters.
        valid: (str) The characters that are valid and should be
            included in the returned sequence. Default character
            set is: 'ACDEFGHIKLMNPQRSTVWY'.

    Returns: (str) A sequence without the invalid characters, as a string.
    """
    valid_string = r'([{}]+)'.format(valid)
    valid_regex = re.compile(valid_string, re.IGNORECASE)
    # Create string of matching characters, concatenate to string
    # with join().
    return ''.join(valid_regex.findall(string_to_clean))
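A whitelist like this can also be written as a negated character class. The sketch below is a variant, not the answer's own code: `keep_only` is a hypothetical name, and `re.escape` is added on the assumption that `valid` might one day contain characters such as `]` or `^` that are special inside brackets. It assumes `valid` is non-empty, since an empty class would not compile.

```python
import re

def keep_only(text, valid='ACDEFGHIKLMNPQRSTVWY'):
    """Return text with every character outside `valid` removed (case-insensitive)."""
    # re.escape guards against characters such as ']' or '^' in `valid`.
    pattern = re.compile('[^{}]'.format(re.escape(valid)), re.IGNORECASE)
    return pattern.sub('', text)

print(keep_only('MKV-LA 12 xyz'))        # MKVLAy
print(keep_only('a]b', valid='ab]'))     # a]b
```

In the first call the lowercase `y` is kept because matching ignores case and `Y` is in the default set.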
Answer 5 (score: 1)
Using a simple comprehension (a generator expression passed to join):
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in invalid)
'foobar'
Using a comprehension with string.punctuation plus whitespace:
>>> import string
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in string.punctuation)
'foo bar'
>>> "".join(i for i in x if i not in string.punctuation+" ")
'foobar'
Using str.translate (Python 2 signature):
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> x.translate(None,"".join(invalid))
'foobar'
Using re.sub:
>>> import re
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> y = "["+re.escape("".join(invalid))+"]"  # escape so '-' is not read as a range
>>> re.sub(y,'',x)
'foobar'
>>> re.sub(y+'+','',x)
'foobar'
Answer 6 (score: 1)
This works:
invalid = '#@$%^_ '
line = "#master_Of^Puppets#@$%Yeah"
line = "".join([l for l in line if l not in invalid])
#line will be - 'masterOfPuppetsYeah'
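As for putting the cleanup on one line, the same filter can be wrapped in a small function. This is a sketch only; `strip_chars` is a hypothetical name and the default character set is simply the one from this answer. Converting `invalid` to a set is an optional tweak for faster membership tests.

```python
def strip_chars(line, invalid='#@$%^_ '):
    """Return line without any character listed in `invalid`."""
    invalid_set = set(invalid)  # set membership is O(1) on average
    return ''.join(ch for ch in line if ch not in invalid_set)

print(strip_chars('#master_Of^Puppets#@$%Yeah'))  # masterOfPuppetsYeah
```

An empty input simply yields an empty string, so no length check is needed before calling it.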