I have a string like this:
somestring='in this/ string / i have many. interesting.occurrences of {different chars} that need to .be removed '
This is the result I want:
somestring='in this string i have many interesting occurrences of different chars that need to be removed'
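I started chaining .replace calls by hand, but there are so many different combinations that I think there must be a simpler way. Maybe there is a library that already does this?
Does anyone know how to clean up this string?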
Answer 0 (score: 14)
I would use a regular expression to replace every run of non-alphanumeric characters with a single space:
>>> import re
>>> somestring='in this/ string / i have many. interesting.occurrences of {different chars} that need to .be removed '
>>> rx = re.compile(r'\W+')
>>> res = rx.sub(' ', somestring).strip()
>>> res
'in this string i have many interesting occurrences of different chars that need to be removed'
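One detail worth knowing about the pattern above: `\W` treats the underscore as a word character, so underscores survive the substitution. The following is a minimal sketch, not part of the original answer, that wraps the same idea in a helper and also replaces underscores. The function name `clean` and the widened pattern `[\W_]+` are assumptions made for illustration.

```python
import re

# Hypothetical variant of the answer's pattern: [\W_]+ matches runs of
# non-word characters and, additionally, underscores (which \W alone keeps).
_NON_ALNUM = re.compile(r'[\W_]+')

def clean(text):
    """Replace each run of non-alphanumeric characters with one space."""
    return _NON_ALNUM.sub(' ', text).strip()

somestring = ('in this/ string / i have many. interesting.occurrences of '
              '{different chars} that need to .be removed ')
print(clean(somestring))
print(clean('snake_case.name'))
```

If underscores should be kept, the original `\W+` pattern is the one to use.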
Answer 1 (score: 2)
There are two steps: remove the punctuation, then remove the extra whitespace.
1) Use string.translate
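import string
# Python 2 API; in Python 3 the equivalent call is str.maketrans.
# Map every punctuation character to a space.
trans_table = string.maketrans(string.punctuation, " " * len(string.punctuation))
new_string = somestring.translate(trans_table)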
This builds, and then applies, a translation table that maps each punctuation character to a space.
2) Remove the extra whitespace
new_string = " ".join(new_string.split())
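The two steps above can be combined into one helper. This is a sketch under the assumption that Python 3 is in use, where the table is built with `str.maketrans` instead of `string.maketrans`. The names `PUNCT_TO_SPACE` and `strip_punctuation` are made up for this example.

```python
import string

# Assumes Python 3: str.maketrans builds the table that maps each
# punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation,
                               ' ' * len(string.punctuation))

def strip_punctuation(text):
    """Step 1: punctuation -> spaces. Step 2: collapse whitespace runs."""
    spaced = text.translate(PUNCT_TO_SPACE)
    # split() with no arguments drops leading/trailing whitespace and
    # splits on any run of whitespace, so join() leaves single spaces.
    return ' '.join(spaced.split())

somestring = ('in this/ string / i have many. interesting.occurrences of '
              '{different chars} that need to .be removed ')
print(strip_punctuation(somestring))
```

Unlike a `\W`-based regex, this only touches the ASCII characters listed in `string.punctuation`, so non-ASCII punctuation would pass through unchanged.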
Answer 2 (score: 1)
# Deletes the listed punctuation characters outright (nothing is substituted for them)
re.sub(r'[\[\]/{}.,]+', '', somestring)
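Note that deleting the characters outright joins words that were separated only by punctuation and leaves doubled spaces behind. The sketch below, which is an illustration and not part of the original answer, shows that effect on the question's string and one possible adjustment: substitute a space, then collapse the whitespace. The variable names are assumptions.

```python
import re

somestring = ('in this/ string / i have many. interesting.occurrences of '
              '{different chars} that need to .be removed ')

# Same character class as in the answer, replacing matches with nothing.
removed = re.sub(r'[\[\]/{}.,]+', '', somestring)
# 'interesting.occurrences' has become 'interestingoccurrences'.
print(repr(removed))

# Possible adjustment: replace with a space, then normalise whitespace.
spaced = re.sub(r'[\[\]/{}.,]+', ' ', somestring)
cleaned = ' '.join(spaced.split())
print(repr(cleaned))
```

An explicit character class like this one only removes what it lists, which is useful when some punctuation has to be preserved.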