Python: cleaning up a string

Asked: 2010-09-01 19:07:26

Tags: python string

I have a string like this:

somestring='in this/ string / i have many. interesting.occurrences of {different chars} that need     to .be removed  '

This is the result I want:

somestring='in this string i have many interesting occurrences of different chars that need to be removed'

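I started chaining various .replace calls by hand, but there are so many different combinations that I think there must be a simpler way. Maybe there is a library that already does this?

Does anyone know how to clean up this string?

3 answers:

Answer 0: (score: 14)

I would use a regular expression to replace every run of non-alphanumeric characters with a single space: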

>>> import re
>>> somestring='in this/ string / i have many. interesting.occurrences of {different chars} that need     to .be removed  '
>>> rx = re.compile(r'\W+')
>>> res = rx.sub(' ', somestring).strip()
>>> res
'in this string i have many interesting occurrences of different chars that need to be removed'
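
One caveat worth keeping in mind: \W does not match the underscore, because \w covers letters, digits and _. If underscores should also count as separators, something like the following sketch may help. The function name clean_text and the sample input are hypothetical, not part of the answer above.

```python
import re

# [\W_]+ matches runs of non-word characters plus underscores,
# which a plain \W+ would leave in place.
_separators = re.compile(r'[\W_]+')

def clean_text(text):
    """Replace each run of separator characters with a single space."""
    return _separators.sub(' ', text).strip()

print(clean_text('in this/ string_with / {odd chars}  '))
```

For the string in the question this behaves the same as the \W+ version, since that string contains no underscores.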

Answer 1: (score: 2)

You have two steps: remove the punctuation, then remove the extra whitespace.

1) Use string.translate

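import string
# Python 2 API; in Python 3 the equivalent call is str.maketrans(...)
# Map every punctuation character to a space
trans_table = string.maketrans(string.punctuation, " " * len(string.punctuation))
new_string = somestring.translate(trans_table)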

This builds, and then applies, a translation table that maps each punctuation character to a space.

2) Remove the extra whitespace

new_string = " ".join(new_string.split())
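
For Python 3, the two steps above could be combined roughly as follows. This is only a sketch: it assumes str.maketrans as the Python 3 counterpart of the table-building call, and the function name strip_punctuation is made up for illustration.

```python
import string

def strip_punctuation(text):
    """Turn punctuation into spaces, then collapse runs of whitespace."""
    # Step 1: build a table mapping each ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    # Step 2: split() with no arguments drops leading, trailing and repeated
    # whitespace, so joining on ' ' leaves single spaces between words.
    return ' '.join(text.translate(table).split())

somestring = 'in this/ string / i have many. interesting.occurrences of {different chars} that need     to .be removed  '
print(strip_punctuation(somestring))
```

Note that string.punctuation only covers ASCII punctuation, so non-ASCII symbols would pass through unchanged.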

Answer 2: (score: 1)

re.sub(r'[\[\]/{}.,]+', '', somestring)
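
Be aware that replacing with an empty string joins words that were separated only by punctuation ("interesting.occurrences" becomes "interestingoccurrences") and leaves the repeated spaces in place. A possible variant, sketched here with hypothetical variable names, replaces the matches with a space and then collapses the whitespace:

```python
import re

somestring = 'in this/ string / i have many. interesting.occurrences of {different chars} that need     to .be removed  '

# Replacing with '' glues together words separated only by punctuation.
removed = re.sub(r'[\[\]/{}.,]+', '', somestring)

# Replacing with ' ' and then collapsing whitespace keeps the words apart.
spaced = ' '.join(re.sub(r'[\[\]/{}.,]+', ' ', somestring).split())

print(removed)
print(spaced)
```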