Replace punctuation with nothing

Asked: 2013-12-13 16:05:40

Tags: python regex string

>>> import re
>>> a="what is. your. name? It's good"
>>> b=re.findall(r'\w+',a)
>>> b
['what', 'is', 'your', 'name', 'It', 's', 'good']

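The result above splits It's into ['It', 's'], which I don't want.

I want to replace the apostrophe with nothing, so that It's becomes Its. The same applies to every other punctuation character. How can I do this?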

2 answers:

Answer 0: (score: 5)

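Are you forced to use a regex? This task is easily done with str.translate, passing string.punctuation as deletechars (this two-argument form of translate is Python 2 only):
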
>>> from string import punctuation
>>> a="what is. your. name? It's good"
>>> a.translate(None, punctuation)
'what is your name Its good'
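
The call a.translate(None, punctuation) relies on the deletechars argument of Python 2's str.translate. On Python 3 the same idea can be sketched with a translation table built by str.maketrans. The helper name strip_punctuation below is hypothetical, chosen only for illustration:

```python
from string import punctuation

# str.maketrans('', '', chars) builds a table that maps every character
# in chars to None, which str.translate treats as "delete this character".
_DELETE_PUNCTUATION = str.maketrans('', '', punctuation)


def strip_punctuation(text):
    """Return text with every ASCII punctuation character removed.

    Hypothetical helper name; not part of the original answer.
    """
    return text.translate(_DELETE_PUNCTUATION)


a = "what is. your. name? It's good"
print(strip_punctuation(a))          # what is your name Its good
print(strip_punctuation(a).split())  # ['what', 'is', 'your', 'name', 'Its', 'good']
```

Splitting the cleaned string on whitespace gives the word list the question was after, with Its kept as a single token.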

If you are forced to use a regex, another option is

>>> import re
>>> from string import punctuation
>>> r = re.compile(r'[{}]+'.format(re.escape(punctuation)))
>>> r.sub('', a)
'what is your name Its good'
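
A variation on the regex approach, not taken from the answer: instead of listing string.punctuation explicitly, a negated character class can remove everything that is neither a word character nor whitespace. This is a sketch under the assumption that "punctuation" means exactly that. On Python 3 it also removes non-ASCII marks such as a curly apostrophe, but it keeps the underscore, because \w matches it.

```python
import re

# Assumption: anything that is not a word character (\w) and not
# whitespace (\s) counts as punctuation. Underscore is matched by \w,
# so it is kept.
NON_WORD = re.compile(r'[^\w\s]+')

a = "what is. your. name? It's good"
cleaned = NON_WORD.sub('', a)
print(cleaned)          # what is your name Its good
print(cleaned.split())  # ['what', 'is', 'your', 'name', 'Its', 'good']
```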

However, I would still suggest you reconsider this design. Using a regex for this task is overkill.

Answer 1: (score: 0)

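Match runs of word characters that have a punctuation mark, such as an apostrophe ('), between them (if any), then remove the marks that were found.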

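import re

text = "Many cook's were involved and many cooked pre-season food"
# Collect each punctuation character that sits between two runs of word characters.
found = re.findall(r"\w+([\-_.!~*'()])\w+", text)

for mark in found:
    # Use plain string replacement: passing a mark such as '.' or '(' to
    # re.sub as a pattern would match everything or raise an error.
    text = text.replace(mark, '')

print(text)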

Output:

Many cooks were involved and many cooked preseason food
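
Note that the loop above deletes every occurrence of each mark it finds, including occurrences that are not between word characters. If only marks inside a word should go, one possible sketch uses lookbehind and lookahead assertions so the substitution happens in a single pass. The function name strip_inner_punctuation is hypothetical, and the character class is simply the one from the answer:

```python
import re

# (?<=\w) and (?=\w) assert a word character on each side without
# consuming it, so only the punctuation mark itself is removed.
INNER_PUNCTUATION = re.compile(r"(?<=\w)[\-_.!~*'()](?=\w)")


def strip_inner_punctuation(text):
    """Remove listed punctuation marks only where they sit inside a word.

    Hypothetical helper name; not part of the original answer.
    """
    return INNER_PUNCTUATION.sub('', text)


text = "Many cook's were involved and many cooked pre-season food"
print(strip_inner_punctuation(text))
# Many cooks were involved and many cooked preseason food

# Marks at a word boundary are left alone.
print(strip_inner_punctuation("The end. (note)"))
# The end. (note)
```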