Question

My code is:

text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'
alphabets = set(string.ascii.lowercase)
punctuation = ['!', ',', '.', ':', ';', '?']
allowed_chars = alphbets.union(punctuation, ' ')
regex = re.compile('[^allowed_string]')
text = regex.sub(' ', text)

As I understand it, the code above should remove every character from a given text except lowercase ASCII letters and punctuation.

But when I run it, the result is:

is e es s e e li    ses and redo inates t e w ole o  er se

What am I doing wrong? Thanks.

Answer 1

First, string.ascii.lowercase is not valid. I think you mean string.ascii_lowercase.
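A quick check in Python 3 makes the difference visible: the `string` module has no `ascii` attribute, so the dotted form fails, while `ascii_lowercase` is a plain string constant.

```python
import string

# string has no attribute named "ascii", so the dotted form raises AttributeError.
try:
    string.ascii.lowercase
except AttributeError as exc:
    print('AttributeError:', exc)

# The real constant is one name with an underscore.
print(string.ascii_lowercase)
```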

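Second, re.compile does not substitute variables into a pattern like that: '[^allowed_string]' is just an ordinary string. It is a negated character class made of the literal characters a, l, o, w, e, d, _, s, t, r, i, n and g, so every other character (h, y, c, p, m, f, x and so on) is replaced with a space, which is exactly the output you got.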
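If you do want to keep the allowed characters in a variable, one way is to build the character class from the set's contents. This is a minimal sketch assuming Python 3; it reuses the question's names with `string.ascii_lowercase` and the `alphabets` spelling corrected, and `pattern` and `cleaned` are illustrative names.

```python
import re
import string

text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'

alphabets = set(string.ascii_lowercase)            # note the underscore
punctuation = ['!', ',', '.', ':', ';', '?']
allowed_chars = alphabets.union(punctuation, ' ')  # letters, punctuation and the space

# Build the class from the *contents* of allowed_chars, not from the variable's name.
# re.escape() guards against characters that are special inside [...],
# such as ']', '\\', '^' or '-'; none occur here, but the pattern stays safe
# if the set changes later.
pattern = '[^' + re.escape(''.join(sorted(allowed_chars))) + ']'
regex = re.compile(pattern)

# Each disallowed character becomes one space, so "++", "@" and "#" leave gaps.
cleaned = regex.sub(' ', text)
print(repr(cleaned))
```

Because there is no `+` after the class, each removed character leaves its own space behind.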

Here is a better solution:

>>> import re
>>> text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'
>>> re_cmp = re.compile("[^a-z!,.:;?]+")
>>> re_cmp.sub(' ', text)
'his eyes she eclip ses and predominates the whole of her sex'
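The trailing `+` in this pattern is what produces the single space between "eclip" and "ses". A small comparison (variable names are illustrative) shows that without it each disallowed character becomes its own space, while with it a whole run such as " ++ @ #" (the spaces are outside the class too) collapses into one space.

```python
import re

text = 'his eyes she eclip ++ @ #ses and predominates the whole of her sex'

# One replacement per disallowed character: the 7-character gap after "eclip" stays 7 spaces.
per_char = re.sub(r'[^a-z!,.:;?]', ' ', text)

# One replacement per *run* of disallowed characters: the whole gap becomes one space.
per_run = re.sub(r'[^a-z!,.:;?]+', ' ', text)

print(repr(per_char))
print(repr(per_run))
```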

Removing characters from an expression with regular expressions in Python
