I have the following text, which I want to get into the desired format using a Python regex:
text = "' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'"
I used the following code:
import re
reg = re.compile(r"[^\w']")
text = reg.sub(' ', text)
But it gives the following output, which is not what I want:
text = "'PowerPoint PresentationOctober 11th 2011 Visit to Lap Chec1Edit or delete me in â viewâ then â slide masterâ'"
The output I want is:
text = "'PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in view then slide master.'"
I want to remove all special characters except []()-,.
Answer 0 (score: 1)
Instead of removing the characters, you can repair them by using the correct encoding:
text = text.encode('windows-1252').decode('utf-8')
# => ' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'
See the Python demo.
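To illustrate why the encode/decode round-trip works, here is a minimal sketch. It assumes the text was damaged by UTF-8 bytes being read as Windows-1252, which the thread does not state explicitly; the shortened sample string and the variable names are illustrative only.

```python
# The sentence as it should read once the encoding is repaired.
clean = "Edit or delete me in ‘view’ then ’slide master’."

# Assumption: the damage came from UTF-8 bytes being decoded as Windows-1252.
# Each curly quote (3 bytes in UTF-8) turns into three stray characters starting with "â".
garbled = clean.encode("utf-8").decode("windows-1252")

# Reverse the mistake: recover the original bytes, then decode them as UTF-8.
repaired = garbled.encode("windows-1252").decode("utf-8")

print(repaired == clean)
```

If the text was garbled in some other way, the encode step may raise a UnicodeEncodeError or the decode step a UnicodeDecodeError, so it is worth testing on a sample first.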
If you want to remove them afterwards, that becomes much easier, e.g. text.replace('‘', '').replace('’', '')
or re.sub(r'[’‘]+', '', text).
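As a quick check, the two removal approaches mentioned here can be compared on a short sample. This is only a sketch; the sample string and variable names are made up for illustration.

```python
import re

sample = "Edit or delete me in ‘view’ then ’slide master’."

# Chained str.replace calls, one per curly quote character.
via_replace = sample.replace("‘", "").replace("’", "")

# A single regex pass that drops runs of either curly quote.
via_regex = re.sub(r"[’‘]+", "", sample)

print(via_regex)  # Edit or delete me in view then slide master.
```

Both variants delete the quotes outright rather than replacing them with spaces, so no doubled spaces are left behind.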
Answer 1 (score: -1)
It turned out to be simple and I found the answer myself. Thanks for the replies.
reg = re.compile(r"[^\w',.()\[\]]")
text = reg.sub(' ', text)
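Note that this character class does not include the hyphen, although the question lists "-" among the characters to keep. The sketch below compares the same set of allowed characters with a hypothetical variant that also keeps hyphens. The sample string and the helper function clean are illustrative, and collapsing whitespace afterwards is an added assumption about the desired result.

```python
import re

sample = "11th, 2011(Visit) [a-b] ‘view’ #ok!"

# Same allowed characters as the pattern above: the hyphen is not kept.
without_hyphen = re.compile(r"[^\w',.()\[\]]")
# Hypothetical variant: a trailing "-" inside the class is a literal hyphen.
with_hyphen = re.compile(r"[^\w',.()\[\]-]")

def clean(pattern, text):
    # Replace each disallowed character with a space, then collapse whitespace runs.
    return " ".join(pattern.sub(" ", text).split())

print(clean(without_hyphen, sample))  # 11th, 2011(Visit) [a b] view ok
print(clean(with_hyphen, sample))     # 11th, 2011(Visit) [a-b] view ok
```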