Remove punctuation except punctuation inside numbers

Date: 2017-11-07 04:00:36

Tags: python regex

Is there a way to remove all punctuation from a string while keeping hyphens and any punctuation inside numbers? For example:

Hello! this episode is thirty-five minutes long, 35.26 mins to be precise.

should become:

Hello this episode is thirty-five minutes long 35.26 mins to be precise

2 answers:

Answer 0 (score: 5)

You can use re.sub with a positive lookahead:

In [165]: re.sub(r'\W(?=\s|$)', '', s)
Out[165]: 'Hello this episode is thirty-five minutes long 35.26 mins to be precise'

Details

\W      # any non-word character (not a letter, digit, or underscore)
(?=     # positive lookahead
\s      # whitespace
|       # regex OR
$       # end of string
)
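
To see what the lookahead does and does not remove, here is a minimal sketch that wraps the pattern above in a function. The names `TRAILING_PUNCT` and `strip_trailing_punct` are made up for illustration; only the pattern comes from the answer. The second call shows a limitation: only a character directly followed by whitespace or the end of the string is removed, so runs of punctuation and opening brackets survive.

```python
import re

# Hypothetical names; the pattern itself is the one from the answer.
TRAILING_PUNCT = re.compile(r'\W(?=\s|$)')

def strip_trailing_punct(text):
    """Remove each non-word character that is directly followed
    by whitespace or by the end of the string."""
    return TRAILING_PUNCT.sub('', text)

s = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
print(strip_trailing_punct(s))
# Hello this episode is thirty-five minutes long 35.26 mins to be precise

# Only the last '.' of the run and the closing ')' are followed by
# whitespace or the end of the string, so the rest is kept.
print(strip_trailing_punct("Wait... (really)"))
# Wait.. (really
```

If input like this matters, an explicit character class such as the one in the next answer may be a better fit.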

Answer 1 (score: 3)

A variant is possible with the newer third-party regex module:

\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+

Breakdown:

\w+[-.]+\w+    # 1+ word characters, then 1+ of - or ., then 1+ word characters
(*SKIP)(*FAIL) # discard what was matched so far and make this alternative fail
|              # or
[!,.]+         # one or more of ! , .

See a demo on regex101.com.

In Python:

import regex as re

string = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."

rx = re.compile(r'\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+')
string = rx.sub('', string)
print(string)
# Hello this episode is thirty-five minutes long 35.26 mins to be precise
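
If installing the third-party regex module is not an option, the skip-then-match idea of the (*SKIP)(*FAIL) pattern can be approximated with the standard re module: capture the tokens to protect in a group and put them back in the replacement. This is a sketch rather than an exact equivalent, and `PROTECT_OR_PUNCT` and `remove_punct` are hypothetical names.

```python
import re

# First alternative captures tokens to keep (e.g. thirty-five, 35.26);
# second alternative matches the punctuation to delete.
PROTECT_OR_PUNCT = re.compile(r'(\w+[-.]+\w+)|[!,.]+')

def remove_punct(text):
    # If group 1 matched, put the protected token back unchanged;
    # otherwise the match was punctuation, so replace it with ''.
    return PROTECT_OR_PUNCT.sub(lambda m: m.group(1) or '', text)

string = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
print(remove_punct(string))
# Hello this episode is thirty-five minutes long 35.26 mins to be precise
```

The callable replacement is what keeps the protected tokens in place: re.sub passes each match object to it and uses the returned string as the replacement.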