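Is there a way to remove all punctuation from a string, while keeping hyphens inside words and punctuation inside numbers? For example: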
Hello! this episode is thirty-five minutes long, 35.26 mins to be precise.
should become:
Hello this episode is thirty-five minutes long 35.26 mins to be precise
Answer 0 (score: 5)
You can use re.sub with a positive lookahead:
In [165]: re.sub(r'\W(?=\s|$)', '', s)
Out[165]: 'Hello this episode is thirty-five minutes long 35.26 mins to be precise'
Details
\W # any non-word character (not a letter, digit, or underscore)
(?= # positive lookahead
\s # whitespace
| # regex OR
$ # end of the string
)
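To make the lookahead approach easy to try outside an interactive session, here is a minimal sketch that wraps it in a helper. The function name strip_trailing_punct is made up for illustration and is not part of the answer. The second call shows a limitation worth knowing about: only a single non-word character directly before whitespace or the end of the string is removed, so leading quotes and stacked punctuation are only partly stripped.

```python
import re

def strip_trailing_punct(text):
    # Hypothetical helper: drop any non-word character that is
    # immediately followed by whitespace or the end of the string.
    return re.sub(r'\W(?=\s|$)', '', text)

s = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
print(strip_trailing_punct(s))
# Hello this episode is thirty-five minutes long 35.26 mins to be precise

# Limitation: the opening quote and the comma survive, because neither
# is directly followed by whitespace or the end of the string.
print(strip_trailing_punct('"Hello," she said.'))
# "Hello, she said
```

Note that \W also matches whitespace, so in a run of two spaces the first one is removed as well.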
Answer 1 (score: 3)
With the newer regex module, a variant using (*SKIP)(*FAIL) is possible:
\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+
Breakdown:
\w+[-.]+\w+ # 1+ word characters, then one or more - or ., then 1+ word characters
(*SKIP)(*FAIL) # discard what was matched so far and resume searching after it
| # or
[!,.]+ # one or more of ! , .
In Python:
import regex as re  # third-party module (pip install regex), not the stdlib re
string = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
rx = re.compile(r'\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+')
string = rx.sub('', string)
print(string)
# Hello this episode is thirty-five minutes long 35.26 mins to be precise
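If installing the third-party regex module is not an option, a similar result can probably be obtained with the standard-library re module and lookarounds. The sketch below is a different technique from the (*SKIP)(*FAIL) pattern above, and it is broader: it targets any run of characters that are neither word characters nor whitespace, not only ! , and . (the helper name strip_outer_punct and the pattern are assumptions for illustration).

```python
import re

# A run of punctuation that does not start right after a word character,
# or a run of punctuation that is not followed by a word character.
# Punctuation with word characters on both sides (thirty-five, 35.26) is kept.
PUNCT = re.compile(r'(?<!\w)[^\w\s]+|[^\w\s]+(?!\w)')

def strip_outer_punct(text):
    # Hypothetical helper, not taken from either answer.
    return PUNCT.sub('', text)

s = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
print(strip_outer_punct(s))
# Hello this episode is thirty-five minutes long 35.26 mins to be precise

print(strip_outer_punct('"Hello," she said.'))
# Hello she said
```

Because the underscore counts as a word character, it is never removed by this pattern.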