I have a string:
s= "Classic for older systems. People • Animals • Food • ⚽ Activities • Travel • Objects • Symbols ...45.6"
I want to remove symbols, emoji, and the • bullet character.
The expected output is:
"Classic for older systems People Animals Food Activities Travel Objects Symbols 45.6"
The code:
re.sub(r'([^\s\w]|_)+', '', s)
produces
'Classic for older systems People Animals Food Activities Travel Objects Symbols 456'
which also strips the dot from the floating-point number. How can I fix this?
Answer 0 (score: 2)
(\d+\.\d+)|[^a-z\d\s]+
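(\d+\.\d+) captures a decimal number into the first capture group: one or more digits, a dot, one or more digits.
[^a-z\d\s]+ matches one or more characters that are not letters, digits, or whitespace. With the i (case-insensitive) flag, a-z also covers the uppercase letters.
Replace with: $1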
Output:
Classic for older systems People Animals Food Activities Travel Objects Symbols 45.6
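The pattern above is written with a `$1` replacement; in Python's `re.sub` the equivalent backreference is `\1`. The following is a minimal sketch of how it might be applied, not necessarily what the answerer ran. The names `pattern` and `cleaned` are illustrative, and the final whitespace collapse is an added assumption so that the result matches the single-spaced expected output. It relies on Python 3.5+, where a group that did not participate in the match is substituted as an empty string.

```python
import re

s = "Classic for older systems. People • Animals • Food • ⚽ Activities • Travel • Objects • Symbols ...45.6"

# First alternative keeps decimals (group 1); the second matches every run of
# characters that are not letters, digits, or whitespace.
pattern = re.compile(r'(\d+\.\d+)|[^a-z\d\s]+', re.IGNORECASE)

# \1 restores the decimal when group 1 matched; otherwise it expands to ''
# (Python 3.5+), which deletes the symbol run.
cleaned = pattern.sub(r'\1', s)

# Assumption: collapse the leftover runs of spaces into single spaces.
cleaned = ' '.join(cleaned.split())
print(cleaned)
```

Because the character class only allows `a-z`, accented or non-Latin letters would be removed as well; that may or may not be what you want.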
Answer 1 (score: 0)
You can mimic (*SKIP)(*FAIL) with the following code:
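import re

s = "Classic for older systems. People • Animals • Food • ⚽ Activities • Travel • Objects • Symbols ...45.6"

# Decimals are matched first so they survive; any other run of
# non-word characters is captured in group 1.
rx = re.compile(r'\d+\.\d+|(\W+)')

def replacer(match):
    # Group 1 is set only when the non-word branch matched:
    # replace each matched character with a space.
    if match.group(1) is not None:
        return ' ' * len(match.group(1))
    else:
        # A decimal number matched: return it unchanged.
        return match.group(0)

s = rx.sub(replacer, s)
print(s)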
This uses the function replacer as the replacement argument and produces
Classic for older systems People Animals Food Activities Travel Objects Symbols 45.6
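Since this replacer substitutes one space per removed character, the result can contain runs of several spaces. If single-spaced output is wanted, one possible variant is sketched below. It is an assumption-laden illustration rather than part of the answer: `drop_symbols` and `result` are hypothetical names, the character class is borrowed from the question's own pattern, and the whitespace collapse is an extra step.

```python
import re

s = "Classic for older systems. People • Animals • Food • ⚽ Activities • Travel • Objects • Symbols ...45.6"

# Same keep-first idea: decimals are matched first and kept; group 1
# captures runs of punctuation, symbols, or underscores to delete.
rx = re.compile(r'\d+\.\d+|((?:[^\w\s]|_)+)')

def drop_symbols(match):
    # Group 1 participated: a symbol run matched, so remove it entirely.
    if match.group(1) is not None:
        return ''
    # Otherwise the decimal branch matched: keep it as-is.
    return match.group(0)

# Collapse the whitespace left behind (an added step, not in the answer).
result = ' '.join(rx.sub(drop_symbols, s).split())
print(result)
```

Unlike a class built on `a-z`, `\w` keeps non-ASCII letters, so words such as "café" would pass through untouched.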