Extracting alphanumerics, integers, and floats from a string with Python 3.6

Time: 2018-02-13 17:31:14

Tags: python regex

I have a string:

s= "Classic for older systems.  People •  Animals •  Food • ⚽ Activities •  Travel •  Objects •  Symbols ...45.6"

I want to remove the symbols, the emoji, and the • characters.

The expected output is:

"Classic for older systems  People   Animals   Food   Activities   Travel   Objects   Symbols 45.6"

Code:

re.sub(r'([^\s\w]|_)+', '', s)

produces

'Classic for older systems  People   Animals   Food   Activities   Travel   Objects   Symbols 456'

This also removes the decimal point from the float (45.6 becomes 456). How can I fix this?

2 answers:

Answer 0 (score: 2)


(\d+\.\d+)|[^a-z\d\s]+
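  • (\d+\.\d+) captures a decimal number into the first capture group: one or more digits, a dot, then one or more digits.
  • [^a-z\d\s]+ matches one or more characters that are not letters, digits, or whitespace. With the i (case-insensitive) flag, a-z covers uppercase letters as well.

Replacement: $1 (written \1 in Python's re.sub), so a matched decimal number is put back unchanged and anything else that matched is deleted.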

Output:

Classic for older systems  People   Animals   Food   Activities   Travel   Objects   Symbols 45.6
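
In Python, the pattern and replacement above could be applied roughly as follows. This is a minimal sketch, not the answerer's own code: the helper name strip_symbols and the sample string are made up for illustration, and it relies on the Python 3.5+ behavior where a backreference to a group that did not take part in the match is replaced by an empty string.

```python
import re

# Group 1 keeps decimal numbers; the second alternative matches runs of
# anything that is not a letter, digit, or whitespace.
PATTERN = re.compile(r'(\d+\.\d+)|[^a-z\d\s]+', re.IGNORECASE)

def strip_symbols(text):
    """Hypothetical helper: delete symbols but keep decimal numbers intact."""
    # \1 puts a matched decimal back; for symbol runs group 1 is unmatched,
    # which Python 3.5+ substitutes as an empty string.
    return PATTERN.sub(r'\1', text)

sample = "abc. • def ...45.6"  # illustrative input, not the question's string
print(strip_symbols(sample))
```

Note that a-z only covers ASCII letters, so accented letters such as é would be removed as well; whether that is acceptable depends on the input.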

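Answer 1 (score: 0)

You can mimic (*SKIP)(*FAIL), a PCRE construct that Python's built-in re module does not support, with the following code: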

import re

s = "Classic for older systems.  People •  Animals •  Food • ⚽ Activities •  Travel •  Objects •  Symbols ...45.6"

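# The first alternative matches a decimal number without capturing it;
# the second captures any run of non-word characters in group 1.
rx = re.compile(r'\d+\.\d+|(\W+)')

def replacer(match):
    if match.group(1) is not None:
        # Non-word run: replace it with the same number of spaces.
        return ' ' * len(match.group(1))
    else:
        # Decimal number: return it unchanged.
        return match.group(0)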

s = rx.sub(replacer, s)
print(s)

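This uses the function replacer as the replacement argument. Because \W also matches whitespace, each run of symbols and spaces is replaced by the same number of spaces, which produces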

Classic for older systems     People      Animals      Food      Activities      Travel      Objects      Symbols    45.6
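
If the goal is the question's expected output, where symbols are deleted rather than padded with spaces and existing whitespace is left alone, the same callback idea can be adapted. The following is a sketch under assumptions, not part of the answer: the second alternative reuses the question's ([^\s\w]|_)+ character class, and remove_symbols is a hypothetical name.

```python
import re

# First alternative: decimal numbers (not captured, returned unchanged).
# Second alternative: the question's symbol class, captured in group 1.
rx = re.compile(r'\d+\.\d+|((?:[^\s\w]|_)+)')

def remove_symbols(text):
    def replacer(match):
        # group(1) is set only when the symbol alternative matched.
        return '' if match.group(1) is not None else match.group(0)
    return rx.sub(replacer, text)

print(remove_symbols("Symbols ...45.6 and a_b • c"))
```

Unlike the \W+ version, this leaves whitespace untouched and also drops underscores, matching the behavior of the pattern in the question.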