Python regular expression to split a string and get all words is not working

Date: 2017-09-02 18:00:15

Tags: python regex python-3.6

I am trying to split a string in Python using a regular expression and get all the matched literals.

RE: \w+(\.?\w+)*

This needs to capture only things like [a-zA-Z0-9_].

Here is an example

But when I try to match and get all the contents from the string, it does not return the proper results.

Code snippet:

>>> import re
>>> from pprint import pprint
>>> pattern = r"\w+(\.?\w+)*"
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same.
... Oh wait, it also need to filter out the symbols like !@#$%^&*()-+=[]{}.,;:'"`| \(`.`)/
... 
... I guess that's it."""
>>> pprint(re.findall(r"\w+(.?\w+)*", string))
[' etc', ' well', ' same', ' wait', ' like', ' it']

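It returns only some of the words, but it should actually return all the words, numbers and underscores [as in the linked example].

Python version: Python 3.6.2 (default, Jul 17 2017, 16:44:45)

Thanks.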

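1 answer:

Answer 0: (score: 2)

You need to use a non-capturing group (see here why), because re.findall returns the contents of the capturing group instead of the whole match when the pattern contains one, and escape the dot (see here for which characters should be escaped in a regex):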

>>> import re
>>> from pprint import pprint
>>> pattern = r"\w+(?:\.?\w+)*"
>>> string = """this is some test string and there are some digits as well that need to be captured as well like 1234567890 and 321 etc. But it should also select _ as well. I'm pretty sure that that RE does exactly the same.
... Oh wait, it also need to filter out the symbols like !@#$%^&*()-+=[]{}.,;:'"`| \(`.`)/
... 
... I guess that's it."""
>>> pprint(re.findall(pattern, string, re.A))
['this', 'is', 'some', 'test', 'string', 'and', 'there', 'are', 'some', 'digits', 'as', 'well', 'that', 'need', 'to', 'be', 'captured', 'as', 'well', 'like', '1234567890', 'and', '321', 'etc', 'But', 'it', 'should', 'also', 'select', '_', 'as', 'well', 'I', 'm', 'pretty', 'sure', 'that', 'that', 'RE', 'does', 'exactly', 'the', 'same', 'Oh', 'wait', 'it', 'also', 'need', 'to', 'filter', 'out', 'the', 'symbols', 'like', 'I', 'guess', 'that', 's', 'it']
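
To make the difference between the two group types easier to see, here is a minimal sketch on a much shorter input. The sample string and the variable names are made up for illustration and are not part of the original question.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "foo.bar baz_1 42"

# Capturing group: findall returns what the group captured on its last
# repetition (or '' if the group never matched), not the whole match.
capturing = re.findall(r"\w+(\.?\w+)*", text)

# Non-capturing group (?:...): findall returns the whole match.
non_capturing = re.findall(r"\w+(?:\.?\w+)*", text)

print(capturing)      # ['.bar', '', '']
print(non_capturing)  # ['foo.bar', 'baz_1', '42']
```

The empty strings in the first result come from matches where the group repeated zero times, which is why the original pattern appeared to drop most of the words.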

Also, to match only ASCII letters, digits and _, you must pass the re.A flag.
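
As a rough illustration of what re.A changes, the sketch below runs the same \w+ pattern with and without the flag. The sample text is an assumption picked to contain non-ASCII letters; it is not from the question.

```python
import re

# Hypothetical sample with non-ASCII letters, written with escapes so the
# exact code points are unambiguous: "naïve café 123 über_1".
text = "na\u00efve caf\u00e9 123 \u00fcber_1"
pattern = r"\w+"

# Default for str patterns in Python 3: \w matches Unicode word characters.
unicode_words = re.findall(pattern, text)

# With re.A (re.ASCII): \w matches only [a-zA-Z0-9_].
ascii_words = re.findall(pattern, text, re.A)

print(unicode_words)  # ['naïve', 'café', '123', 'über_1']
print(ascii_words)    # ['na', 've', 'caf', '123', 'ber_1']
```

With the flag, accented letters act as separators, so words containing them are split or truncated. Whether that is desirable depends on the input you expect.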

See the Python demo