Python regular expression to find a substring in a string

Time: 2017-03-31 02:29:20

Tags: python regex

I have a string:

 <robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">

I want to get the value of the generated attribute, but the code below does not work:

import re
doc=r'<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
match = re.match(r'generated="(\d+ \d+:\d+:\d+.\d+)',doc)

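The value of match is None. Can anyone help?

2 answers:

Answer 0 (score: 1)

re.match only matches at the beginning of the string. Use re.search instead, which finds a match anywhere in the string, not just at the beginning.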

>>> import re
>>> doc=r'<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc)
<_sre.SRE_Match object at 0x1010505d0>

>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc).group()
'generated="20170330 17:19:11.956'

>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc).group(1)
'20170330 17:19:11.956'

See "search() vs. match()" in the re module documentation.
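
For a script rather than an interactive session, the difference can be sketched as follows. This is only an illustration: the helper name extract_generated is hypothetical and not part of the answer. The pattern escapes the dot, as the answer's pattern does, so that it matches a literal period only.

```python
import re

doc = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
pattern = r'generated="(\d+ \d+:\d+:\d+\.\d+)'

# re.match only tries position 0, where the string starts with "<robot",
# so it returns None for this input.
match_result = re.match(pattern, doc)

# re.search scans forward through the string and finds the attribute.
search_result = re.search(pattern, doc)


def extract_generated(text):
    """Hypothetical helper: return the captured timestamp text, or None."""
    found = re.search(pattern, text)
    # Check for None before calling .group() to avoid an AttributeError.
    return found.group(1) if found else None


print(match_result)
print(extract_generated(doc))
```

Checking the result for None before calling group() keeps the code from raising AttributeError when the attribute is missing.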

Answer 1 (score: 1)

You do not necessarily need a regular expression in this case. Here is an alternative idea that uses the BeautifulSoup XML/HTML parser together with the dateutil datetime parser:

In [1]: from dateutil.parser import parse

In [2]: from bs4 import BeautifulSoup

In [3]: data = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'

In [4]: parse(BeautifulSoup(data, "html.parser").robot['generated'])
Out[4]: datetime.datetime(2017, 3, 30, 17, 19, 11, 956000)

I find this approach clean, simple, and straightforward.
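
If installing third-party packages is not an option, a similar datetime result can probably be obtained with the standard library alone. The sketch below is not from either answer: it assumes the timestamp always has the form YYYYMMDD HH:MM:SS.fff, and the helper name parse_generated is hypothetical.

```python
import re
from datetime import datetime

data = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'


def parse_generated(text):
    """Hypothetical helper: return the generated attribute as a datetime, or None."""
    found = re.search(r'generated="([^"]+)"', text)
    if found is None:
        return None
    # Assumed format: date as YYYYMMDD, then time with fractional seconds.
    # %f accepts one to six digits, so "956" becomes 956000 microseconds.
    return datetime.strptime(found.group(1), "%Y%m%d %H:%M:%S.%f")


print(parse_generated(data))
```

Unlike dateutil's parse, strptime raises ValueError when the text does not fit the assumed format, so this version is stricter about its input.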