Question

I am trying to get the error code from this XML.

>>> import re
>>> re_code = re.compile(r'<errorcode>([0-9]+)</errorcode>', re.MULTILINE)
>>> re_code.match('''<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
... <methoderesponse>
...     <status>
...         <message/>
...         <errorcode>515</errorcode>
...         <value>ERROR</value>
...     </status>
... </methoderesponse>
... ''')

It should be easy, but I don't understand why it doesn't match.

Answer 1

.match() only tries to match at the start of the string. You want .search() or, more likely, .findall().
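The difference can be sketched with the XML from the question. The variable names below (xml_text, anchored, first_code, all_codes) are illustrative and not part of the original post:

```python
import re

# The XML document from the question, stored as a plain string.
xml_text = '''<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<methoderesponse>
    <status>
        <message/>
        <errorcode>515</errorcode>
        <value>ERROR</value>
    </status>
</methoderesponse>
'''

re_code = re.compile(r'<errorcode>([0-9]+)</errorcode>')

# match() only tries position 0, where the string begins with "<?xml".
anchored = re_code.match(xml_text)

# search() scans forward and returns the first match anywhere in the string.
first = re_code.search(xml_text)
first_code = first.group(1) if first else None

# findall() returns the captured group of every match as a list of strings.
all_codes = re_code.findall(xml_text)

print(anchored, first_code, all_codes)
```

Here match() gives None, while search() and findall() both find the code.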

Take a look at an XML parser, though: using XPath or an equivalent to get the data is much better (plus it handles the nuances that a regex won't).

An example that works for your sample XML:

import xml.etree.ElementTree as ET
# text is assumed to hold the XML document from the question as a string
tree = ET.fromstring(text)

>>> tree.findall('.//errorcode')[0].text
'515'
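A self-contained variant of the same idea, as a rough sketch: it assumes the XML is held in a string named text, and it uses findtext(), which returns None instead of raising an IndexError when the element is absent. The tag name nosuchtag is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# The XML document from the question, stored as a plain string.
text = '''<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<methoderesponse>
    <status>
        <message/>
        <errorcode>515</errorcode>
        <value>ERROR</value>
    </status>
</methoderesponse>
'''

# fromstring() parses the document and returns its root element.
root = ET.fromstring(text)

# findtext() returns the text of the first matching element, or None.
code = root.findtext('.//errorcode')
missing = root.findtext('.//nosuchtag')  # hypothetical tag, not in the XML

# Convert to an integer only when the element was actually found.
code_number = int(code) if code is not None else None

print(code, missing, code_number)
```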

See the ElementTree documentation for more information; personally, I would look at lxml.

Answer 2

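As @Jon Clements said, .match() only works when the pattern should match from the beginning of the string, .search() scans the string for the first occurrence, and .findall() returns all occurrences.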

Either way, you could rewrite your regex as a slightly more readable version:

regex = re.compile(r'<errorcode>(\d+)</errorcode>')

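You don't need the re.MULTILINE flag; it has nothing to do with this problem. It only changes how ^ and $ behave, and this pattern uses neither.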
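A quick way to check this, as a sketch on a made-up string (sample and the variable names are illustrative only): the flag makes no difference to the unanchored pattern, and only matters once ^ is involved.

```python
import re

# A shortened, made-up sample with the element on its own line.
sample = "<status>\n<errorcode>515</errorcode>\n</status>"
pattern = r'<errorcode>(\d+)</errorcode>'

# Without anchors, the result is the same with or without the flag.
plain = re.search(pattern, sample).group(1)
multi = re.search(pattern, sample, re.MULTILINE).group(1)

# With ^, the flag matters: by default ^ matches only at the start of
# the string; with re.MULTILINE it also matches after each newline.
anchored_plain = re.search(r'^<errorcode>', sample)
anchored_multi = re.search(r'^<errorcode>', sample, re.MULTILINE)

print(plain, multi, anchored_plain is None, anchored_multi is not None)
```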

Python regex: what is wrong with this?
