Python regular expressions and group matching

Posted: 2014-01-31 03:02:21

Tags: python regex python-2.7

Input string:

<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>

What I want:

(1111,2222)

If I use findall, this is what I get:

>>> import re
>>> print re.findall("<(msgCode|errorId)>([0-9]+)</(msgCode|errorId)>", "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>")
[('msgCode', '1111', 'msgCode'), ('errorId', '2222', 'errorId')]
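The tuples come from how re.findall treats groups: with no capturing group it returns the whole match, with exactly one capturing group it returns that group's text, and with more than one it returns a tuple of all groups for each match. A small illustration of the three cases (the variable names are only for this sketch):

```python
import re

subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# No capturing groups: each item is the whole match.
whole = re.findall(r"<msgCode>[0-9]+</msgCode>", subject)
# Exactly one capturing group: each item is that group's text.
one = re.findall(r"<msgCode>([0-9]+)</msgCode>", subject)
# Several capturing groups: each item is a tuple of every group.
many = re.findall(r"<(msgCode|errorId)>([0-9]+)</(msgCode|errorId)>", subject)

print(whole)  # ['<msgCode>1111</msgCode>']
print(one)    # ['1111']
print(many)   # [('msgCode', '1111', 'msgCode'), ('errorId', '2222', 'errorId')]
```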

What I would like instead is:

[('1111','2222')]

Is there a simple way to get this with re itself, rather than post-processing the output?
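If the goal is literally the [('1111','2222')] shape, one option is a single pattern that spans both elements with two capturing groups, so one match yields one tuple. This is a sketch that assumes <msgCode> always appears before <errorId> in the input:

```python
import re

subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# One match covers both elements, so findall yields a single tuple holding
# both captured numbers. Assumes <msgCode> comes before <errorId>.
# re.DOTALL lets the lazy .*? also cross newlines between the elements.
pattern = r"<msgCode>([0-9]+)</msgCode>.*?<errorId>([0-9]+)</errorId>"
result = re.findall(pattern, subject, re.DOTALL)
print(result)  # [('1111', '2222')]
```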

2 answers:

Answer 0 (score: 2)

Consider using XPath instead:

>>> from lxml import html
>>> root = html.fromstring('<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>')
>>> root.xpath('//*[self::msgcode or self::errorid]/text()')
['1111', '2222']
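If lxml is not available, a similar sketch is possible with the standard library's xml.etree.ElementTree. lxml's HTML parser lowercases tag names (which is why the XPath above uses msgcode and errorid), whereas the XML parser keeps the original case. The <root> wrapper is an assumption added here, because the input is not a single well-formed XML element:

```python
import xml.etree.ElementTree as ET

subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# The input has no single root element, so wrap it in a hypothetical
# <root> element to make it well-formed XML. "asdasdad" becomes the
# tail text of <msgCode> and is ignored below.
root = ET.fromstring("<root>" + subject + "</root>")

# XML tag names are case-sensitive, so the original spellings are used.
values = [el.text for el in root if el.tag in ("msgCode", "errorId")]
print(values)  # ['1111', '2222']
```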

Answer 1 (score: -1)

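Use non-capturing groups, (?:msgCode|errorId), for the tag names. re.findall returns only the text of capturing groups, so with the digits as the single capturing group each match yields just the number: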

>>> import re
>>> subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"
>>> result = re.findall("<(?:msgCode|errorId)>([0-9]+)</(?:msgCode|errorId)>", subject)
>>> print result
['1111', '2222']
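One caveat: with two independent non-capturing groups, the pattern also accepts mismatched pairs such as <msgCode>1</errorId>. If the closing tag must match the opening one, a backreference \1 can enforce it, but a backreference needs a capturing group, so findall would return tuples again. A sketch that uses re.finditer and keeps only the digits group:

```python
import re

subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# \1 must repeat whatever group 1 matched, so <msgCode>1</errorId> is rejected.
# Group 1 is the tag name, group 2 is the number; keep only group 2.
pattern = re.compile(r"<(msgCode|errorId)>([0-9]+)</\1>")
result = [m.group(2) for m in pattern.finditer(subject)]
print(result)  # ['1111', '2222']
```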