Input string:
<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>
What I want:
(1111,2222)
If I use findall, this is what I get:
>>> import re
>>> print(re.findall(r"<(msgCode|errorId)>([0-9]+)</(msgCode|errorId)>", "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"))
[('msgCode', '1111', 'msgCode'), ('errorId', '2222', 'errorId')]
What I'd like is:
[('1111','2222')]
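Is there a simple way to get this with re alone, instead of post-processing the output?
Answer 0 (score: 2)
Consider using XPath instead. Note that lxml's HTML parser lowercases tag names, so the expression has to match msgcode and errorid: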
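For reference, the shape of re.findall's result depends on how many capturing groups the pattern contains: the whole match when there are none, that group's text when there is exactly one, and a tuple of all groups when there are several. That is why the three-group pattern above yields three-element tuples. A small sketch on the same input string (the variable names are illustrative only):

```python
import re

s = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# No capturing groups: findall returns the whole matched text for each hit.
whole = re.findall(r"<(?:msgCode|errorId)>[0-9]+</(?:msgCode|errorId)>", s)

# Exactly one capturing group: findall returns only that group's text.
one = re.findall(r"<(?:msgCode|errorId)>([0-9]+)</(?:msgCode|errorId)>", s)

# Several capturing groups: findall returns one tuple per match,
# with one entry per group.
many = re.findall(r"<(msgCode|errorId)>([0-9]+)</(msgCode|errorId)>", s)

print(whole)  # ['<msgCode>1111</msgCode>', '<errorId>2222</errorId>']
print(one)    # ['1111', '2222']
print(many)   # [('msgCode', '1111', 'msgCode'), ('errorId', '2222', 'errorId')]
```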
>>> from lxml import html
>>> root = html.fromstring('<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>')
>>> root.xpath('//*[self::msgcode or self::errorid]/text()')
['1111', '2222']
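If lxml is not available, the standard library's xml.etree.ElementTree can do something similar. This is a swapped-in alternative, not part of the answer above, and it assumes the fragment becomes well-formed XML once wrapped in an extra element (the <root> name is made up for the sketch). Unlike lxml.html, ElementTree keeps tag names case-sensitive, so msgCode and errorId are matched as written:

```python
import xml.etree.ElementTree as ET

s = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# The fragment has two top-level elements plus loose text, so it is not a
# well-formed XML document by itself; wrapping it in a hypothetical <root>
# element makes it parseable.
root = ET.fromstring("<root>" + s + "</root>")

# Iterating over root yields its direct child elements. The loose text
# "asdasdad" is stored as the .tail of <msgCode>, not as a child, so it is
# skipped automatically.
values = [child.text for child in root if child.tag in ("msgCode", "errorId")]
print(values)  # ['1111', '2222']
```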
Answer 1 (score: -1)
Use non-capturing groups (?:msgCode|errorId) for the tag names, so that the digits are the only captured group:
>>> import re
>>> subject = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"
>>> result = re.findall(r"<(?:msgCode|errorId)>([0-9]+)</(?:msgCode|errorId)>", subject)
>>> print(result)
['1111', '2222']
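One caveat: the pattern above also accepts mismatched pairs such as <msgCode>1111</errorId>, because the opening and closing alternations are independent. A variant sketch (not from the answer) ties each opening tag to its own closing tag with lookbehind/lookahead assertions, still has no capturing groups, and converts the result into the (1111, 2222) tuple asked for in the question. Each lookbehind here is fixed-width, which Python's re requires:

```python
import re

s = "<msgCode>1111</msgCode>asdasdad<errorId>2222</errorId>"

# Each alternative pairs an opening-tag lookbehind with the matching
# closing-tag lookahead. Assertions consume no text and capture nothing,
# so findall returns just the digits.
pattern = re.compile(
    r"(?<=<msgCode>)[0-9]+(?=</msgCode>)"
    r"|(?<=<errorId>)[0-9]+(?=</errorId>)"
)

values = tuple(int(v) for v in pattern.findall(s))
print(values)  # (1111, 2222)

# A mismatched pair is rejected by both alternatives.
print(pattern.findall("<msgCode>1111</errorId>"))  # []
```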