Using re.split with a pattern in Python regular expressions

Asked: 2015-07-14 07:38:02

Tags: python regex

I have a string like this:

string ='ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

I want to extract =E2=82=AC and =20.

But when I use

import re
pattern = r'(=\w\w)+'
a = re.split(pattern, string)

it returns

['ArcelorMittal invests ', '=AC', '87m in new process that cuts emissions', '=20', '']

2 answers:

Answer 0 (score: 1)

You can use re.findall with a non-capturing group, so that each match is the whole run of =XX sequences:

>>> import re
>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> re.findall(r'(?:=\w{2})+', s)
['=E2=82=AC', '=20']

If you want to remove these characters, use re.sub:

>>> re.sub(r'(?:=\w{2})+', '', s)
'ArcelorMittal invests 87m in new process that cuts emissions'
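
As for why the re.split call in the question returned only '=AC': when the pattern contains a capturing group, re.split includes the captured text in its result, and a capturing group repeated with + keeps only its last repetition. The sketch below compares three variants on the same string. The variable names are made up for illustration, and wrapping the run in a single outer capturing group is just one possible way to keep the full delimiter.

```python
import re

s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# Capturing group repeated by +: only the last repetition ('=AC') is kept.
last_only = re.split(r'(=\w\w)+', s)

# Non-capturing inner group inside one outer capturing group:
# the whole run ('=E2=82=AC') is returned as the delimiter.
full_runs = re.split(r'((?:=\w\w)+)', s)

# No capturing group at all: the delimiters are dropped from the result.
text_only = re.split(r'(?:=\w\w)+', s)

print(last_only)
print(full_runs)
print(text_only)
```

Note that \w also matches letters outside the hex range, so a stricter pattern such as [0-9A-F]{2} may be safer if the text can contain something like "=in".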

Answer 1 (score: 1)

Based on your comment, I suggest using quopri.decodestring on the original string. There is no need to extract these characters and decode them separately.

>>> import quopri
>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> quopri.decodestring(s)
'ArcelorMittal invests \xe2\x82\xac87m in new process that cuts emissions '
>>> print quopri.decodestring(s)
ArcelorMittal invests €87m in new process that cuts emissions
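
The session above uses Python 2 syntax (the print statement). In Python 3, quopri.decodestring returns bytes, so one more decoding step is needed to get text. A minimal sketch, assuming the payload is UTF-8 (which the =E2=82=AC sequence for the euro sign suggests); the names raw and text are illustrative:

```python
import quopri

s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# Encode to ASCII bytes first, then undo the quoted-printable escapes.
raw = quopri.decodestring(s.encode('ascii'))

# Assumption: the decoded bytes are UTF-8 encoded text.
text = raw.decode('utf-8')

print(text)
```

If the string came from an email, the charset declared in its Content-Type header is the more reliable source for the final decode step than a hard-coded 'utf-8'.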