I have a string like this:
string ='ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
I want to extract =E2=82=AC and =20, but when I use
import re
pattern = r'(=\w\w)+'
a = re.split(pattern, string)
it returns
['ArcelorMittal invests ', '=AC', '87m in new process that cuts emissions', '=20', '']
Answer 0 (score: 1)
You can use re.findall:
>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> re.findall(r'(?:=\w{2})+', s)
['=E2=82=AC', '=20']
>>>
If you want to remove these characters, use re.sub:
>>> re.sub(r'(?:=\w{2})+', '', s)
'ArcelorMittal invests 87m in new process that cuts emissions'
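As for why the re.split call in the question returned '=AC' rather than '=E2=82=AC': when a capturing group is repeated, only its last repetition is kept, and re.split includes captured text in its result. The sketch below contrasts the two group types on the same string. The stricter hex-only pattern at the end is an assumption on my part, not something the question asked for.

```python
import re

s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# Capturing group repeated with +: the group remembers only its last
# repetition, and re.split inserts that captured text into the result.
capturing = re.split(r'(=\w\w)+', s)

# Non-capturing group (?:...): findall returns each whole run of =XX tokens.
non_capturing = re.findall(r'(?:=\w{2})+', s)

# Assumed stricter variant: accept only uppercase hex digits after '='.
hex_only = re.findall(r'(?:=[0-9A-F]{2})+', s)

print(capturing)
print(non_capturing)
print(hex_only)
```

With the question's string, both findall calls give the two runs, while the split result carries only the last token of each run.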
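Answer 1 (score: 1)
Based on your comment, I suggest calling quopri.decodestring on the original string. There is no need to extract these characters and decode them separately.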
>>> import quopri
>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> quopri.decodestring(s)
'ArcelorMittal invests \xe2\x82\xac87m in new process that cuts emissions '
>>> print quopri.decodestring(s)
ArcelorMittal invests €87m in new process that cuts emissions
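The session above uses Python 2 syntax (the print statement, and a str that is really bytes). A minimal sketch of the same idea for Python 3 follows, assuming the quoted-printable payload encodes UTF-8 text, which the =E2=82=AC sequence (the euro sign) suggests. There quopri.decodestring returns bytes, so an explicit decode step is needed.

```python
import quopri

s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# Quoted-printable is an ASCII transfer encoding, so encode the str to
# bytes first; decodestring then returns the raw decoded bytes.
raw = quopri.decodestring(s.encode('ascii'))

# Assumption: the underlying text is UTF-8.
text = raw.decode('utf-8')

print(text)
```

If the source charset is something other than UTF-8, pass that codec name to decode instead.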