Python regex with unicode strings

Time: 2015-01-31 05:44:54

Tags: python regex unicode

I can't match a unicode string in Python 2.7. Expected result: 749130

>>> from re import match
>>> print match("\d+", u'\ufeff749130'.encode('utf-8'))
None
>>> print match("\d+", u'\ufeff749130')
None
>>> print match("\d+", u'\ufeff749130'.decode('utf-8'))
Traceback (most recent call last):
....
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

1 answer:

Answer 0 (score: -1)

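There is no need to call str.decode on a unicode string: in Python 2 that first encodes the string implicitly with the ASCII codec, which is what raises the UnicodeEncodeError. As mentioned in the comments, you probably want to use search, because match only matches at the beginning of the target string, and here the string begins with u'\ufeff' (a byte order mark), which is not a digit.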

>>> from re import search
>>> print search("\d+", u'\ufeff749130').group()
749130
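
For comparison, here is a minimal sketch of a few ways to get at the digits. It assumes the leading u'\ufeff' is a byte order mark left over from reading a UTF-8 file; the variable names (text, found, stripped, raw, decoded) are illustrative and not from the question. Stripping the mark, or decoding the raw bytes with the utf-8-sig codec, are alternatives I am suggesting, not something the question requires.

```python
import re

# The string from the question: a byte order mark (U+FEFF) followed by digits.
text = u'\ufeff749130'

# match() only tries position 0, where the BOM sits, so it finds nothing.
assert re.match(r'\d+', text) is None

# search() scans forward through the string and finds the digits.
found = re.search(r'\d+', text)
print(found.group())

# Alternative: strip the leading BOM so that match() can be used.
stripped = text.lstrip(u'\ufeff')
print(re.match(r'\d+', stripped).group())

# Alternative, assuming the data arrives as UTF-8 bytes:
# the utf-8-sig codec drops a leading BOM while decoding.
raw = text.encode('utf-8')
decoded = raw.decode('utf-8-sig')
print(re.match(r'\d+', decoded).group())
```

Which option fits depends on where the string comes from. If the mark only ever appears at the start of a file, removing it while decoding keeps the rest of the code free of special cases.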