UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Asked: 2016-03-18 14:33:46

Tags: python python-2.7

I am scraping Oregon Teacher License data for a project I'm working on. Here is my code:

educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
print educ_employ
#[u'Jefferson Middle School\xa0\xa0(2013 - 2014)']

I want to strip out the "\xa0". Here is my code:

educ_employ = ([s.strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

I tried this:

educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

and this:

import sys

reload(sys)
sys.setdefaultencoding('utf-8')

educ_employ = tree.xpath('//tr[15]//td[@bgcolor="#A9EDFC"]//text()')
educ_employ = ([s.decode('utf-8').strip('\xa0') for s in educ_employ])
print educ_employ
>>>

With the last attempt I don't get an error, but I don't get any output either. I am using Python 2.7. Does anyone know how to fix this?

1 Answer:

Answer 0 (score: 3)

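You are mixing unicode objects and str objects. The strings in educ_employ are unicode, but '\xa0' is a str. When Python 2 combines the two, it first tries to decode the str with the ASCII codec, and the byte 0xa0 is outside the ASCII range, hence the UnicodeDecodeError.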
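
To make that implicit step visible, here is a minimal sketch that performs the ASCII decode explicitly. The variable names (raw, text, decode_failed, nbsp_count) are illustrative and not from the question; the b'' prefix is used only so the snippet behaves the same on Python 2.7 and Python 3.

```python
# Illustration only: spell out the ASCII decode that Python 2 performs
# implicitly when a byte string (str) meets a unicode string.
raw = b'\xa0'  # in Python 2 the literal '\xa0' is this one-byte str

try:
    raw.decode('ascii')
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)

# A unicode literal needs no decoding, so it can be matched directly
# against the scraped unicode text.
text = u'Jefferson Middle School\xa0\xa0(2013 - 2014)'
nbsp_count = text.count(u'\xa0')
print(nbsp_count)
```

Writing the character as u'\xa0' keeps both operands unicode, so no hidden decode takes place.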

Also, .strip() only removes characters from the beginning and end of a string, not from the middle. Use .replace() instead.
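
A short sketch of the difference, using the string from the question plus one hypothetical padded string (padded is made up for illustration):

```python
text = u'Jefferson Middle School\xa0\xa0(2013 - 2014)'

# strip() only looks at the two ends; the non-breaking spaces here sit
# in the middle, so the string comes back unchanged.
stripped = text.strip(u'\xa0')

# replace() removes every occurrence, wherever it is.
replaced = text.replace(u'\xa0', u'')

# Hypothetical string with non-breaking spaces at the ends, where
# strip() does have something to remove.
padded = u'\xa0School\xa0'
trimmed = padded.strip(u'\xa0')

print(stripped == text)
print(repr(replaced))
print(repr(trimmed))
```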

Try:

educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']
educ_employ = [s.replace(u'\xa0', u'') for s in educ_employ]
print educ_employ
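
Replacing with an empty string joins "School" and "(2013 - 2014)" with no space between them. If a single ordinary space is preferable, one possible variation (not part of the answer above) is to split on whitespace and rejoin. This assumes the strings are unicode, as they are in the question, so that split() treats the non-breaking space as whitespace; the name cleaned is illustrative.

```python
educ_employ = [u'Jefferson Middle School\xa0\xa0(2013 - 2014)']

# split() with no argument splits on any run of whitespace, which for
# unicode strings includes the non-breaking space; joining with a plain
# space leaves exactly one space between words.
cleaned = [u' '.join(s.split()) for s in educ_employ]

print(cleaned)
```

This also collapses any other repeated whitespace inside the string, which may or may not be wanted.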