UnicodeDecodeError - decoding and saving in Excel

Time: 2017-12-18 21:20:07

Tags: python excel unicode decode encode

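I am trying to save some data I scraped into an Excel sheet, and I am running into a Unicode decoding problem with one particular value, which looks like this: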

work_info['title'] = Darimān-i afsaradgī : rāhnamā-yi kāmil bira-yi hamah-ʼi khānvādahʹhā

The code that triggers the error is:

data.write(b + book + accumulated_books+ 2, 43, work_info['title'])
wb.save('/Users/apple/Downloads/WC Scrape_trialfortwo.csv')

The error is:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 5: ordinal not in range(128)

I have tried several different encoding/decoding approaches, but nothing has worked so far. Any suggestions would be greatly appreciated.

Thanks!

1 Answer:

Answer 0: (score: 0)

It looks like you are using Python 2, and Python 2's unicode/bytes handling is causing the problem.

>>> s = 'Darimān-i afsaradgī : rāhnamā-yi kāmil bira-yi hamah-ʼi khānvādahʹhā'
>>> from xlwt import Workbook
>>> wb = Workbook()
>>> ws = wb.add_sheet('test')
>>> ws.write(1, 0, s)
>>> wb.save('test.xls')
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 5: ordinal not in range(128)

xlwt assumes that s is an ASCII-encoded byte string and tries to decode it to unicode, which fails:

>>> s.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 5: ordinal not in range(128)

In fact, s is encoded as UTF-8:

>>> s.decode('utf-8') 
u'Darim\u0101n-i afsaradg\u012b : r\u0101hnam\u0101-yi k\u0101mil bira-yi hamah-\u02bci kh\u0101nv\u0101dah\u02b9h\u0101'
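
The same failure and fix can be reproduced without xlwt. The following is a minimal sketch, assuming the title reaches your code as UTF-8 bytes; the names raw, failure and text are illustrative only and do not come from the question. It builds the bytes explicitly so that it behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: `raw` stands in for the start of the UTF-8 byte string
# from the question (u'\u0101' is the letter a with a macron).
raw = u'Darim\u0101n-i'.encode('utf-8')

# Decoding as ASCII fails at the first non-ASCII byte.
try:
    raw.decode('ascii')
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode('utf-8')
```

The exception's start attribute gives the offset of the offending byte, which is the position reported in the error message.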

The simplest solution is probably to set your workbook's encoding to UTF-8:

>>> wb = Workbook(encoding='utf-8')
>>> ws = wb.add_sheet('test')
>>> ws.write(1, 0, s)
>>> wb.save('test.xls')

If you need a more fine-grained approach, you can explicitly decode the string to unicode before writing it to the worksheet:

>>> wb = Workbook()
>>> ws = wb.add_sheet('test')
>>> ws.write(1, 0, s.decode('utf-8'))
>>> wb.save('test.xls')
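
If some of your values are already unicode and others are byte strings, calling .decode on everything is fragile. One possible way to handle that is a small helper like the one below. This is only a sketch: to_text is a hypothetical name, not part of xlwt, and it assumes that any byte strings you scrape are UTF-8.

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as text, decoding it only if it is a byte string.

    Hypothetical helper, not an xlwt function. Assumes byte strings
    are encoded as `encoding` (UTF-8 by default).
    """
    if isinstance(value, bytes):  # on Python 2, `bytes` is an alias for `str`
        return value.decode(encoding)
    # Already-decoded text and non-string values (numbers, None) pass through.
    return value
```

With this in place, the failing call from the question could be written as data.write(b + book + accumulated_books + 2, 43, to_text(work_info['title'])).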