I'm trying to save some data I scraped to an Excel sheet, and I'm hitting a Unicode decode error on one particular field, which looks like this:
work_info['title'] = Darimān-i afsaradgī : rāhnamā-yi kāmil bira-yi hamah-ʼi khānvādahʹhā
The code that causes the error is:
data.write(b + book + accumulated_books+ 2, 43, work_info['title'])
wb.save('/Users/apple/Downloads/WC Scrape_trialfortwo.csv')
The error is:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 5: ordinal not in range(128)
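I've tried several different encoding/decoding techniques, but nothing has worked so far. Any suggestions would be much appreciated.
Thanks!
Answer 0 (score: 0)
It looks like you are using Python 2, and Python 2's unicode/bytes handling is causing the problem.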
>>> from xlwt import Workbook
>>> s = 'Darimān-i afsaradgī : rāhnamā-yi kāmil bira-yi hamah-ʼi khānvādahʹhā'
>>> wb = Workbook()
>>> ws = wb.add_sheet('test')
>>> ws.write(1, 0, s)
>>> wb.save('test.xls')
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 5: ordinal not in range(128)
xlwt assumes that s is an ASCII-encoded string and tries to decode it to unicode, which fails:
>>> s.decode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 5: ordinal not in range(128)
In fact, s is encoded as UTF-8:
>>> s.decode('utf-8')
u'Darim\u0101n-i afsaradg\u012b : r\u0101hnam\u0101-yi k\u0101mil bira-yi hamah-\u02bci kh\u0101nv\u0101dah\u02b9h\u0101'
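To see why the traceback points at byte 0xc4 in position 5, it can help to look at the raw bytes. The following is only a minimal sketch; the names `title`, `raw` and `first_non_ascii` are made up for illustration and are not part of the question's code. It encodes the start of the title as UTF-8 and finds the first byte that falls outside the ASCII range:

```python
# -*- coding: utf-8 -*-
# Illustration only: `title`, `raw` and `first_non_ascii` are hypothetical names.
title = u'Darimān-i afsaradgī'
raw = title.encode('utf-8')  # the kind of byte string a Python 2 str holds

# Find the index of the first byte that is not valid ASCII (value > 127).
# bytearray yields integers on both Python 2 and Python 3.
first_non_ascii = next(i for i, b in enumerate(bytearray(raw)) if b > 127)
print(first_non_ascii)
print(hex(bytearray(raw)[first_non_ascii]))

# Decoding as ASCII fails at that byte; decoding as UTF-8 round-trips.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc.start)
print(raw.decode('utf-8') == title)
```

The character 'ā' (U+0101) is the sixth character of the title and takes two bytes in UTF-8, 0xc4 0x81, so the ASCII codec gives up on the first of them.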
The simplest solution is probably to set the workbook's encoding to utf-8:
>>> wb = Workbook(encoding='utf-8')
>>> ws = wb.add_sheet('test')
>>> ws.write(1, 0, s)
>>> wb.save('test.xls')
If you need a more fine-grained approach, you can explicitly decode each string to unicode before writing it to the worksheet:
>>> wb = Workbook()
>>> ws = wb.add_sheet('test')
>>> ws.write(1, 0, s.decode('utf-8'))
>>> wb.save('test.xls')
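If the scraped values are a mix of byte strings, unicode strings and numbers, the explicit decoding can be wrapped in a small helper. This is only a sketch under the assumption that all byte strings are UTF-8; `to_text` is a hypothetical name, not something xlwt provides:

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; return anything else unchanged.

    Hypothetical helper: assumes byte strings are encoded with `encoding`.
    On Python 2, `bytes` is an alias for `str`, so plain str values are decoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Byte strings are decoded, text and numbers pass through untouched.
decoded = to_text(b'Darim\xc4\x81n-i')
unchanged_text = to_text(u'Darim\u0101n-i')
unchanged_number = to_text(42)
print(decoded == unchanged_text)
```

In the question's code, the call would then look something like `data.write(row, 43, to_text(work_info['title']))`, where `row` stands in for the row expression used there.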