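I am trying to read an .xls file using Python. The file contains several non-ASCII characters (namely äöü). I have tried both openpyxl and xlrd (I had high hopes for xlrd, since it reads everything as Unicode anyway), but neither worked.
I have found several answers that deal with encoding/decoding when printing information from an xls file, but I can't even get that far. This script fails as soon as it tries to read the file: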
import xlrd
workbook = xlrd.open_workbook('export_data.xls')
which results in:
Traceback (most recent call last):
File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
workbook = xlrd.open_workbook('export_data.xls')
File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
self.get_sheet(sheetno)
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
sh.read(self)
File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 55: ordinal not in range(128)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
I also tried:
workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf-8")
which results in:
Traceback (most recent call last):
File "C:\Users\Administrator\workspace\tufinderxlstoxml\tufinderxlstoxml2.py", line 2, in <module>
workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf-8")
File "C:\Python27_32\lib\site-packages\xlrd\__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 705, in get_sheets
self.get_sheet(sheetno)
File "C:\Python27_32\lib\site-packages\xlrd\book.py", line 696, in get_sheet
sh.read(self)
File "C:\Python27_32\lib\site-packages\xlrd\sheet.py", line 796, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "C:\Python27_32\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 55: invalid start byte
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
I also tried including various versions of this line at the top of the script:
# -*- coding: utf-8 -*-
I am running this with Python 2.7 on a Windows Server 2008 machine.
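Answer 0 (score: 1)
Thanks everyone for the feedback!
I ended up fixing it with the encoding_override parameter. I couldn't find any Microsoft documentation on which cp code corresponds to German characters, so I tried all of them. Eventually I got to cp1251, and it works!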
workbook = xlrd.open_workbook(path, encoding_override="cp1251")
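One thing worth checking when trying code pages by hand: a read that raises no exception is not necessarily a read that gives the right characters. cp1251 is the Windows Cyrillic code page and cp1252 the Western European one, and both happen to accept the 0x92 byte from the traceback. The sketch below compares them on a made-up four-byte sample (the sample bytes and the decode_with helper are illustrative assumptions, not taken from the actual file):

```python
# Hypothetical sample: the single-byte values for a-umlaut, o-umlaut and
# u-umlaut in Windows Western European text, followed by the 0x92 byte
# that the traceback complains about.
SAMPLE = b'\xe4\xf6\xfc\x92'

CANDIDATES = ['ascii', 'utf-8', 'cp1251', 'cp1252']


def decode_with(raw, candidates):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


results = decode_with(SAMPLE, CANDIDATES)
for name in CANDIDATES:
    # %r shows the code points, so the output does not depend on the console encoding
    print('%-7s -> %r' % (name, results[name]))
```

On this sample, ascii and utf-8 fail, cp1251 decodes without error but yields Cyrillic letters, and cp1252 yields the German umlauts. If the cells read with cp1251 show Cyrillic where umlauts are expected, the same call with encoding_override="cp1252" may be the better fit.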
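Answer 1 (score: 0)
From my reading of the OOo documentation, xls uses the utf_16_le flavour of Unicode rather than utf8 (i.e. each character is stored little-endian using exactly two bytes), so try: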
workbook = xlrd.open_workbook('export_data.xls', encoding_override="utf_16_le")
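Whether this helps depends on how the strings are actually stored in this particular file. The "No CODEPAGE record" message in the question hints at single-byte text, though that is an assumption and not something the traceback proves. A rough sketch of what utf_16_le does to single-byte data, using made-up sample bytes:

```python
# Hypothetical sample: single-byte Western European text (three umlauts
# followed by the 0x92 byte from the traceback).
SAMPLE = b'\xe4\xf6\xfc\x92'

# Decoded as one byte per character.
as_cp1252 = SAMPLE.decode('cp1252')

# Decoded as utf_16_le, every pair of bytes is fused into one code unit,
# so four bytes become two unrelated characters with no error raised.
as_utf16 = SAMPLE.decode('utf_16_le')

print('cp1252:    %r' % as_cp1252)
print('utf_16_le: %r' % as_utf16)

# An odd number of bytes cannot be split into two-byte units at all.
try:
    SAMPLE[:3].decode('utf_16_le')
    odd_length_failed = False
except UnicodeDecodeError as exc:
    odd_length_failed = True
    print('odd length: %s' % exc)
```

So if the file does hold single-byte text, utf_16_le would either raise an error or silently produce the wrong characters; it is only the right choice when the stored bytes really are two-byte little-endian units.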
Answer 2 (score: 0)
A bit late, but I suggest you try unicodecsv to handle the encoding.