Question

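I'm trying to read an .xlsx file with xlrd. I have everything set up and working. It works for data with normal English letters and numbers. However, when it reaches Swedish letters (ÄÅ), it gives me this error: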

print str(sheet.cell_value(1, 2)) + " " + str(sheet.cell_value(1, 3)) + " " + str(sheet.cell_value(1, 4)) + " " + str(sheet.cell_value(1, 5))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 1: ordinal not in range(128)

My code:

# -*- coding: cp1252 -*-
import xlrd

file_location = "test.xlsx"

workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

print str(sheet.cell_value(1, 2)) + " " + str(sheet.cell_value(1, 3)) + " " + str(sheet.cell_value(1, 4)) + " " + str(sheet.cell_value(1, 5))

I have even tried:

workbook = xlrd.open_workbook("test.xlsx", encoding_override="utf-8")

as well as:

workbook = xlrd.open_workbook("test.xlsx", encoding="utf-8")

Edit: I'm running Python 2.7 on a 64-bit Windows 7 machine.

Answer 1

'ascii' codec can't encode

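The problem here is not the decoding done when the file is read, but the encoding needed for printing. Your environment uses ASCII for sys.stdout, so when you try to print any Unicode character that cannot be encoded in ASCII, you get that error.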
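To see the failure in isolation, here is a minimal sketch that encodes the character from the traceback (u'\xd6', that is, Ö) with a few codecs. The variable names are illustrative and not taken from the question, and the `__future__` import is only there so the snippet behaves the same way on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# The character reported in the traceback: U+00D6 (Ö).
text = "\xd6"

# ASCII only covers code points 0-127, so this encode must fail.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Codecs that do contain the character encode it without trouble.
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

print("ASCII can encode it:", ascii_ok)
print("UTF-8 length in bytes:", len(utf8_bytes))
print("cp1252 length in bytes:", len(cp1252_bytes))
```

The same implicit ASCII encoding happens on Python 2 whenever a unicode value is passed to str(), which is why the error can appear even before anything reaches the console.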

Documentation reference:

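The character encoding is platform-dependent. Under Windows, if the stream is interactive (that is, if its isatty() method returns True), the console codepage is used, otherwise the ANSI code page. Under other platforms, the locale encoding is used (see locale.getpreferredencoding()).

Under all platforms though, you can override this value by setting the PYTHONIOENCODING environment variable before starting Python.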
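As a rough way to check which case from the documentation applies, the sketch below reports what Python has chosen for sys.stdout and shows one way to degrade text gracefully when the target encoding is too narrow. The helper names describe_stdout and safe_text are hypothetical, introduced only for this illustration.

```python
from __future__ import print_function, unicode_literals

import locale
import sys


def describe_stdout():
    """Collect the settings that decide how printed text gets encoded."""
    stream = sys.stdout
    isatty = getattr(stream, "isatty", None)
    return {
        "encoding": getattr(stream, "encoding", None),
        "interactive": bool(isatty()) if callable(isatty) else False,
        "preferred": locale.getpreferredencoding(),
    }


def safe_text(text, encoding):
    """Replace characters that `encoding` cannot represent with '?'."""
    encoding = encoding or "ascii"  # a missing encoding is treated as ASCII
    return text.encode(encoding, "replace").decode(encoding)


info = describe_stdout()
print(info)
print(safe_text("\xc5ngstr\xf6m", info["encoding"]))
```

To override the detected encoding for a single session on Windows, run `set PYTHONIOENCODING=utf-8` in the command prompt before starting Python. Whether the console font and code page can then display the characters is a separate matter.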

Answer 2

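Try using utf-8, as @Anand S Kumar suggested, and encode the strings before printing. Note that cell_value() already returns unicode objects for text cells, so they need to be encoded rather than decoded; calling .decode() on a unicode object in Python 2 triggers the same implicit ASCII encoding and fails.

# -*- coding: utf-8 -*-
import xlrd

file_location = "test.xlsx"

workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)

# Text cells come back as unicode and numeric cells as float, so convert
# with unicode() rather than str(), which would encode implicitly as ASCII.
cells = [unicode(sheet.cell_value(1, i)) for i in range(2, 6)]
# Encode once, explicitly, right before printing.
print u' '.join(cells).encode('utf-8')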
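Since a row can mix text and numbers, it may help to normalise every cell to text before joining. The following is a minimal sketch that stands in a plain list for the spreadsheet row; cell_to_text and the sample values are hypothetical and not part of xlrd.

```python
from __future__ import print_function, unicode_literals


def cell_to_text(value):
    """Convert one cell value to text without an implicit ASCII step."""
    if isinstance(value, bytes):
        # Byte strings are assumed to be UTF-8 here; adjust if they are not.
        return value.decode("utf-8")
    # "%s" with a unicode format string keeps the result as text on
    # Python 2 and Python 3, for numbers and text alike.
    return "%s" % (value,)


# Stand-in for [sheet.cell_value(1, i) for i in range(2, 6)].
row = ["\xd6rebro", 42.0, "", b"\xc3\x85re"]
line = " ".join(cell_to_text(value) for value in row)

# Encode once, at the boundary where the text leaves the program.
encoded = line.encode("utf-8")
print(len(line), len(encoded))
```

Keeping everything as text internally and encoding only at output means a single place decides which codec is used.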

Answer 3

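By default, xlrd uses Unicode. If xlrd cannot identify the encoding, it assumes that the encoding used in the Excel file is ASCII. Finally, if the encoding is not ASCII, or if Python cannot convert the data to Unicode, it raises a UnicodeDecodeError.

Don't worry, there is a way to solve this kind of problem. It looks like you are using cp1252, so when you open the file with open_workbook(), you can call it as follows: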

>>> book = xlrd.open_workbook(filename='filename',encoding_override="cp1252")

When you use the call above, xlrd will decode the data with the given encoding, and you should be all set.
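What "decoding with the given encoding" means can be sketched without xlrd at all. The bytes below are an assumed example of cp1252-encoded text, not data read from the asker's file.

```python
from __future__ import print_function, unicode_literals

# "ÖÄÅ" as it would be stored in a cp1252-encoded byte string.
raw = b"\xd6\xc4\xc5"

# Treating the bytes as ASCII fails, because every byte is above 127.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the right codec turns the bytes into proper text.
decoded = raw.decode("cp1252")

print("ASCII decode failed:", ascii_failed)
print("Characters decoded:", len(decoded))
```

Note that this is a UnicodeDecodeError, raised when bytes are turned into text, whereas the traceback in the question is a UnicodeEncodeError, raised in the opposite direction.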
