Question

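I am trying to read an Excel file with xlrd and write its contents to text files. Everything is written correctly except for some rows that contain Spanish characters such as 'Téd'. I can encode those with latin-1. However, the code then fails on other rows that contain a 'â', which is Unicode u'\u2013'; u'\u2013' cannot be encoded with latin-1. With UTF-8, the u'\u2013' character is written fine, but 'Téd' is written as 'TÃ©d', which is not acceptable. How do I correct this?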

The code is below:

#!/usr/bin/python
import xlrd
import csv
import sys

filePath     = sys.argv[1]

with xlrd.open_workbook(filePath) as wb:
     shNames = wb.sheet_names()
     for shName in shNames:
         sh = wb.sheet_by_name(shName)
         csvFile = shName + ".csv"
         with open(csvFile, 'wb') as f:
              c = csv.writer(f)
              for row in range(sh.nrows):
                  sh_row = []
                  cell = ''
                  for item in sh.row_values(row):
                      if isinstance(item, float):
                         cell=item
                      else:
                         cell=item.encode('utf-8')
                      sh_row.append(cell)
                      cell=''
                  c.writerow(sh_row)
         print shName + ".csv File Created"

Answer 1

Python's csv module (in Python 2) does not support Unicode input.

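You are already encoding the input correctly before writing it, so you don't need codecs. Just open(csvFile, "wb") (the b is important) and pass that file object to the writer: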

with open(csvFile, "wb") as f:
    writer = csv.writer(f)
    writer.writerow([entry.encode("utf-8") for entry in row])
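One possible explanation for the 'TÃ©d' in the question, offered as an assumption rather than something the asker confirmed: the UTF-8 file itself is correct, but it is being viewed by a program that decodes it as Latin-1. The sketch below reproduces that effect with the question's own strings, and also shows why u'\u2013' cannot be encoded as Latin-1. It is written so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
text = u"T\u00e9d"   # 'Téd' from the question
dash = u"\u2013"     # the character that latin-1 rejects

utf8_bytes = text.encode("utf-8")

# Reading the UTF-8 bytes back as Latin-1 produces the garbled form.
misread = utf8_bytes.decode("latin-1")

# Decoding with the matching codec round-trips cleanly.
roundtrip = utf8_bytes.decode("utf-8")

# Latin-1 has no code point for U+2013, so encoding raises an error.
try:
    dash.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(repr(misread), repr(roundtrip), latin1_ok)
```

If this is what is happening, the bytes on disk do not need to change; the file only needs to be opened as UTF-8 by whatever reads it.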

Alternatively, unicodecsv is a drop-in replacement for csv that handles the encoding for you.
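For readers on Python 3, where the csv module accepts Unicode strings directly, the same idea can be sketched without any manual encoding. This is a swapped-in variant and not the Python 2 code from this answer; the function name write_rows, the file name, and the sample rows are hypothetical stand-ins for the values returned by sh.row_values(row).

```python
import csv
import io
import os
import tempfile


def write_rows(path, rows):
    """Write rows (lists of str or float values) to path as UTF-8 CSV."""
    # newline="" lets the csv module control line endings itself;
    # encoding="utf-8" makes the file object encode text on write.
    with io.open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample data: one accented name, one en dash, plus floats.
rows = [[u"T\u00e9d", 1.5], [u"2010\u20132012", 2.0]]
path = os.path.join(tempfile.mkdtemp(), "sheet.csv")
write_rows(path, rows)

# Read the file back as UTF-8 to confirm both characters survive.
with io.open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back)
```

Here the text mode file does the encoding, so the float check and the per-cell encode call from the question are no longer needed.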

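By the way, the right way to check the type of an xlrd cell is cell.ctype == xlrd.ONE_OF_THE_TYPES, where ONE_OF_THE_TYPES stands for one of xlrd's cell type constants, such as xlrd.XL_CELL_TEXT or xlrd.XL_CELL_NUMBER.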

Encoding to use when reading Excel with xlrd
