Question

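I am using Python 2.7 and cannot use Python 3. I wrote this script to convert an Excel spreadsheet to CSV. It throws an error for "u2013", which is a dash character (the en dash). In Perl you can use the open command to load the file as Unicode, but I don't know how to do this in Python.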

#!/home/casper/python/core/2.7.14/exec/bin/python2.7
# -*- coding: utf-8 -*-
import openpyxl
import csv

wb = openpyxl.load_workbook('RiskLimitSnapshot.xlsx')
sh = wb.get_active_sheet()
with open('goodRiskLimitSnapshot.csv', 'wb') as f: 
    c = csv.writer(f)
    for r in sh.rows:
        c.writerow([cell.value for cell in r])

Error:

Traceback (most recent call last):
  File "/home/casper/pyExceltoCSV", line 16, in <module>
    c.writerow([cell.value for cell in r])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 74: ordinal not in range(128)

I changed the script to use io.open:

wb = openpyxl.load_workbook('DailyETRiskLimitSnapshot.xlsx' ,   data_only=True)
sh = wb.get_active_sheet()
with io.open('goodDailyETRiskLimitSnapshot.csv', 'w', encoding='utf8') as f:
    c = csv.writer(f, dialect='excel')
    for r in sh.rows:
        c.writerow([cell.value for cell in r])

But it throws a different error:

Traceback (most recent call last):
  File "./pyExceltoCVS.py", line 20, in <module>
    c.writerow([cell.value for cell in r])
TypeError: write() argument 1 must be unicode, not str

Answer 1

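The correct way to open a file for encoded output is to use the io module. Note that on Python 2.7 the csv module writes byte strings, so combining it with a text-mode io.open file raises the TypeError shown in the question; the pattern works as written with Python 3's csv module: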

import io

with io.open('goodRiskLimitSnapshot.csv', 'w', encoding='utf8') as f: 
    c = csv.writer(f)
    for r in sh.rows:
        c.writerow([cell.value for cell in r])

Answer 2

Have you tried using pandas:

import pandas as pd

# You can specify the sheet name using the sheetname argument
# (named sheet_name in newer pandas versions).
wb = pd.read_excel('RiskLimitSnapshot.xlsx')
wb.to_csv('goodRiskLimitSnapshot.csv', encoding='utf-8')

Alternatively, you can use the codecs module.

Answer 3

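With unicode as the default string type, Python 3 makes things easier. In Python 2, however, the default string type is str (a byte string), and unicode is a separate type with a different representation.

Now consider the case of the en dash (–), which looks similar to a plain hyphen (-) but is a different character.

Let's start a Python 2.7 console and look at the difference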

>>> val_str = '–'
>>> val_str
'\xe2\x80\x93'

The above is how the en dash is represented as a str (its UTF-8 bytes). As unicode:

>>> val_unicode = u'–'
>>> val_unicode
u'\u2013'
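The two representations above are related by encoding and decoding. As a minimal sketch (the variable names are illustrative and not part of the original answer), the following runs on both Python 2.7 and Python 3 and shows that the single unicode en dash corresponds to three UTF-8 bytes:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

en_dash = u'\u2013'                      # one unicode code point
utf8_bytes = en_dash.encode('utf-8')     # the byte-string form shown above
round_trip = utf8_bytes.decode('utf-8')  # back to the unicode form

print(len(en_dash), len(utf8_bytes))     # prints: 1 3
```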

Now let's try writing these values to a CSV file using different combinations

# -*- coding: utf-8 -*-

import csv
import io

val_str = '–'
val_unicode = u'–'


def try_writing_csv(filename, data, mode='w', **kwargs):
    try:
        with io.open(filename, mode=mode, **kwargs) as f:
            c = csv.writer(f, dialect='excel')
            c.writerow([data])
    except Exception, ex:
        print("failed to write - " + filename)


try_writing_csv("ascii1.csv", val_str)
try_writing_csv("ascii2.csv", val_str, encoding="utf8")
try_writing_csv("ascii3.csv", val_str.decode('utf8'), encoding="utf8")

try_writing_csv("unicode1.csv", val_unicode)
try_writing_csv("unicode2.csv", val_unicode, encoding="utf8")
try_writing_csv("unicode3.csv", val_unicode.encode('utf8'), encoding="utf8")

Now let's run it

failed to write - ascii1.csv
failed to write - ascii2.csv
failed to write - ascii3.csv
failed to write - unicode1.csv
failed to write - unicode2.csv
failed to write - unicode3.csv

The result is discouraging, since every approach failed. So we need to find out what is going wrong. Let's run a few more trials

try_writing_csv("ascii4.csv", val_str.decode('utf8'), mode="wb")
try_writing_csv("ascii5.csv", val_str, mode="utf8")
try_writing_csv("ascii6.csv", val_str.decode('utf8').encode('utf8'), mode="wb")

try_writing_csv("unicode4.csv", val_unicode, mode="wb")
try_writing_csv("unicode5.csv", val_unicode.encode('utf8'), mode='wb')

Running it now produces the output

failed to write - ascii1.csv
failed to write - ascii2.csv
failed to write - ascii3.csv
failed to write - ascii4.csv
failed to write - ascii5.csv
failed to write - unicode1.csv
failed to write - unicode2.csv
failed to write - unicode3.csv
failed to write - unicode4.csv

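So ascii6.csv and unicode5.csv actually succeeded. Let's check the files as well

It looks like both files were written correctly. So the two statements that finally worked are the following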

try_writing_csv("ascii6.csv", val_str.decode('utf8').encode('utf8'), mode="wb")
try_writing_csv("unicode5.csv", val_unicode.encode('utf8'), mode='wb')

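So the key learnings are

Do not use encoding=utf8 when opening the file
Write the file in binary mode
If the value is a str, decode it as utf8 and then encode it as utf8 (for a str that is already valid UTF-8, this round trip leaves the bytes unchanged)
If the value is unicode, encode it as utf8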
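The last two points can be folded into a small helper. This is only a sketch: encode_row and TEXT_TYPE are hypothetical names, not from the original answer, and the helper assumes that any byte strings in the row are already UTF-8. Non-text values such as numbers and None, which spreadsheet cells can contain, are passed through unchanged.

```python
# -*- coding: utf-8 -*-

# unicode on Python 2, str on Python 3 (hypothetical helper name)
TEXT_TYPE = type(u'')


def encode_row(row, encoding='utf-8'):
    """Return a list where every text value is encoded to bytes.

    Numbers, None and byte strings are left as they are, so the
    Python 2 csv writer can format them itself.
    """
    return [value.encode(encoding) if isinstance(value, TEXT_TYPE) else value
            for value in row]


encoded = encode_row([u'a\u2013b', 3, None])
```

On Python 2.7 this would be used with a file opened in 'wb' mode, for example c.writerow(encode_row(cell.value for cell in r)).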

As for the explanation, you can find it in the SO thread below

2.7 CSV module wants unicode, but doesn't want unicode

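If you are trying to write out unicode data, you must encode it before passing it to the csv.writer() object. The examples section of the csv module documentation includes code that encodes Unicode before writing.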

Answer 4

Have you tried this?

items

Answer 5

The Python 2 csv library does not support unicode well. Have you considered using the unicodecsv or backports.csv library?

Cheers!
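As a rough sketch of the backports.csv route, assuming that package is installed on Python 2 (on Python 3 the standard csv module is used instead), the file is opened in text mode and unicode values are written directly. The write_rows function and the demo file name are illustrative only.

```python
# -*- coding: utf-8 -*-
import io
import os
import sys
import tempfile

if sys.version_info[0] == 2:
    from backports import csv  # third-party package: backports.csv
else:
    import csv


def write_rows(path, rows):
    """Write rows of unicode text to a UTF-8 encoded CSV file."""
    # newline='' lets the csv writer control the line endings itself.
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


# Small demo: write one row containing an en dash and read the bytes back.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
write_rows(demo_path, [[u'Limit \u2013 breached', 3]])
with io.open(demo_path, 'rb') as f:
    written = f.read()
```

With this approach no manual encode step is needed, because the text-mode file object does the encoding.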

Answer 6

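The error tells you that you are trying to write a Python unicode object to a file and that the default ASCII codec cannot encode it. An implicitly invoked encoding step converts the Python string/unicode object to bytes. You should do this yourself with the encoding you want, which is utf-8 in your case:

Change the line: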

c.writerow([cell.value for cell in r])

to:

c.writerow([cell.value.encode('utf-8') for cell in r])

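If you do not explicitly specify the desired encoding, the default is used, so your writer call is effectively c.writerow([cell.value.encode('ascii') for cell in r]), which of course raises UnicodeEncodeError because your unicode strings contain non-ASCII characters.

You can check the default encoding with the following code: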

import sys
sys.getdefaultencoding()
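The failure described above can be reproduced without any spreadsheet. The sketch below uses an illustrative sample string; it shows the ASCII codec rejecting the en dash at its position in the string, while UTF-8 encodes it. On Python 2 sys.getdefaultencoding() returns 'ascii', whereas on Python 3 it returns 'utf-8'.

```python
# -*- coding: utf-8 -*-
import sys

text = u'Limit \u2013 breached'  # sample value containing an en dash

try:
    text.encode('ascii')         # what the implicit conversion attempts on Python 2
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start        # index of the first character that cannot be encoded

encoded = text.encode('utf-8')   # explicit encoding succeeds
default_encoding = sys.getdefaultencoding()
```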

u2013 error with openpyxl - Python 2.7
