Question

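I have this exact problem: https://www.en.adwords-community.com/t5/Basics-for-New-Advertisers/Character-Encoding-used-by-the-editor/td-p/100244 (tl;dr: I'm trying to upload files containing foreign characters to Google; they look funny when opened in Excel, and Google rejects them because they aren't encoded correctly).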

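I have the following code. Note that I've tried adding a byte order mark at the start of the HTTP response object, and I've tried encoding all strings as UTF-8.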

<some code where workbook is created and populated via xlwt>
output = StringIO.StringIO()
workbook.save(output)
wb = open_workbook(file_contents=output.getvalue())
sheet = wb.sheet_by_name(spreadsheet)

response = HttpResponse(content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename='+(account.name+'-'+spreadsheet).replace(',', '')+'.csv'
response.write('\xEF\xBB\xBF')
writer = csv.writer(response)
for rownum in xrange(sheet.nrows):
    newRow = []
    for s in sheet.row_values(rownum):
        if isinstance(s,unicode):
            newRow.append(s.encode("utf-8"))
        elif isinstance(s, float):
            newRow.append(int(s))
        else:
            newRow.append(s.decode('utf-8'))
    writer.writerow(newRow)
return response

But they still look wrong when opened in Excel! Why?

Answer 1

You want to write encoded data throughout, but for string values you are decoding to Unicode values instead:

else:
    newRow.append(s.decode('utf-8'))

Chances are your web framework then encodes that data as Latin-1.
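
To make the mismatch concrete, here is a small Python 3 sketch (the question's code is Python 2, where `unicode` plays the role of Python 3's `str`). It assumes, as suggested above, that Latin-1 is the encoding that ends up being applied; the sample word is made up for illustration.

```python
import codecs

text = "café"  # hypothetical sample value with one non-ASCII character

utf8_bytes = text.encode("utf-8")      # what the BOM promises the reader
latin1_bytes = text.encode("latin-1")  # what a Latin-1 fallback would emit

# The same character becomes different bytes under each encoding.
assert utf8_bytes == b"caf\xc3\xa9"
assert latin1_bytes == b"caf\xe9"

# A reader that trusts the UTF-8 BOM cannot decode the Latin-1 bytes cleanly:
# the lone 0xE9 byte is not valid UTF-8, so it comes back as U+FFFD.
payload = codecs.BOM_UTF8 + latin1_bytes
decoded = payload.decode("utf-8-sig", errors="replace")
print(repr(decoded))
```

Exactly what Excel displays for such bytes may differ, but the underlying problem is the same: the bytes in the file do not match the encoding the BOM announces.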

Just append the value without decoding it:

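for s in sheet.row_values(rownum):
    if isinstance(s, unicode):
        # Unicode text: encode it to UTF-8 bytes ourselves.
        s = s.encode("utf-8")
    elif isinstance(s, float):
        s = int(s)
    # Anything else is appended untouched, with no decoding.
    newRow.append(s)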

Further tips:

It's also a good idea to communicate the character set in the response headers:

response = HttpResponse(content_type='text/csv; charset=utf-8')

Use codecs.BOM_UTF8 to write the BOM rather than hard-coding the value. It's less error-prone.
```
import codecs  # at the top of the module
response.write(codecs.BOM_UTF8)
```
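
Putting these tips together, the following is a minimal Python 3 sketch of building the CSV payload, not the original Python 2 view. The helper names `normalize_cell` and `rows_to_csv_bytes` and the sample rows are hypothetical. In Python 3 the `csv` module works with text, so the sketch writes text into a buffer and encodes once at the end.

```python
import codecs
import csv
import io


def normalize_cell(value):
    """Mirror the loop above: floats become ints, everything else is kept."""
    if isinstance(value, float):
        return int(value)  # truncates, exactly like int(s) in the answer
    return value


def rows_to_csv_bytes(rows):
    """Return UTF-8 encoded CSV bytes prefixed with a BOM."""
    buffer = io.StringIO(newline="")  # let csv control the line endings
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([normalize_cell(value) for value in row])
    # Encode exactly once, at the output boundary, and prepend the BOM.
    return codecs.BOM_UTF8 + buffer.getvalue().encode("utf-8")


# Hypothetical sample data standing in for the spreadsheet rows.
data = rows_to_csv_bytes([["name", "clicks"], ["café", 12.0]])
print(data)
```

In a Django view, bytes like these could presumably be handed to the response together with `content_type='text/csv; charset=utf-8'`, so that the header, the BOM, and the payload all agree on the encoding.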

Answer 2

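Whenever you write a Unicode string to a file or stream, it must be encoded. You can do the encoding yourself, or you can let the various module and library functions attempt to do it for you. If you're not sure which encoding will be selected for you, and you know which encoding you want written, it's better to do the encoding yourself.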

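You're already following this advice when you encounter a Unicode string in your input. However, when you encounter a string that's already encoded as UTF-8, you decode it back to Unicode! This results in the reverse conversion being done in writerow, and evidently it isn't picking utf-8 as the default encoding. If you leave the string alone instead of decoding it, writerow will write it out exactly as you intended.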
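
The "encode text, leave encoded bytes alone" rule can be sketched in Python 3 as follows. The helper name `to_utf8_bytes` is hypothetical, and the sketch assumes that any byte string it receives is already valid UTF-8, as in the question. Python 3's `bytes` corresponds to Python 2's `str`, and Python 3's `str` to Python 2's `unicode`.

```python
def to_utf8_bytes(value):
    """Return UTF-8 bytes for a cell value without ever re-decoding bytes."""
    if isinstance(value, str):
        return value.encode("utf-8")  # text: encode it ourselves
    if isinstance(value, bytes):
        return value                  # assumed to be UTF-8 already: leave alone
    return str(value).encode("utf-8")  # numbers and other values


already_encoded = "é".encode("utf-8")

# Leaving the encoded value alone preserves the intended bytes.
assert to_utf8_bytes(already_encoded) == b"\xc3\xa9"
assert to_utf8_bytes("é") == b"\xc3\xa9"

# Decoding it and letting a different encoding be applied later changes them.
round_tripped = already_encoded.decode("utf-8").encode("latin-1")
assert round_tripped == b"\xe9"
```

The last two lines show the failure mode described above: once the value is back in Unicode form, whichever encoding is applied next decides the bytes, and that encoding is no longer under your control.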

Can't encode csv file properly?
