Incorrect representation of a string in a csv file

Time: 2016-01-30 02:11:04

Tags: windows python-2.7 csv unicode

I'm on Win7 with Python 2.7.

I have a string. Original view:

A. P. Møller Mærsk

UTF-8:

s = 'A. P. M\xc3\xb8ller M\xc3\xa6rsk'

I need to write it to a csv file. I tried this:

with open('14.09 Anbefalte aksjer.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([s])

And got this:

A. P. MГёller MГ¦rsk

I tried using UnicodeWriter:

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

s = 'A. P. M\xc3\xb8ller M\xc3\xa6rsk'.decode('utf8')
with open('14.09 Anbefalte aksjer.csv', 'w') as csvfile:
    writer = UnicodeWriter(csvfile)
    writer.writerow([s])

The same result again:

A. P. MГёller MГ¦rsk

I tried unicodecsv:

Again:

A. P. MГёller MГ¦rsk

What is wrong? How should I write it?

2 Answers:

Answer 0: (score: 0)

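What you see is mojibake: bytes that represent Unicode text encoded in one character encoding are being displayed using another (incompatible) character encoding.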
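A minimal sketch of how this kind of mojibake arises, starting from the exact bytes in the question. The choice of cp1251 is an assumption: it is inferred from the Cyrillic letters in the garbled output and is not stated anywhere in the question.

```python
# The exact bytes from the question: the name encoded as UTF-8.
raw = b'A. P. M\xc3\xb8ller M\xc3\xa6rsk'

# Decoded with the encoding that produced them: the intended text.
correct = raw.decode('utf-8')

# Decoded with a different single-byte code page: mojibake.
# cp1251 is an assumption, guessed from the Cyrillic letters in the output.
garbled = raw.decode('cp1251')

print(repr(correct))
print(repr(garbled))
```

If the assumption holds, the bytes in the file are already valid UTF-8 and only the program used to view the file is interpreting them with the wrong encoding.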

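If ''.decode('utf8') does not raise AttributeError, then you are not on Python 3 (whatever your question may suggest). On Python 2, csv does not support Unicode directly, so you have to encode manually: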

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import csv

text = "A. P. Møller Mærsk"  # a unicode string, thanks to unicode_literals
# Binary mode: the Python 2 csv module writes byte strings
with open('14.09 Anbefalte aksjer.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([text.encode('utf-8')])

UnicodeWriter and the unicodecsv module should also work, provided text contains uncorrupted data.
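When a row has several cells, the manual encoding step can be applied to each one. The helper below is a hypothetical sketch (the name encode_row is not from the answer or from any library) and is meant for the Python 2 csv module only; on Python 3 the csv module takes text directly and should not be given bytes.

```python
def encode_row(row, encoding='utf-8'):
    """Encode every Unicode cell of a row; leave other cells untouched.

    Hypothetical helper for the Python 2 csv module, which expects
    byte strings rather than unicode objects.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]


encoded = encode_row([u'A. P. M\xf8ller M\xe6rsk', 42])
```

On Python 2 the result could then be passed to writer.writerow(encoded) in place of encoding each cell by hand.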

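Answer 1: (score: 0)

Windows tools such as Notepad or Excel assume the default Windows locale encoding, so for UTF-8 a byte order mark (BOM, U+FEFF) must be encoded at the start of the file. Python provides the utf-8-sig encoding for this. Also note that with #coding:utf8 and the source file saved as UTF-8, you can declare the string directly as a Unicode string. Finally, files used with the csv module should be opened with wb on Python 2.7, otherwise you will see problems with the newlines written on Windows.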

#coding:utf8
import csv
from StringIO import StringIO
import codecs

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    # Use utf-8-sig encoding here.
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        # Redirect output to a queue
        self.queue = StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

s = u'A. P. Møller Mærsk' # declare as Unicode string.
with open('14.09 Anbefalte aksjer.csv', 'wb') as csvfile:
    writer = UnicodeWriter(csvfile)
    writer.writerow([s])

Output:

A. P. Møller Mærsk
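To make the two points above concrete, here is a small sketch (the variable names are illustrative, not from the answer) showing what the utf-8-sig codec adds to the encoded bytes and which line terminator csv.writer emits by default. It uses an in-memory buffer instead of a real file.

```python
import codecs
import csv
import io
import sys

text = u'A. P. M\xf8ller M\xe6rsk'

plain = text.encode('utf-8')
with_bom = text.encode('utf-8-sig')

# utf-8-sig is plain UTF-8 preceded by the three-byte BOM EF BB BF.
has_bom = with_bom.startswith(codecs.BOM_UTF8)
same_payload = with_bom[len(codecs.BOM_UTF8):] == plain

# Decoding with utf-8-sig strips the BOM again.
round_trip = with_bom.decode('utf-8-sig')

# csv.writer ends each row with '\r\n' on its own (default excel dialect).
# The buffer type depends on the Python version: bytes on 2, text on 3.
buf = io.BytesIO() if sys.version_info[0] == 2 else io.StringIO()
csv.writer(buf).writerow(['a', 'b'])
written = buf.getvalue()
```

Because the writer already emits '\r\n', a file opened in text mode on Windows under Python 2.7 turns each '\n' into '\r\n' a second time, producing '\r\r\n'; opening with wb avoids that.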