Python encoding in the Windows command line: chcp 932 not working?

Time: 2017-07-29 00:31:25

Tags: python windows csv unicode encoding

I have looked at the other answers and done what they suggest:

1. Changed the system locale to Japanese
2. Ran chcp 932 (Japanese)
3. Saved the Python file as UTF-8
4. Passed all input through unicode(input, 'utf-8'), as shown below

Note: I also tried chcp 65001, but that did not work either.

I am trying to read a CSV file containing Japanese text, but the following error keeps appearing.

Traceback (most recent call last):
...
...
UnicodeEncodeError: 'cp932' codec can't encode character u'\ufeff' in position 0: illegal multibyte sequence

My code and the sample file contents:

    def setFood(self):
        reader = self.unicode_csv_reader(open("food.csv"))
        aDict = {}
        for field1, field2 in reader:
            if field2 not in aDict.keys():
                aDict[field2] = [field1]
            else: 
                aDict[field2] += [field1]
        return aDict

    def unicode_csv_reader(self, utf8_data, dialect=csv.excel, **kwargs):
        reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
        for row in reader:
            yield [unicode(cell, 'utf-8') for cell in row]

    def recFood(self, inp):
        print inp
        for key in self.foodDict.keys():
            for value in self.foodDict[key]:
                print(key)
                print(value)

Sample CSV:

ヤクルト,飲み物
カキフライ,洋食
エビフライ,洋食
豚カツ,洋食

1 answer:

Answer 0 (score: 2)

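The example at the bottom of the Python 2.7 csv module documentation is what you want, but use utf-8-sig as the encoding. \ufeff is the byte order mark (BOM) character, and that encoding handles it correctly if it is present: a leading BOM is skipped when decoding.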
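To make the BOM behaviour concrete, here is a minimal sketch in Python 3 syntax. It assumes the file starts with a UTF-8 BOM, which the traceback in the question suggests; the variable names are only illustrative.

```python
import codecs

# Bytes of a UTF-8 file that starts with a BOM (assumed, based on the traceback).
raw = codecs.BOM_UTF8 + u"ヤクルト,飲み物\n".encode("utf-8")

# Plain utf-8 keeps the BOM as the first decoded character (U+FEFF).
with_bom = raw.decode("utf-8")

# utf-8-sig drops a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

print(repr(with_bom[0]))     # escaped form of the BOM character
print(repr(without_bom[0]))  # first real character of the data

# cp932 has no mapping for U+FEFF, so encoding it for a cp932 console fails.
try:
    with_bom.encode("cp932")
    bom_encodable = True
except UnicodeEncodeError:
    bom_encodable = False
print(bom_encodable)
```

This matches the error in the question: the data itself is fine, and the failure comes from the stray BOM character reaching a cp932 console.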

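The Japanese system locale is required to print Japanese in the Windows console. Better yet, switch to Python 3.6, which uses the Unicode APIs to print to the console; you just need a font that supports Japanese. The csv module in Python 3 also supports Unicode and works much better.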

import csv, codecs

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

with open('food.csv','rb') as f:
    r = UnicodeReader(f)
    for key,value in r:
        print key,value

Output:

ヤクルト 飲み物
カキフライ 洋食
エビフライ 洋食
豚カツ 洋食
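
For the Python 3 route mentioned above, a rough sketch might look like the following. It is not the code from the question: read_food is a hypothetical helper that mirrors what setFood does, and the temporary file only stands in for food.csv so the example is self-contained.

```python
import csv
import os
import tempfile
from collections import defaultdict


def read_food(path):
    """Group dish names by category (hypothetical stand-in for setFood)."""
    by_category = defaultdict(list)
    # utf-8-sig skips a leading BOM; newline="" is what the csv docs recommend.
    with open(path, encoding="utf-8-sig", newline="") as f:
        for name, category in csv.reader(f):
            by_category[category].append(name)
    return dict(by_category)


# Write a BOM-prefixed copy of the sample data to a temporary file for the demo.
sample = "ヤクルト,飲み物\r\nカキフライ,洋食\r\nエビフライ,洋食\r\n豚カツ,洋食\r\n"
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", encoding="utf-8-sig", newline="") as f:
    f.write(sample)

food = read_food(path)
os.remove(path)

# Number of dishes per category.
print({category: len(names) for category, names in food.items()} == {"飲み物": 1, "洋食": 3})
```

In Python 3 the csv module works on text directly, so the UTF8Recoder and UnicodeReader wrappers are not needed; the encoding is chosen once, when the file is opened.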