Can't read Unicode strings from a CSV file into DictReader in Python

Time: 2015-06-15 20:09:46

Tags: python-2.7 csv dictionary unicode codec

I have a CSV file that I'm trying to read using DictReader.

But doing this:

import csv

with open("BeerRatings.csv", "r") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line

gives me some ugly unicode:

{'Rating': '4', 'Brewery': 'Tr\xc3\xb6egs Brewing Company', 'Beer name': 'Tr\xc3\xb6egs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'P\xc3\xa9ch\xc3\xa9 Mortel - Bourbon Barrel Aged'} etc.

So, after reading around Stack Overflow, I edited my code to use the codecs module:

import codecs

with codecs.open("BeerRatings.csv", "r", "utf-8") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line

But this gives me a UnicodeEncodeError:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 9: ordinal not in range(128)

Any tips on how to solve this?

UPDATE (another attempt):

def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {key: unicode(value, 'utf-8') for key, value in row.iteritems()}

with open("BeerRatings.csv", "r") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print line

This still gives me less-than-ideal output...

1 answer:

Answer 0: (score: 1)

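The csv module in Python 2.x expects the input file to be opened in binary mode and does not support encodings. It is compatible with UTF-8 data, however, as long as you decode the resulting byte strings to Unicode yourself: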

import csv

with open('BeerRatings.csv','rb') as f:
    reader = csv.DictReader(f)
    for line in reader:
        for k,v in line.iteritems():
            print k.decode('utf8'),':',v.decode('utf8')
        print

Output:

Rating : 4
Brewery : Tröegs Brewing Company
Beer name : Tröegs Hopback Amber Ale

Rating : 4.59
Brewery : Brasserie Dieu Du Ciel
Beer name : Péché Mortel - Bourbon Barrel Aged
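
As a side note on the UnicodeEncodeError from the question: a likely explanation, assuming the Python 2 csv reader converts each unicode line it receives from codecs.open() back to a byte string with the default ASCII codec, is that any non-ASCII character makes that implicit conversion fail. The sketch below only reproduces that class of error with an explicit encode; the variable names are illustrative, and the sample string is taken from the output above.

```python
# Decoded text, similar to what codecs.open(..., "utf-8") yields per line.
text = u"Tr\xf6egs Hopback Amber Ale"

# An ASCII conversion cannot represent u'\xf6', so it raises UnicodeEncodeError.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 works, and decoding restores the original text.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```

This is why the approach above keeps the file as bytes for the csv module and decodes each field afterwards.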

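Edit

As for your UnicodeDictReader, you still need to print the key/value pairs yourself, as I did above; otherwise you get the default printing of a dict, which uses repr() and shows the data strings with escape codes. Also open the file in binary mode. This matters on some operating systems, particularly Windows.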

import csv

def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {key.decode('utf8'):value.decode('utf8') for key, value in row.iteritems()}

def prettydict(D):
    return u'{' + u', '.join(u"'{}': '{}'".format(k,v) for k,v in D.iteritems()) + u'}'

with open("BeerRatings.csv", "rb") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print prettydict(line)

Output:

{'Rating': '4', 'Brewery': 'Tröegs Brewing Company', 'Beer name': 'Tröegs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'Péché Mortel - Bourbon Barrel Aged'}
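
The escape codes in the original output come from how containers are printed, not from corrupted data. A minimal sketch of that point, using a list instead of a dict and illustrative variable names: converting a container to a string formats each item with repr(), so non-ASCII bytes appear as \x escapes, while the decoded value itself is intact.

```python
# UTF-8 encoded bytes, as the Python 2 csv module hands them back.
raw = b"Tr\xc3\xb6egs Brewing Company"

# Decoding yields the real text; the two-byte sequence becomes one character.
text = raw.decode("utf-8")

# str() of a container uses repr() on its items, so the escapes show up
# even though the underlying data is fine.
shown_in_list = str([raw])
has_escapes = "\\xc3\\xb6" in shown_in_list
```

Formatting the items individually, as prettydict does, avoids going through repr().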