I'm trying to use the csv module to parse a CSV file, but it doesn't handle UTF-8 encoding.
So I tried the approach suggested in the documentation:
    import csv

    def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
        # csv.py doesn't do Unicode; encode temporarily as UTF-8:
        csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                                dialect=dialect, **kwargs)
        for row in csv_reader:
            # decode UTF-8 back to Unicode, cell by cell:
            yield [unicode(cell, 'utf-8') for cell in row]

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')
However, if I try to use it like this:
    with open(u'spam1.csv', 'rb') as csvfile:
        spamreader = unicode_csv_reader(csvfile, delimiter=',', quotechar='"')
        for row in spamreader:
            print row
I get this error:
        yield line.encode('utf-8')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 15: ordinal not in range(128)
Yet when I open the file in LibreOffice, it opens fine as a UTF-8 encoded CSV file.
Answer 0 (score: 3)
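That code works on unicode values; you need to decode your data to unicode before passing it to the replacement reader. A file opened with 'rb' yields byte strings, and in Python 2 calling .encode('utf-8') on a byte string first decodes it implicitly as ASCII, which fails on any non-ASCII byte such as 0xc2.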
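To see where the error comes from, here is a minimal sketch that performs explicitly the ASCII decode that Python 2 does implicitly when .encode() is called on a byte string. The sample line `raw_line` and the helper `explain_failure` are made up for illustration; they are not taken from the question's file.

```python
# Bytes as a file opened in binary mode would return them (hypothetical sample).
# 0xc2 0xa0 is the UTF-8 encoding of U+00A0 (no-break space).
raw_line = b'spam,eggs\xc2\xa0ham\n'

def explain_failure(raw):
    """Run the ASCII decode explicitly and return the error it raises, if any."""
    try:
        raw.decode('ascii')
    except UnicodeDecodeError as exc:
        return exc
    return None

err = explain_failure(raw_line)

# Decoding with the right codec first makes the later encode step safe.
decoded = raw_line.decode('utf-8')
reencoded = decoded.encode('utf-8')
```

The error object reports the 'ascii' codec and the offset of the first non-ASCII byte, the same kind of message as in the traceback from the question.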
Use io.open() to read the data as Unicode:
    import io

    with io.open(u'spam1.csv', 'r', encoding='utf8') as csvfile:
        spamreader = unicode_csv_reader(csvfile, delimiter=',', quotechar='"')
        for row in spamreader:
            print row
The wrapper then temporarily encodes that unicode as UTF-8 so the csv module can process it.
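As a rough illustration of the difference between the two ways of opening the file, the sketch below writes a small throwaway CSV and reads it back in binary mode and in text mode. The file name `spam_demo.csv` and its contents are assumptions made up for this example.

```python
import io
import os
import tempfile

# Hypothetical sample file containing UTF-8 encoded bytes.
path = os.path.join(tempfile.mkdtemp(), 'spam_demo.csv')
with io.open(path, 'wb') as f:
    f.write(b'name,price\ncaf\xc3\xa9,\xc2\xa32\n')

# Binary mode: each line is a byte string, still UTF-8 encoded.
with io.open(path, 'rb') as f:
    byte_lines = f.readlines()

# Text mode with an explicit encoding: each line is already decoded.
with io.open(path, 'r', encoding='utf8') as f:
    text_lines = f.readlines()

text_type = type(u'')  # unicode on Python 2, str on Python 3
```

Only the decoded lines in `text_lines` are suitable input for a helper that calls .encode('utf-8') on each line.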
Since your data is already encoded as UTF-8, you can also get away with:
    with open(u'spam1.csv', 'rb') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
        for row in spamreader:
            row = [unicode(cell, 'utf-8') for cell in row]
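This decodes the cells of each row directly from UTF-8, instead of first decoding to Unicode, then encoding to UTF-8 bytes again, and then decoding once more.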
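A small sketch of why the shortcut gives the same result: decoding each cell once is equivalent to the decode, encode, decode round trip. The cells in `raw_row` are hypothetical stand-ins for what csv.reader returns from a binary file under Python 2, and `cell.decode('utf-8')` is used here as the equivalent of `unicode(cell, 'utf-8')`.

```python
# Hypothetical row of UTF-8 encoded byte-string cells.
raw_row = [b'caf\xc3\xa9', b'\xc2\xa32']

# Shortcut: decode each cell once.
direct = [cell.decode('utf-8') for cell in raw_row]

# Long way round: decode, encode again, then decode again.
round_trip = [
    cell.decode('utf-8').encode('utf-8').decode('utf-8')
    for cell in raw_row
]
```

Both lists hold the same unicode values, so the extra encode and decode steps only cost time.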