How do I remove the quote characters from the first field name when unicodecsv.DictReader parses a UTF-8-BOM file in Python 2.7?

Time: 2017-03-24 04:01:08

Tags: python python-2.7 csv unicode python-unicode

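When the class unicodecsv.DictReader parses a CSV file whose fields are quoted and whose encoding is UTF-8 with a BOM, the first field keeps its quote characters, while all subsequent fields correctly have them stripped.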

Example UTF-8-BOM encoded CSV file:

"Field1","Field2","Field3"
content1,content2,content3

Example Python code:

from unicodecsv import DictReader
filename = "/tmp/test.csv"
with open(filename, mode='r') as read_stream:
     reader = DictReader(read_stream, encoding='utf-8-sig')
     print reader.fieldnames

Printed value:

['"Field1"','Field2','Field3']

Is there a way to make the first field behave like the others and have its quote characters removed?

1 Answer:

Answer 0: (score: 0)

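One approach is to consume the BOM manually yourself (although I would argue that the code as written demonstrates an actual bug in the underlying library and should be added to their issues on github). Once the BOM has been consumed, use the utf-8 codec instead of utf-8-sig.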

# My test code to write a file with a BOM
import io
filename = "/tmp/test.csv"
with io.open(filename, 'w', encoding='utf-8-sig') as f:
    f.write(u'''\
"Field1","Field2","Field3"
content1,content2,content3
''')

from unicodecsv import DictReader
with open(filename, mode='rb') as read_stream:
     # Consume the 3-byte UTF-8 BOM (EF BB BF) so the parser sees the opening quote first
     read_stream.read(3)
     reader = DictReader(read_stream, encoding='utf-8')
     print reader.fieldnames
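
The unconditional read(3) above assumes the file always starts with a BOM. If the file has none, the first three bytes of real data would be lost. A minimal sketch of a more defensive variant follows, using only the standard library. The helper name skip_utf8_bom is hypothetical (it is not part of unicodecsv), and the stream is assumed to be seekable and opened in binary mode. A plausible reason for the original symptom, offered as an assumption rather than something confirmed above, is that the parser sees the BOM bytes ahead of the opening quote and therefore treats the quotes as literal characters.

```python
import codecs
import io


def skip_utf8_bom(stream):
    """Advance a seekable binary stream past a leading UTF-8 BOM, if present.

    Hypothetical helper for illustration. Returns True if a BOM was skipped.
    """
    start = stream.tell()
    head = stream.read(len(codecs.BOM_UTF8))
    if head == codecs.BOM_UTF8:
        return True
    # No BOM: rewind so the first bytes of real data are not lost.
    stream.seek(start)
    return False


# Demonstration on in-memory bytes, so no file on disk is needed.
with_bom = io.BytesIO(codecs.BOM_UTF8 + b'"Field1","Field2"\n')
without_bom = io.BytesIO(b'"Field1","Field2"\n')

skipped_with = skip_utf8_bom(with_bom)
skipped_without = skip_utf8_bom(without_bom)
rest_with = with_bom.read()
rest_without = without_bom.read()
```

In the answer's code, calling skip_utf8_bom(read_stream) in place of read_stream.read(3), and then passing read_stream to DictReader with encoding='utf-8', should handle files both with and without a BOM.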