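The problem: when the unicodecsv.DictReader class parses a CSV file whose fields are quoted and which is encoded as UTF-8 with a BOM, the first field name keeps its quote characters, while all subsequent fields have theirs stripped correctly.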
Sample CSV file, encoded as UTF-8 with a BOM:
"Field1","Field2","Field3"
content1,content2,content3
Sample Python code:
from unicodecsv import DictReader

filename = "/tmp/test.csv"
with open(filename, mode='r') as read_stream:
    reader = DictReader(read_stream, encoding='utf-8-sig')
    print reader.fieldnames
Printed output:
['"Field1"','Field2','Field3']
Is there a way to make the first field behave like the others and have its quote characters stripped?
Answer 0 (score: 0)
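One approach is to consume the BOM manually yourself (although I would argue that the code you have written demonstrates an actual bug in the underlying library and should be added to their issues on GitHub). After consuming the BOM, use the utf-8 codec instead of utf-8-sig.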
# My test code to write a file with a BOM
import io

filename = "/tmp/test.csv"
# Write to the same path that is read back below
with io.open(filename, 'w', encoding='utf-8-sig') as f:
    f.write(u'''\
"Field1","Field2","Field3"
content1,content2,content3
''')

from unicodecsv import DictReader

# Open in binary mode so that read(3) consumes exactly the three BOM bytes
with open(filename, mode='rb') as read_stream:
    # Consume the BOM
    read_stream.read(3)
    reader = DictReader(read_stream, encoding='utf-8')
    print(reader.fieldnames)
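
Note that read(3) discards the first three bytes unconditionally, so a file saved without a BOM would lose the start of its header. A minimal sketch of a more defensive variant follows, assuming the stream is opened in binary mode and is seekable. The helper name skip_utf8_bom is hypothetical and not part of unicodecsv; the in-memory streams stand in for real files.

```python
import codecs
import io


def skip_utf8_bom(stream):
    """Advance a binary stream past a leading UTF-8 BOM, if there is one.

    Hypothetical helper, not part of unicodecsv. Returns True when a BOM
    was skipped; otherwise the stream position is restored.
    """
    start = stream.tell()
    head = stream.read(len(codecs.BOM_UTF8))
    if head == codecs.BOM_UTF8:
        return True
    stream.seek(start)  # no BOM: rewind so that no data is lost
    return False


# In-memory stand-ins for files opened with mode='rb'
with_bom = io.BytesIO(codecs.BOM_UTF8 + b'"Field1","Field2"\nc1,c2\n')
without_bom = io.BytesIO(b'"Field1","Field2"\nc1,c2\n')

skipped_with = skip_utf8_bom(with_bom)
skipped_without = skip_utf8_bom(without_bom)
first_line_with = with_bom.readline()
first_line_without = without_bom.readline()
print(skipped_with, first_line_with)
print(skipped_without, first_line_without)
```

With a real file, you would call skip_utf8_bom(read_stream) in place of read_stream.read(3) and then pass the stream to DictReader(read_stream, encoding='utf-8') as before.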