I have a CSV file like the one below, and I am reading it with pandas in Python.
"Col1","Col2","Col3","Col4","Col5" "XXX","1234","asdf " asdf, asdf","1","1234" "XXX","1234","asdf asdf, asdf","1","1234"
df = pd.read_csv(BytesIO(content), quotechar='"')
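The problem is the quoted field "asdf " asdf, asdf". The unescaped quote inside it ends the field early, so the comma that follows is treated as a delimiter.
The data has five columns, but because of that first data row, read_csv keeps seeing six columns.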
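The column count can be reproduced without the file. The following is a minimal sketch that uses the standard-library csv module as a stand-in for read_csv (an assumption: the two parsers are not identical, but both split this row at the comma after the stray quote). The name `sample` is only for illustration.

```python
import csv
from io import StringIO

# The three lines from the question; the first data row has a stray quote.
sample = (
    '"Col1","Col2","Col3","Col4","Col5"\n'
    '"XXX","1234","asdf " asdf, asdf","1","1234"\n'
    '"XXX","1234","asdf asdf, asdf","1","1234"\n'
)

# Count how many fields the parser finds on each line.
field_counts = [len(row) for row in csv.reader(StringIO(sample), quotechar='"')]
print(field_counts)  # [5, 6, 5]
```

Only the row with the stray quote is split into six fields. The row whose comma sits inside a properly closed quoted field is still read as five.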
How can I repair this string in Python so that pandas can read it correctly?
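One possible repair, sketched here under assumptions rather than offered as the definitive fix, is to double every quote that cannot be a field boundary, since a doubled quote is the standard CSV escape. The sketch assumes every field is quoted, that a stray quote never sits directly next to a comma or a line break, and that the file contains no quotes that are already escaped. The names `STRAY_QUOTE` and `escape_stray_quotes` are hypothetical.

```python
import csv
import re
from io import StringIO

# A quote with an ordinary character on both sides (not a comma, not a
# line break) cannot open or close a field, so it is treated as stray.
STRAY_QUOTE = re.compile(r'(?<=[^,\r\n])"(?=[^,\r\n])')


def escape_stray_quotes(text):
    """Double each stray quote so a CSV parser reads it as a literal quote."""
    return STRAY_QUOTE.sub('""', text)


sample = (
    '"Col1","Col2","Col3","Col4","Col5"\n'
    '"XXX","1234","asdf " asdf, asdf","1","1234"\n'
    '"XXX","1234","asdf asdf, asdf","1","1234"\n'
)

fixed = escape_stray_quotes(sample)
rows = list(csv.reader(StringIO(fixed)))
print([len(row) for row in rows])  # [5, 5, 5]
print(rows[1][2])                  # asdf " asdf, asdf
```

The repaired text can then be handed to pandas as `pd.read_csv(StringIO(fixed), quotechar='"')`. Unlike a replace-with-spaces approach, this keeps the quote and the comma inside the field.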
Update: this works. It is not elegant, but it is good enough.
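import pandas as pd
from io import StringIO
with open('C:\\Users\\test\\Desktop\\test.csv') as f:
    content = f.read()
# Mark the real field boundaries: '","' becomes '|||', and a quote at the
# start or end of a line becomes '&&&'.
content = content[0].replace('"', '&&&') + \
    content.replace('","', '|||')[1:].replace('\n"', '\n&&&').replace('"\n', '&&&\n')[:-1] + \
    content[-1].replace('"', '&&&')
# Blank out the remaining (stray) quotes and commas, then restore the markers.
content = content.replace('"', ' ').replace(',', ' ').replace('|||', '","').replace('&&&', '"')
# print(content)
# f.read() returns str in Python 3, so StringIO is used instead of BytesIO.
df = pd.read_csv(StringIO(content), quotechar='"')
print(df)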
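The placeholder trick above treats `","` as the real delimiter. A shorter way to express the same idea is to split each line on that three-character sequence directly. This is a sketch under assumptions: every field is quoted, no field contains `","`, and no field spans several lines. The helper name `split_quoted_line` is hypothetical.

```python
def split_quoted_line(line):
    """Split one fully quoted CSV line on the '","' separator."""
    line = line.rstrip("\r\n")
    # Drop the opening and closing quote when both are present.
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        line = line[1:-1]
    return line.split('","')


sample = (
    '"Col1","Col2","Col3","Col4","Col5"\n'
    '"XXX","1234","asdf " asdf, asdf","1","1234"\n'
    '"XXX","1234","asdf asdf, asdf","1","1234"\n'
)

rows = [split_quoted_line(line) for line in sample.splitlines() if line]
header, records = rows[0], rows[1:]
print(header)      # ['Col1', 'Col2', 'Col3', 'Col4', 'Col5']
print(records[0])  # ['XXX', '1234', 'asdf " asdf, asdf', '1', '1234']
```

The result can be loaded with `pd.DataFrame(records, columns=header)`. All values arrive as strings, so numeric columns would still need converting, for example with `pd.to_numeric`.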