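My CSV file has a header with 16 columns, and each data row contains 16 values separated by ",". I just discovered that some rows contain values with a , inside "". This confuses the parser: instead of finding 15 commas, it finds 18. Here is an example: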
"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","**7,2g**","W","Decorative range","5x**1,2**g Eye Shadow + **1,2**g Powder","http://image.jpg","","3660732000104","","No","","1","1"
How can I make the parser ignore the commas inside ""?
My code looks like this:
import pandas
csv1 = pandas.read_csv('Produktlista.csv', quoting=3)
csv2 = pandas.read_csv('Prislista.csv', quoting= 3)
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False, quoting=3)
Answer 0 (score: 2)
Pass the parameter quotechar='"'. From the Pandas documentation:
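quotechar : str (length 1), optional
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.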
For example:
In [9]:
import io
import pandas as pd
t='''"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","7,2g","W","Decorative range","5x1,2g Eye Shadow + 1,2g Powder","http://image.jpg","","3660732000104","","No","","1","1"'''
df = pd.read_csv(io.StringIO(t), quotechar='"', header=None)
df
Out[9]:
       0         1        2                             3     4  5  \
0  23210  Cosmetic  Lancome  Eyes Virtuose Palette Makeup  7,2g  W
                  6                                7                 8    9  \
0  Decorative range  5x1,2g Eye Shadow + 1,2g Powder  http://image.jpg  NaN
              10   11  12   13  14  15
0  3660732000104  NaN  No  NaN   1   1
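Note that quoting=3 in the question is csv.QUOTE_NONE, which tells the parser not to treat quote characters specially, so it is the likely reason the commas inside "" are being counted as separators. Below is a minimal sketch of the merge from the question with the default quoting left in place on read. The in-memory CSV text, the column names (name, size, price) and the price value are made up for illustration; only the id column and the 7,2g value come from the question.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-ins for Produktlista.csv and Prislista.csv.
# Column names and the price value are assumptions, not the real files.
products_csv = (
    '"id","name","size"\n'
    '"23210","Eyes Virtuose Palette Makeup","7,2g"\n'
)
prices_csv = (
    '"id","price"\n'
    '"23210","10"\n'
)

# No quoting=3 here: with the default (csv.QUOTE_MINIMAL) the reader
# keeps a comma inside "..." as part of the field.
products = pd.read_csv(io.StringIO(products_csv))
prices = pd.read_csv(io.StringIO(prices_csv))

merged = products.merge(prices, on="id")

# csv.QUOTE_ALL wraps every field in quotes on output, so values with
# embedded commas can be read back without ambiguity.
out = io.StringIO()
merged.to_csv(out, index=False, quoting=csv.QUOTE_ALL)
output_text = out.getvalue()
print(output_text)
```

With real files, the same idea would be to drop quoting=3 from both read_csv calls and from to_csv (or pass csv.QUOTE_ALL when writing if every field should stay quoted).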