Python with pandas: how to ignore delimiters inside double quotes?

Date: 2015-03-07 10:54:30

Tags: python csv pandas

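My CSV file has a header with 16 columns. Each data row contains 16 values separated by ",".

I just found that some rows contain values with a "," inside the double quotes. This confuses the parser: instead of finding 15 commas, it finds 18. Here is an example: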

"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","**7,2g**","W","Decorative range","5x**1,2**g Eye Shadow + **1,2**g Powder","http://image.jpg","","3660732000104","","No","","1","1"

How can I make the parser ignore commas inside the double quotes?

My code looks like this:

import pandas

csv1 = pandas.read_csv('Produktlista.csv', quoting=3)
csv2 = pandas.read_csv('Prislista.csv', quoting= 3)
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False, quoting=3)

1 answer:

Answer 0 (score: 2)

Pass the parameter quotechar='"'. From the pandas documentation:

  

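quotechar : str (length 1), optional

    The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.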
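One detail worth keeping in mind: quotechar only has an effect while quoting is enabled. The quoting=3 in the question's code is the value of csv.QUOTE_NONE, which tells the parser to treat quote characters as ordinary text. The sketch below illustrates the difference on a shortened, made-up row (three fields, not the asker's real file); the variable names are only for illustration.

```python
import csv
import io

import pandas as pd

# Shortened, hypothetical row: 3 fields, the second one contains a comma.
row = '"23210","7,2g","W"\n'

# Default quoting: the comma inside the double quotes stays part of the field.
quoted = pd.read_csv(io.StringIO(row), header=None, quotechar='"')

# quoting=3 is csv.QUOTE_NONE: quotes are plain characters, so every comma splits.
unquoted = pd.read_csv(io.StringIO(row), header=None, quoting=csv.QUOTE_NONE)

print(csv.QUOTE_NONE)      # 3
print(quoted.shape)        # (1, 3)
print(unquoted.shape)      # (1, 4)
print(quoted.iloc[0, 1])   # 7,2g
```

With quoting disabled the same row yields one extra column, which matches the kind of surplus comma count described in the question.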

For example:

In [9]:

import io
import pandas as pd

# One sample row from the question; several quoted fields contain commas.
t='''"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","7,2g","W","Decorative range","5x1,2g Eye Shadow + 1,2g Powder","http://image.jpg","","3660732000104","","No","","1","1"'''
df = pd.read_csv(io.StringIO(t), quotechar='"', header=None)
df
Out[9]:
      0         1        2                             3     4  5   \
0  23210  Cosmetic  Lancome  Eyes Virtuose Palette Makeup  7,2g  W   

                 6                                7                 8   9   \
0  Decorative range  5x1,2g Eye Shadow + 1,2g Powder  http://image.jpg NaN   

              10  11  12  13  14  15  
0  3660732000104 NaN  No NaN   1   1
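
Applied to the merge in the question, this could look roughly like the sketch below. It is not the asker's real data: the two in-memory strings are hypothetical stand-ins for Produktlista.csv and Prislista.csv, and the column names other than id are invented for the example. The key change is dropping quoting=3 from both the read and the write.

```python
import io

import pandas as pd

# Hypothetical stand-ins for Produktlista.csv and Prislista.csv.
products_csv = '"id","name","weight"\n"23210","Eyes Virtuose Palette Makeup","7,2g"\n'
prices_csv = '"id","price"\n"23210","1"\n'

# No quoting=3 here, so the parser honors the double quotes.
csv1 = pd.read_csv(io.StringIO(products_csv), quotechar='"')
csv2 = pd.read_csv(io.StringIO(prices_csv), quotechar='"')
merged = csv1.merge(csv2, on='id')

# Default quoting on output re-quotes any field that contains a comma,
# so the written file can be parsed again later.
output = merged.to_csv(index=False)
print(output)
```

With real files, the io.StringIO(...) arguments would be replaced by the file paths and to_csv would receive "output.csv" as its first argument.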