Reading a CSV file with quotes using pandas

Posted: 2018-02-19 15:01:54

Tags: python python-3.x pandas csv

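I'm having trouble reading a CSV file correctly with pandas. I have searched for a solution to my problem but could not find one.

My file contains information about tracks, but its structure is a bit unusual. Here is the header, followed by a row in the layout that most rows use: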

artist, trackname, albumname, tracknum, year, mp3genre
Anton Cosmo, Cry, The In Between, 12, 2010, Electro Rock

But some rows are structured like this:

artist, trackname, albumname, tracknum, year, mp3genre
Anne Garner, "Home, Outbound","Long Journey Here, 1, 2011, Electronica  

I have tried many ways of reading this CSV file into a pandas DataFrame, without success. I thought that:

df = pd.read_csv("songs.csv", quotechar="\"")

would work, but for the row above it gives me this:

artist                                Anne Garner
trackname    "Home, Outbound" , Long Journey Here
albumname                                       1
tracknum                                     2011
year                                  Electronica
mp3genre                                      NaN

instead of:

artist                                Anne Garner
trackname                          Home, Outbound
albumname                       Long Journey Here                   
tracknum                                        1
year                                         2011
mp3genre                              Electronica

Do you have any idea how to read it correctly?

Thanks in advance

1 answer:

Answer 0 (score: 0)

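As mentioned, you could try replacing "" with " while reading. One way to do this is to pre-parse the data with csv.reader() before passing it to pandas. For example: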

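import csv
import io

import pandas as pd

# Pre-parse each line with csv.reader, collapsing doubled quotes ("") into
# single quotes (") so that quoted fields containing commas stay intact.
with open('songs.csv') as f_input:
    data = [next(csv.reader(io.StringIO(line.replace('""', '"')))) for line in f_input]

# The first parsed row is the header; the remaining rows are the data.
print(pd.DataFrame(data[1:], columns=data[0]))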

This would give you:

        artist       trackname          albumname tracknum   year      mp3genre
0  Anton Cosmo             Cry     The In Between       12   2010  Electro Rock
1  Anne Garner  Home, Outbound  Long Journey Here        1   2011   Electronica
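One detail worth checking if the file really has a space after each comma, as in the rows shown in the question: by default csv.reader only treats a quote as opening a quoted field when it is the first character of the field, so a quote that follows ", " is read as literal text. The sketch below is a minimal, self-contained variant of the approach above that runs on in-memory data and passes skipinitialspace=True. The sample text, the doubled quotes in its last row, and the helper name parse_rows are assumptions made for illustration, not taken from the real songs.csv.

```python
import csv
import io

# Hypothetical sample: the doubled quotes in the last row are an assumption
# about what the malformed lines in songs.csv look like.
SAMPLE = (
    'artist, trackname, albumname, tracknum, year, mp3genre\n'
    'Anton Cosmo, Cry, The In Between, 12, 2010, Electro Rock\n'
    'Anne Garner, ""Home, Outbound"",""Long Journey Here"", 1, 2011, Electronica\n'
)


def parse_rows(text):
    """Collapse doubled quotes, then parse each line as one CSV record."""
    rows = []
    for line in io.StringIO(text):
        cleaned = line.replace('""', '"')
        # skipinitialspace=True drops the blank after each comma, so a quote
        # that follows ", " still opens a quoted field.
        rows.append(next(csv.reader([cleaned], skipinitialspace=True)))
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

The resulting list can be handed to pd.DataFrame(rows[1:], columns=rows[0]) exactly as in the answer. Note that blindly replacing "" with " would also alter empty quoted fields and legitimately escaped quotes, so this only makes sense if the file does not contain either. If the quotes in the file are already balanced, pd.read_csv("songs.csv", skipinitialspace=True) may be enough on its own.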