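I'm having trouble reading a CSV file correctly with pandas. I've searched for a solution to my problem but couldn't find one.
My file contains information about music tracks, but its structure is a bit unusual. This is the header structure, and most rows follow it: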
artist, trackname, albumname, tracknum, year, mp3genre
Anton Cosmo, Cry, The In Between, 12, 2010, Electro Rock
But some rows are structured like this:
artist, trackname, albumname, tracknum, year, mp3genre
Anne Garner, "Home, Outbound","Long Journey Here, 1, 2011, Electronica
I've tried many ways of reading this CSV file into a pandas DataFrame, without success. I thought that:
df = pd.read_csv("songs.csv", quotechar="\"")
would work, but it gives me a row like this:
artist Anne Garner
trackname "Home, Outbound" , Long Journey Here
albumname 1
tracknum 2011
year Electronica
mp3genre NaN
instead of:
artist Anne Garner
trackname Home, Outbound
albumname Long Journey Here
tracknum 1
year 2011
mp3genre Electronica
Do you have any idea how to read it correctly?
Thanks in advance.
Answer 0 (score: 0)
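As mentioned above, you could try replacing "" with " while reading the file. One way to do this is to pre-parse the data with csv.reader() before passing it to pandas. For example: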
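import csv
import io

import pandas as pd

# Parse each line on its own, collapsing doubled quotes ("") into a single quote first
with open('songs.csv') as f_input:
    data = [next(csv.reader(io.StringIO(line.replace('""', '"')))) for line in f_input]

# The first parsed row is the header; the remaining rows are the records
print(pd.DataFrame(data[1:], columns=data[0]))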
This would give you:
        artist       trackname          albumname  tracknum  year      mp3genre
0  Anton Cosmo             Cry     The In Between        12  2010  Electro Rock
1  Anne Garner  Home, Outbound  Long Journey Here         1  2011   Electronica
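The same idea can be sketched as a small self-contained function, which makes it easier to try without the real file. This is only a sketch under assumptions: the function name read_songs and the in-memory SAMPLE are hypothetical, and the sample assumes the problematic rows wrap the comma-containing field in doubled quotes (""...""), which may not match your actual file. The skipinitialspace=True option of csv.reader is added here so that a space after each comma is also tolerated.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data: the second record wraps its comma-containing
# field in doubled quotes. This layout is an assumption for illustration.
SAMPLE = '''artist,trackname,albumname,tracknum,year,mp3genre
Anton Cosmo,Cry,The In Between,12,2010,Electro Rock
Anne Garner,""Home, Outbound"",Long Journey Here,1,2011,Electronica
'''


def read_songs(lines):
    """Parse each line separately after collapsing doubled quotes."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cleaned = line.replace('""', '"')
        # skipinitialspace=True ignores whitespace right after a delimiter
        reader = csv.reader(io.StringIO(cleaned), skipinitialspace=True)
        rows.append(next(reader))
    # First parsed row is the header, the rest are records
    return pd.DataFrame(rows[1:], columns=rows[0])


df = read_songs(io.StringIO(SAMPLE))
print(df.loc[1, "trackname"])
```

Note that every column comes back as strings with this approach, so numeric columns such as tracknum and year would still need converting, for example with pd.to_numeric.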