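I am trying to create a pivot table to analyze data with pandas. My data is in a CSV file (data.csv) that has no header row. When reading it with pandas, I supply the following list as the column names: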
Labels = ['voter_id_org','State ID','city','ward','pct','name_last','name_first','name_middle','name_suffix','Status,party','Registration Date','Last Registration Date','house_no','pre_dir','street','apartment','zip','birth_date','voter_id','Source','P_05_02_2017','S_12_06_2016','G_11_08_2016','S_08_02_2016','S_06_21_2016','P_03_15_2016','S_12_08_2015','G_11_03_2015','P_09_08_2015','P_05_05_2015','S_02_03_2015','G_11_04_2014','S_08_05_2014','P_05_06_2014','G_11_05_2013','P_10_01_2013','P_09_10_2013','S_08_06_2013','P_05_07_2013','G_11_06_2012','S_08_07_2012','P_03_06_2012','G_11_08_2011','P_09_13_2011','S_08_02_2011','P_05_03_2011','S_02_08_2011','G_11_02_2010','P_09_07_2010','S_08_03_2010','P_05_04_2010','G_11_03_2009','P_09_29_2009','P_09_08_2009','S_08_04_2009','P_05_05_2009','S_02_03_2009','SG_12_23_2008','SG_11_18_2008','G_11_04_2']
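However, I cannot reliably reference specific columns by label, so my pivot table comes out empty. When the CSV is strictly comma-delimited, my code does produce a pivot table, so I think the problem is the " between the rows of data.csv. How can I read this file correctly so that I can access each column?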
data.csv:
547212,OH0014718999,CLEVELAND,03,H,JOHNSON,JAMES,M,,A,NOPTY,01/01/1901,09/19/2016,1500,,DETROIT AVE, APT 505,44113,1959,547212,VOTER PARTICIPATION CENTER,,,Y,,,,,,,,,,,,Y,,,,,Y,,,Y,,,,,Y,,,,Y,,,,,,,,Y,,,,D,,,,,,,,,,Y,,,,,CLEV CSD,CONG 11,HSE 10,SEN 21,CLE MCD,"CCD 07
"
652898,OH0014779218,CLEVELAND,03,Q,WOLSTEIN,JILLIAN,MARCY,,A,NOPTY,01/01/1901,03/22/2017,1055,,OLD RIVER RD, APT 811,44113,1960,652898,5 - RECEIVED IN MAIL,,,Y,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,CLEV CSD,CONG 11,HSE 10,SEN 21,CLE MCD,"CCD 07
"
2417233,OH0020357576,CLEVELAND,07,J,PYNE,DANIEL,J,,I,NOPTY,10/06/2008,10/06/2008,1701,E, 12TH ST, 14Q,44114,1984,2417233,SECRETARY OF STATE S OFFICE,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Y,,,,,,,,Y,,,,,,,,,,,,,,,,,,,CLEV CSD,CONG 11,HSE 10,SEN 21,CLE MCD,"CCD 07
"
2407693,OH0020299723,CLEVELAND,03,H,ANGELO,CELIA,E,,A,NOPTY,10/06/2008,07/08/2015,1500,,DETROIT AVE, APT 102,44113,1985,2407693,5 - RECEIVED IN MAIL,,,Y,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,CLEV CSD,CONG 11,HSE 10,SEN 21,CLE MCD,"CCD 07
"
...
My code:
import pandas as pd

def analyzefile(file):
    f = pd.read_csv(file, header=None, names=Labels)
    pt = pd.pivot_table(f, index=['State ID'], aggfunc='count')
    print(pt)
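Answer 0 (score: 1)
You cannot reliably reference specific columns in the dataframe because df.columns has length 85 while the Labels list has length 60, so the names do not line up with the columns. If you want to pivot the dataframe anyway, you can read it without names and pivot on the column position: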
import pandas as pd
df = pd.read_csv('data.csv', delimiter=',', header=None)  # columns get integer labels
pd.pivot_table(df, index=1, aggfunc='count')  # column 1 holds the State ID
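If you would still like to use labels for the columns you do have names for, one possible approach is to read the file without names and then pad the label list with placeholders until it matches the column count. The following is only a sketch on made-up data: the three-row sample, the shortened `labels` list, and the `extra_N` placeholder names are all assumptions for illustration, not part of the original file.

```python
import io
import pandas as pd

# Hypothetical miniature of the situation: 5 columns in the file, only 3 labels.
sample = io.StringIO(
    "1,OH001,CLEVELAND,x,y\n"
    "2,OH002,CLEVELAND,,y\n"
    "3,OH001,AKRON,x,\n"
)
labels = ["voter_id_org", "State ID", "city"]

# Read without names so every field becomes a regular column.
df = pd.read_csv(sample, header=None)

# Pad the label list with placeholder names (assumed naming scheme)
# so that it has exactly one name per column.
extra = [f"extra_{i}" for i in range(len(labels), df.shape[1])]
df.columns = labels + extra

# Label-based pivoting now works for the named columns.
pt = pd.pivot_table(df, index=["State ID"], aggfunc="count")
print(pt)
```

With the real file you would use the full Labels list and data.csv in place of the sample, and ideally replace the placeholders with the real names of the remaining columns.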
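The problem is not the " between the rows of data.csv: each one is the closing quote of the last field in its row, which is quoted and contains a line break.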
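To see that a quoted field with an embedded line break is read as a single field, here is a small check on made-up data. The two records below are hypothetical and only mimic the way each row of data.csv ends.

```python
import io
import pandas as pd

# Two hypothetical records; the last field of each is quoted and
# contains a line break, like the end of each row in data.csv.
raw = 'a,b,"CCD 07\n"\nc,d,"CCD 07\n"\n'

df = pd.read_csv(io.StringIO(raw), header=None)

print(df.shape)             # number of records and fields actually parsed
print(repr(df.iloc[0, 2]))  # the last field, including its line break
```

The parser keeps the line break inside the quoted value instead of starting a new record, so the quotes do not change the number of columns.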