How do I retrieve data from a Python DataFrame with multi-level headers?

Time: 2016-03-06 12:54:46

Tags: python pandas dataframe

I have a CSV file in the following format:

            Level1_head1    Level1_head2    Level1_head3
            Level2_head1    Level2_head2    Level2_head3 
ID
S0000001    someValue       someValue       someValue       
S0000002    someValue       someValue       someValue       
S0000003    someValue       someValue       someValue       
S0000004    someValue       someValue       someValue       
S0000005    someValue       someValue       someValue

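Note that the cell above ID is empty, and the cells to the right of ID are empty as well.

I have loaded the data above into a Python DataFrame object df and tried to extract the ID column from it: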

df = pd.read_csv("data.csv", header=[0,1], index_col=0)
date_series = df[0:]

However, I get the whole DataFrame back instead of a single column. When printed, the DataFrame looks like this:

            Level2_head1    Level2_head2    Level2_head3 
ID
S0000001    someValue       someValue       someValue       
S0000002    someValue       someValue       someValue       
S0000003    someValue       someValue       someValue       
S0000004    someValue       someValue       someValue       
S0000005    someValue       someValue       someValue

I have also tried:

date_series = df['ID']

date_series = df.ID

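However, the former raises a KeyError saying that df cannot find the key 'ID', and the latter fails with an error saying that df has no attribute 'ID'.

I am completely confused now. How do I retrieve the first column, the one containing the IDs (ID)?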

1 answer:

Answer 0: (score: 0)

You cannot use date_series = df['ID'], because ID is the name of the index, not a column.

Instead, use index.to_series to get a Series from the index:

print df
            Level1_head1    Level1_head2    Level1_head3
            Level2_head1    Level2_head2    Level2_head3
ID
S0000001    someValue       someValue       someValue
S0000002    someValue       someValue       someValue
S0000003    someValue       someValue       someValue
S0000004    someValue       someValue       someValue
S0000005    someValue       someValue       someValue

print df.index.name
ID

print df.index
Index([u'S0000001', u'S0000002', u'S0000003', u'S0000004', u'S0000005'], dtype='object', name=u'ID')

print df.index.to_series()
ID
S0000001    S0000001
S0000002    S0000002
S0000003    S0000003
S0000004    S0000004
S0000005    S0000005
Name: ID, dtype: object

#if you need reset index
print df.index.to_series().reset_index(drop=True)
0    S0000001
1    S0000002
2    S0000003
3    S0000004
4    S0000005
Name: ID, dtype: object

Or use a solution based on pd.Series.
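For readers on Python 3, here is a minimal, self-contained sketch of the same idea. The inline CSV text is a hypothetical stand-in for data.csv (three rows, made-up values), laid out as the question describes: two header rows followed by a row holding only ID. The pd.Series(df.index, ...) call is one plausible form of the pd.Series variant mentioned above; the answer's own snippet for it is not preserved here, so treat it as an assumption rather than the answerer's exact code.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: two header rows, then a row that
# carries only the index name "ID", then the data rows.
csv_text = """,Level1_head1,Level1_head2,Level1_head3
,Level2_head1,Level2_head2,Level2_head3
ID,,,
S0000001,a,b,c
S0000002,d,e,f
S0000003,g,h,i
"""

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# "ID" is the name of the index, so it does not appear among the columns.
print(df.index.name)
print("ID" in df.columns)

# Option 1: turn the index into a Series (the Series is indexed by the IDs too).
ids_from_index = df.index.to_series()

# Option 1b: same values, but with a fresh 0..n-1 index.
ids_renumbered = ids_from_index.reset_index(drop=True)

# Option 2 (assumed form of the pd.Series variant): build a Series from the index.
ids_via_constructor = pd.Series(df.index, name=df.index.name)

print(ids_from_index)
print(ids_renumbered)
print(ids_via_constructor)
```

The difference between the two options is only the resulting index: to_series() keeps the IDs as the index, while the constructor form starts from a default integer index, which is usually what you want when the IDs are to be treated as ordinary data.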