Question

I have a CSV file in the following format:

            Level1_head1    Level1_head2    Level1_head3
            Level2_head1    Level2_head2    Level2_head3 
ID
S0000001    someValue       someValue       someValue       
S0000002    someValue       someValue       someValue       
S0000003    someValue       someValue       someValue       
S0000004    someValue       someValue       someValue       
S0000005    someValue       someValue       someValue

Note that the cell above ID is empty, and the cells to the right of ID are empty as well.

I have loaded the data above into a pandas DataFrame df and tried to extract the ID column from it:

import pandas as pd

df = pd.read_csv("data.csv", header=[0,1], index_col=0)
date_series = df[0:]
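To make the layout above reproducible, here is a minimal sketch that feeds the same structure to read_csv from an in-memory string. It assumes the file is comma-separated (the sample above is aligned with spaces, so the real delimiter may differ); CSV_TEXT is a hypothetical stand-in for data.csv with only three data rows.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (assumed comma-separated): two header
# rows, then a row holding only "ID" with empty cells to its right.
CSV_TEXT = """\
,Level1_head1,Level1_head2,Level1_head3
,Level2_head1,Level2_head2,Level2_head3
ID,,,
S0000001,someValue,someValue,someValue
S0000002,someValue,someValue,someValue
S0000003,someValue,someValue,someValue
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

# The two header rows become a two-level MultiIndex on the columns.
print(df.columns.nlevels)
print(list(df.columns))

# The lone "ID" row is read as the name of the row index, not as a column.
print(df.index.name)
print(df.index.tolist())
```

With header=[0, 1] and index_col=0, pandas reads the row after the header rows as the index-name row, which is why ID never appears among the column labels.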

However, I get the whole DataFrame instead of a single column. When I print it, the output looks like this:

            Level2_head1    Level2_head2    Level2_head3 
ID
S0000001    someValue       someValue       someValue       
S0000002    someValue       someValue       someValue       
S0000003    someValue       someValue       someValue       
S0000004    someValue       someValue       someValue       
S0000005    someValue       someValue       someValue
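The whole frame comes back because a slice inside df[...] selects rows, not columns: df[0:] means "every row from position 0 onward". A minimal sketch, again using a hypothetical in-memory CSV_TEXT that is assumed to be comma-separated:

```python
import io

import pandas as pd

# Hypothetical, comma-separated stand-in for data.csv.
CSV_TEXT = """\
,Level1_head1,Level1_head2,Level1_head3
,Level2_head1,Level2_head2,Level2_head3
ID,,,
S0000001,someValue,someValue,someValue
S0000002,someValue,someValue,someValue
S0000003,someValue,someValue,someValue
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

# A slice passed to df[...] selects rows by position, so df[0:] keeps
# every row and every column.
whole = df[0:]
print(whole.shape == df.shape)

# The same mechanism with an end position returns only the first two rows.
first_two = df[0:2]
print(first_two.index.tolist())
```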

I also tried:

date_series = df['ID']

and

date_series = df.ID

However, with the former I get a KeyError saying df cannot find the key 'ID'. With the latter, I get an error saying df has no attribute 'ID'.
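Both errors have the same cause: ID is the name of the row index, so it is not one of the column labels that df['ID'] and df.ID look up. The sketch below reproduces both errors on a hypothetical in-memory copy of the data (assumed comma-separated).

```python
import io

import pandas as pd

# Hypothetical, comma-separated stand-in for data.csv.
CSV_TEXT = """\
,Level1_head1,Level1_head2,Level1_head3
,Level2_head1,Level2_head2,Level2_head3
ID,,,
S0000001,someValue,someValue,someValue
S0000002,someValue,someValue,someValue
S0000003,someValue,someValue,someValue
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

# df["ID"] looks "ID" up among the column labels, where it does not exist.
try:
    df["ID"]
    key_error = None
except KeyError as exc:
    key_error = exc
print("KeyError:", key_error)

# Attribute access also resolves only column labels, so it fails as well.
try:
    df.ID
    attr_error = None
except AttributeError as exc:
    attr_error = exc
print("AttributeError:", attr_error)

# "ID" lives on the index instead of the columns.
print("ID" in df.columns, df.index.name)
```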

I am completely confused now. How do I retrieve the first column (ID), which holds the IDs?

Answer 1

You cannot use date_series = df['ID'], because ID is the name of the index.

Instead, use index.to_series to convert the index (your ID column) to a Series:

print(df)
            Level1_head1 Level1_head2 Level1_head3
            Level2_head1 Level2_head2 Level2_head3
ID
S0000001       someValue    someValue    someValue
S0000002       someValue    someValue    someValue
S0000003       someValue    someValue    someValue
S0000004       someValue    someValue    someValue
S0000005       someValue    someValue    someValue

print(df.index.name)
ID

print(df.index)
Index(['S0000001', 'S0000002', 'S0000003', 'S0000004', 'S0000005'], dtype='object', name='ID')

print(df.index.to_series())
ID
S0000001    S0000001
S0000002    S0000002
S0000003    S0000003
S0000004    S0000004
S0000005    S0000005
Name: ID, dtype: object

# if you need to reset the index
print(df.index.to_series().reset_index(drop=True))
0    S0000001
1    S0000002
2    S0000003
3    S0000004
4    S0000005
Name: ID, dtype: object

Alternatively, you can build the Series with the pd.Series constructor.
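The code for the pd.Series variant did not survive, so the following is only a sketch of how the constructor approach can look, again assuming a hypothetical comma-separated CSV_TEXT. Passing the index to pd.Series gives a Series of the ID values with a default 0, 1, 2, ... index, the same result as to_series().reset_index(drop=True).

```python
import io

import pandas as pd

# Hypothetical, comma-separated stand-in for data.csv.
CSV_TEXT = """\
,Level1_head1,Level1_head2,Level1_head3
,Level2_head1,Level2_head2,Level2_head3
ID,,,
S0000001,someValue,someValue,someValue
S0000002,someValue,someValue,someValue
S0000003,someValue,someValue,someValue
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

# Build a Series from the index values; the name "ID" is carried over from
# the index and the Series gets a fresh RangeIndex (0, 1, 2, ...).
ids = pd.Series(df.index)
print(ids)

# Same values and index as the to_series() + reset_index(drop=True) route.
print(ids.equals(df.index.to_series().reset_index(drop=True)))
```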

How to retrieve data from a Python DataFrame with multi-level headers?

1 个答案: