I have a CSV file in the following format:
Level1_head1 Level1_head2 Level1_head3
Level2_head1 Level2_head2 Level2_head3
ID
S0000001 someValue someValue someValue
S0000002 someValue someValue someValue
S0000003 someValue someValue someValue
S0000004 someValue someValue someValue
S0000005 someValue someValue someValue
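Note that the cell above ID is empty, and the cells to the right of ID are also empty.
I have loaded the data above into a pandas DataFrame object df and tried to extract the ID column from it: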
df = pd.read_csv("data.csv", header=[0,1], index_col=0)
date_series = df[0:]
However, this gives me the entire DataFrame rather than a single column. When I print the DataFrame, it shows:
Level2_head1 Level2_head2 Level2_head3
ID
S0000001 someValue someValue someValue
S0000002 someValue someValue someValue
S0000003 someValue someValue someValue
S0000004 someValue someValue someValue
S0000005 someValue someValue someValue
I also tried:
date_series = df['ID']
and
date_series = df.ID
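However, the former raises a KeyError saying that df cannot find the key 'ID'. The latter raises an error saying that df has no attribute 'ID'.
I am now completely confused. How can I retrieve the first column, the one containing the IDs?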
Answer 0 (score: 0)
You can't use date_series = df['ID'], because ID is the name of the index, not a column.
Instead, use index.to_series to convert the index (your first column) to a Series:
print df
Level1_head1 Level1_head2 Level1_head3
Level2_head1 Level2_head2 Level2_head3
ID
S0000001 someValue someValue someValue
S0000002 someValue someValue someValue
S0000003 someValue someValue someValue
S0000004 someValue someValue someValue
S0000005 someValue someValue someValue
print df.index.name
ID
print df.index
Index([u'S0000001', u'S0000002', u'S0000003', u'S0000004', u'S0000005'], dtype='object', name=u'ID')
print df.index.to_series()
ID
S0000001 S0000001
S0000002 S0000002
S0000003 S0000003
S0000004 S0000004
S0000005 S0000005
Name: ID, dtype: object
# if you need to reset the index
print df.index.to_series().reset_index(drop=True)
0 S0000001
1 S0000002
2 S0000003
3 S0000004
4 S0000005
Name: ID, dtype: object
An alternative solution uses pd.Series.
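The code for the pd.Series variant did not survive in the answer, so what follows is only a minimal sketch of what it might look like, not the answerer's exact code. The DataFrame is built by hand as an assumed stand-in for data.csv (three rows, invented cell values), and the variable names are illustrative.

```python
import pandas as pd

# Assumed stand-in for the question's data: two header levels and an
# index named "ID". The cell values are invented for illustration.
columns = pd.MultiIndex.from_tuples([
    ("Level1_head1", "Level2_head1"),
    ("Level1_head2", "Level2_head2"),
    ("Level1_head3", "Level2_head3"),
])
index = pd.Index(["S0000001", "S0000002", "S0000003"], name="ID")
df = pd.DataFrame(
    [["a1", "b1", "c1"], ["a2", "b2", "c2"], ["a3", "b3", "c3"]],
    index=index,
    columns=columns,
)

# "ID" is the index name, not a column label, which is why df['ID'] fails.
id_is_column = "ID" in df.columns

# Variant 1: index.to_series(), then drop the ID-labelled index
# so the result is numbered 0, 1, 2, ...
ids_from_to_series = df.index.to_series().reset_index(drop=True)

# Variant 2: build the Series directly from the index values.
ids_from_constructor = pd.Series(df.index.values, name=df.index.name)

print(ids_from_constructor)
```

Both variants should yield the same IDs with a default integer index. Which one to use is mostly a matter of taste.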