I have a DataFrame df1 indexed by datetime, with an entry every minute spanning several weeks.
Sample:
SAMPLE_TIME Bottom Top Out state
0 2015-07-15 16:41:56 48.625 55.812 43.875 1
1 2015-07-15 16:42:55 48.750 55.812 43.875 1
2 2015-07-15 16:43:55 48.937 55.812 43.875 1
3 2015-07-15 16:44:56 49.125 55.812 43.812 1
4 2015-07-15 16:45:55 49.312 55.812 43.812 1
I want to find the day with the lowest average (TempBottom, TempTop), and then get that whole day's data minute by minute so that I can plot it. I tried:
df2 = df1.groupby(pd.TimeGrouper('D')).agg(min) \
.sort(['TempTop','TempBottom'], ascending=[True,True])
This gives me the days ordered by lowest temperature. Sample:
SAMPLE_TIME Bottom Top Out state
2015-10-17 19.994 25.840 21.875 0
2015-08-29 26.182 28.777 25.937 0
2015-11-19 19.244 33.027 28.937 0
2015-11-07 19.744 33.527 28.125 0
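A note for anyone running this on a recent pandas: `pd.TimeGrouper` and `DataFrame.sort` have both been removed in later releases. A rough equivalent using `pd.Grouper` and `sort_values` might look like the sketch below. The column names `TempTop` and `TempBottom` are taken from the snippet above; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data covering three days (values are invented).
idx = pd.date_range("2015-07-15", periods=3 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
df1 = pd.DataFrame(
    {
        "TempBottom": rng.uniform(20, 60, len(idx)),
        "TempTop": rng.uniform(25, 60, len(idx)),
    },
    index=idx,
)
df1.index.name = "SAMPLE_TIME"

# Daily minimum of each column, sorted so the coldest days come first.
df2 = (
    df1.groupby(pd.Grouper(freq="D"))
    .min()
    .sort_values(["TempTop", "TempBottom"], ascending=[True, True])
)
print(df2)
```

Note that this still takes the daily minimum, as in the original attempt, not the daily average.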
Then I thought all I needed was to take the index of the first entry from df2:
df1[df2.index[1]]
But I got an error:
KeyError: Timestamp('2015-08-29 00:00:00')
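Answer 0: (score: 3)
From the docs:
Warning
The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one):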
dft['2013-1-15 12:30:00']
To select a single row, use .loc:
In [71]: dft.loc['2013-1-15 12:30:00']
Out[71]:
A    0.193284
Name: 2013-01-15 12:30:00, dtype: float64
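To make the docs warning concrete, here is a small sketch. The frame `dft` below is a made-up stand-in with a minute-resolution index, not the one from the docs. Plain `[]` with a full timestamp string is expected to raise `KeyError`, while `.loc` returns the matching row.

```python
import pandas as pd

# Hypothetical frame: one row per minute for an hour.
idx = pd.date_range("2013-01-15 12:00", periods=60, freq="min")
dft = pd.DataFrame({"A": range(60)}, index=idx)

# Plain [] with a full timestamp string is not treated as a row slice.
try:
    dft["2013-1-15 12:30:00"]
    raised = False
except KeyError:
    raised = True

# .loc does an exact label lookup and returns that single row as a Series.
row = dft.loc["2013-1-15 12:30:00"]
print(raised, row["A"])
```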
So in your case you need to use the loc method:
In [103]: df1.loc[df2.index[0]]
Out[103]:
SAMPLE_TIME TempBottom TempTop TempOut State Bypass
2015-07-15 16:41:56 48.625 55.812 43.875 1 1
2015-07-15 16:42:55 48.750 55.812 43.875 1 1
2015-07-15 16:43:55 48.937 55.812 43.875 1 1
2015-07-15 16:44:56 49.125 55.812 43.812 1 1
2015-07-15 16:45:55 49.312 55.812 43.812 1 1
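EDIT
When you pass a single argument, it tries to access by label. When you pass an interval, however, it uses slicing. So you can use the trick of passing the value plus one day: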
In [276]: df2.index[0]
Out[276]: Timestamp('2015-07-15 00:00:00', offset='D')
In [277]: df2.index[0] + 1
Out[277]: Timestamp('2015-07-16 00:00:00', offset='D')
In [278]: df1.loc[df2.index[0]: df2.index[0] + 1]
Out[278]:
TempBottom TempTop TempOut State Bypass
SAMPLE_TIME
2015-07-15 16:41:56 48.625 55.812 43.875 1 1
2015-07-15 16:42:55 48.750 55.812 43.875 1 1
2015-07-15 16:43:55 48.937 55.812 43.875 1 1
2015-07-15 16:44:56 49.125 55.812 43.812 1 1
2015-07-15 16:45:55 49.312 55.812 43.812 1 1
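On recent pandas versions, adding a bare integer to a `Timestamp` is no longer supported, so the `+ 1` above would need an explicit `pd.Timedelta`. The sketch below uses made-up data and also shows one thing to watch for: `.loc` slices include both endpoints, so a row stamped exactly at midnight of the next day would be picked up too. A boolean mask gives a half-open interval if that matters.

```python
import pandas as pd

# Hypothetical data: one row per minute, starting mid-afternoon.
idx = pd.date_range("2015-07-15 16:00", periods=24 * 60, freq="min")
df1 = pd.DataFrame({"TempBottom": range(len(idx))}, index=idx)

day = pd.Timestamp("2015-07-15")
next_day = day + pd.Timedelta(days=1)

# Label-based slice: includes a row at exactly next_day 00:00 if one exists.
one_day = df1.loc[day:next_day]

# Half-open interval [day, next_day): excludes midnight of the next day.
one_day_strict = df1[(df1.index >= day) & (df1.index < next_day)]

print(len(one_day), len(one_day_strict))
```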
EDIT2
Or you can convert the date of your Timestamp to str and index with that string.
Answer 1: (score: 2)
So here is the thought process I went through, combined with @Anton Protopopov's answer:
In [1]: df1.ix[df2]
# call trace
ValueError: Cannot index with multidimensional key
In [2]: df1.ix[df2.index]
out[2]:
SAMPLE_TIME Bottom Top Out state
2015-10-17 NaN NaN NaN NaN
2015-08-29 NaN NaN NaN NaN
2015-11-19 NaN NaN NaN NaN
2015-11-07 NaN NaN NaN NaN
In [3]: df1.ix[df2.index[4:5]]
Out[3]:
SAMPLE_TIME Bottom Top Out state
2015-11-04 NaN NaN NaN NaN
In [33]: df1.loc[df2.index[4:5]]
KeyError: "None of [DatetimeIndex(['2015-11-04'], dtype='datetime64[ns]', name=u'SAMPLE_TIME', freq=None, tz=None)] are in the [index]"
In the end I gave up on ix and decided to get loc working. I tried Anton's suggestion:
In [4]: df1.loc[df2.index[0].date()]
KeyError: 'the label [2015-11-04] is not in the [index]'
That made me think loc only accepts strings, which finally worked:
In [5]: df1.loc[df2.index[4].strftime('%Y-%m-%d')]
Out[5]:
SAMPLE_TIME Bottom Top Out state
2015-11-04 00:00:22 56.256 56.300 43.750 0
2015-11-04 00:01:22 56.256 56.300 43.812 0
2015-11-04 00:02:22 56.256 56.300 43.812 0
2015-11-04 00:03:22 56.256 56.300 43.812 0
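Putting the pieces together, here is a minimal end-to-end sketch for a recent pandas (where `.ix` no longer exists). It assumes "lowest average" means the mean of the daily means of `TempBottom` and `TempTop`, which is one reading of the question; the data is invented so that the middle day is the coldest.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three days at one-minute resolution, middle day colder.
idx = pd.date_range("2015-11-03", periods=3 * 24 * 60, freq="min")
cold = idx.normalize() == pd.Timestamp("2015-11-04")
df1 = pd.DataFrame(
    {
        "TempBottom": np.where(cold, 20.0, 50.0),
        "TempTop": np.where(cold, 25.0, 55.0),
    },
    index=idx,
)

# Daily mean of each column, then averaged across the two columns.
daily_mean = df1[["TempBottom", "TempTop"]].resample("D").mean().mean(axis=1)

# idxmin returns the Timestamp (at midnight) of the coldest day.
coldest_day = daily_mean.idxmin()

# A date string triggers partial string indexing: every row on that day.
day_data = df1.loc[coldest_day.strftime("%Y-%m-%d")]
print(coldest_day, len(day_data))
```

`day_data` can then be plotted directly, for example with `day_data.plot()` if matplotlib is installed.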