Question

I have a DataFrame df1 indexed by datetime, with an entry every minute over several weeks. Sample:

           SAMPLE_TIME       Bottom     Top      Out     state                                                                    
0  2015-07-15 16:41:56      48.625   55.812   43.875        1       
1  2015-07-15 16:42:55      48.750   55.812   43.875        1     
2  2015-07-15 16:43:55      48.937   55.812   43.875        1       
3  2015-07-15 16:44:56      49.125   55.812   43.812        1      
4  2015-07-15 16:45:55      49.312   55.812   43.812        1

I want to find the day with the lowest average (TempBottom, TempTop), and then get that whole day's data by minute so I can plot it. I tried:

df2 = df1.groupby(pd.TimeGrouper('D')).agg(min) \
.sort(['TempTop','TempBottom'], ascending=[True,True])

This gives me the days ordered by lowest temperature. Sample:

SAMPLE_TIME       Bottom     Top      Out     state                                                                    
2015-10-17       19.994   25.840   21.875        0       
2015-08-29       26.182   28.777   25.937        0       
2015-11-19       19.244   33.027   28.937        0        
2015-11-07       19.744   33.527   28.125        0

Then I thought all I needed was to take the index of the first entry from df2:

 df1[df2.index[1]]

But I get an error:

KeyError: Timestamp('2015-08-29 00:00:00')

Answer 1

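From the docs:

Warning

The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)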

dft['2013-1-15 12:30:00']

To select a single row, use .loc

In [71]: dft.loc['2013-1-15 12:30:00']
Out[71]:
A    0.193284
Name: 2013-01-15 12:30:00, dtype: float64

So in your case you need to use the loc method:

In [103]: df1.loc[df2.index[0]]
Out[103]:
SAMPLE_TIME          TempBottom  TempTop  TempOut  State  Bypass
2015-07-15 16:41:56      48.625   55.812   43.875      1       1
2015-07-15 16:42:55      48.750   55.812   43.875      1       1
2015-07-15 16:43:55      48.937   55.812   43.875      1       1
2015-07-15 16:44:56      49.125   55.812   43.812      1       1
2015-07-15 16:45:55      49.312   55.812   43.812      1       1
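A caveat worth checking on your own pandas version: as far as I can tell, `.loc` with a `Timestamp` matches labels exactly, so it only returns rows when a sample is stamped at that precise instant, whereas a date string is matched at day resolution. A minimal sketch with made-up data (the `raises_key_error` helper and the five-row frame are illustrative, not taken from the question):

```python
import pandas as pd

# Hypothetical stand-in for df1: per-minute samples, none exactly at midnight.
idx = pd.date_range("2015-07-15 16:41:56", periods=5, freq="min", name="SAMPLE_TIME")
df1 = pd.DataFrame({"TempBottom": [48.625, 48.750, 48.937, 49.125, 49.312]}, index=idx)

day = pd.Timestamp("2015-07-15")


def raises_key_error(func):
    """Return True if calling func() raises KeyError (illustrative helper)."""
    try:
        func()
    except KeyError:
        return True
    return False


# df1[day] looks for a *column* labelled `day`, so it fails.
plain_fails = raises_key_error(lambda: df1[day])

# df1.loc[day] looks for a row stamped exactly 2015-07-15 00:00:00; none exists here.
exact_fails = raises_key_error(lambda: df1.loc[day])

# A date string is resolved at day resolution and matches every row on that date.
whole_day = df1.loc["2015-07-15"]
```

With this data both `Timestamp` lookups fail, and only the string lookup returns the five rows.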

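Edit

When you pass a single argument, it tries label-based access. When you pass a range, it uses slicing instead. So you can use the trick of passing the value + 1 day: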

In [276]: df2.index[0]
Out[276]: Timestamp('2015-07-15 00:00:00', offset='D')

In [277]: df2.index[0] + 1
Out[277]: Timestamp('2015-07-16 00:00:00', offset='D')

In [278]: df1.loc[df2.index[0]: df2.index[0] + 1]
Out[278]:
                     TempBottom  TempTop  TempOut  State  Bypass
SAMPLE_TIME
2015-07-15 16:41:56      48.625   55.812   43.875      1       1
2015-07-15 16:42:55      48.750   55.812   43.875      1       1
2015-07-15 16:43:55      48.937   55.812   43.875      1       1
2015-07-15 16:44:56      49.125   55.812   43.812      1       1
2015-07-15 16:45:55      49.312   55.812   43.812      1       1
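On current pandas, adding a bare integer to a `Timestamp` is, to my knowledge, no longer supported, so the same slicing trick is probably best written with `pd.Timedelta`. A rough sketch on made-up per-minute data (the frame and the `next_day` and `half_open` names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: three days of per-minute samples, offset 22 s from the minute.
idx = pd.date_range("2015-07-14 00:00:22", periods=3 * 1440, freq="min", name="SAMPLE_TIME")
df1 = pd.DataFrame({"TempBottom": range(len(idx))}, index=idx)

day = pd.Timestamp("2015-07-15")
next_day = day + pd.Timedelta(days=1)

# Label slices on a sorted DatetimeIndex do not need the endpoints to exist.
# Note that .loc slices include both endpoints.
sliced = df1.loc[day:next_day]

# Half-open alternative: excludes a sample stamped exactly at the next midnight.
half_open = df1[(df1.index >= day) & (df1.index < next_day)]
```

The two selections agree here because no sample falls exactly on midnight. If one could, the half-open mask is the safer choice.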

EDIT2

Or you can convert the Timestamp's date to str and use that string as the key.
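The original snippet for this step did not survive, so here is only a guess at what such a conversion might look like: take the `Timestamp`, convert its date part to a string, and pass that to `.loc`. The data below is invented for the example.

```python
import pandas as pd

# Hypothetical data: six per-minute samples straddling midnight.
idx = pd.date_range("2015-11-03 23:58:22", periods=6, freq="min", name="SAMPLE_TIME")
df1 = pd.DataFrame({"TempTop": [56.1, 56.2, 56.3, 56.3, 56.3, 56.3]}, index=idx)

day = pd.Timestamp("2015-11-04")

# str() of a datetime.date gives the ISO form 'YYYY-MM-DD'.
key = str(day.date())

# A date string selects every row that falls on that calendar day.
rows = df1.loc[key]
```

`day.strftime('%Y-%m-%d')` produces the same key, so either spelling should work.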

Answer 2

So here is the thought process I went through, combined with @Anton Protopopov's answer:

In [1]: df1.ix[df2]
# call trace
ValueError: Cannot index with multidimensional key

In [2]: df1.ix[df2.index]
Out[2]:
SAMPLE_TIME       Bottom     Top      Out     state                                                                    
2015-10-17          NaN      NaN      NaN      NaN        
2015-08-29          NaN      NaN      NaN      NaN         
2015-11-19          NaN      NaN      NaN      NaN        
2015-11-07          NaN      NaN      NaN      NaN         

In [3]: df1.ix[df2.index[4:5]]
Out[3]: 
SAMPLE_TIME       Bottom     Top      Out     state                                                                    
2015-11-04           NaN      NaN      NaN      NaN     

In [33]: df1.loc[df2.index[4:5]]
KeyError: "None of [DatetimeIndex(['2015-11-04'], dtype='datetime64[ns]', name=u'SAMPLE_TIME', freq=None, tz=None)] are in the [index]"
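My reading of the failures above is that a list-like key (such as `df2.index` or a slice of it) is matched against exact labels, and the midnight timestamps are not labels in `df1`. If the goal is to pull whole days for several dates at once, one possible workaround is to compare the normalized index against the wanted days. A sketch with invented data (`wanted_days` and `mask` are names I made up):

```python
import pandas as pd

# Hypothetical data: three days of per-minute samples.
idx = pd.date_range("2015-11-03 00:00:22", periods=3 * 1440, freq="min", name="SAMPLE_TIME")
df1 = pd.DataFrame({"TempTop": 50.0}, index=idx)

# Days to keep, expressed as midnight timestamps (like the index of df2).
wanted_days = pd.DatetimeIndex(["2015-11-03", "2015-11-05"])

# normalize() floors every sample time to midnight, so it can be compared by day.
mask = df1.index.normalize().isin(wanted_days)
selected = df1[mask]
```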

Finally I gave up on ix and decided to make loc work. I tried what Anton recommended:

In [4]: df1.loc[df2.index[0].date()]
KeyError: 'the label [2015-11-04] is not in the [index]'

That led me to think loc only accepts strings, which finally worked:

In [5]: df1.loc[df2.index[4].strftime('%Y-%m-%d')]
Out[5]: 
SAMPLE_TIME              Bottom     Top      Out     state                                                                    
2015-11-04 00:00:22      56.256   56.300   43.750        0     
2015-11-04 00:01:22      56.256   56.300   43.812        0      
2015-11-04 00:02:22      56.256   56.300   43.812        0       
2015-11-04 00:03:22      56.256   56.300   43.812        0
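Putting it together, the whole flow might look roughly like this on a recent pandas, where `pd.TimeGrouper` and `DataFrame.sort` are no longer available and `pd.Grouper` and `sort_values` take their place. The data is made up so that the middle day is the coldest; column names follow the question's code.

```python
import pandas as pd

# Hypothetical data: three days of per-minute samples; the middle day is coldest.
idx = pd.date_range("2015-11-03 00:00:22", periods=3 * 1440, freq="min", name="SAMPLE_TIME")
df1 = pd.DataFrame(
    {
        "TempBottom": [50.0] * 1440 + [19.0] * 1440 + [30.0] * 1440,
        "TempTop": [55.0] * 1440 + [25.0] * 1440 + [35.0] * 1440,
    },
    index=idx,
)

# Daily minimum per column, coldest day first (same idea as the question's code).
daily_min = (
    df1.groupby(pd.Grouper(freq="D"))
    .min()
    .sort_values(["TempTop", "TempBottom"], ascending=[True, True])
)

# The first index entry is midnight of the coldest day; format it as a date string.
coldest = daily_min.index[0]
coldest_rows = df1.loc[coldest.strftime("%Y-%m-%d")]
```

From there, something like `coldest_rows[["TempBottom", "TempTop"]].plot()` should draw that day, assuming matplotlib is installed. Swapping `.min()` for `.mean()` would rank days by average instead.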

How to get a full day's data from an index match
