python pandas - groupby.first() returns NaT values

Time: 2016-11-26 16:37:30

Tags: python pandas

I am retrieving the cummax() values of the following DataFrame,

                     exit_price  trend  netgain     high      low   MFE_pr
exit_time                                                                 
2000-02-01 01:00:00     1400.25     -1     1.00  1401.50  1400.25  1400.25
2000-02-01 01:30:00     1400.75     -1     0.50  1401.00  1399.50  1399.50
2000-02-01 02:00:00     1400.00     -1     1.25  1401.00  1399.75  1399.50
2000-02-01 02:30:00     1399.25     -1     2.00  1399.75  1399.25  1399.25
2000-02-01 03:00:00     1399.50     -1     1.75  1400.00  1399.50  1399.25
2000-02-01 03:30:00     1398.25     -1     3.00  1399.25  1398.25  1398.25
2000-02-01 04:00:00     1398.75     -1     2.50  1399.00  1398.25  1398.25
2000-02-01 04:30:00     1400.00     -1     1.25  1400.25  1399.00  1398.25
2000-02-01 05:00:00     1400.25     -1     1.00  1400.50  1399.25  1398.25
2000-02-01 05:30:00     1400.50     -1     0.75  1400.75  1399.50  1398.25

using the following formula:

trade['MFE_pr'] = np.nan
trade['MFE_pr'] = trade['MFE_pr'].where(trade['trend'] < 0, trade.high.cummax())
trade['MFE_pr'] = trade['MFE_pr'].where(trade['trend'] > 0, trade.low.cummin())
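For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the trend, high and low columns from the table above and applies the same formula. The construction of the frame (date_range with a 30-minute frequency) is my assumption about how the data is laid out. The np.where line is an equivalent one-step form that I added for comparison; it is not part of the original code.

```python
import numpy as np
import pandas as pd

# Rebuild the relevant columns of the sample table (layout is an assumption).
idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min", name="exit_time")
trade = pd.DataFrame(
    {
        "trend": -1,
        "high": [1401.50, 1401.00, 1401.00, 1399.75, 1400.00,
                 1399.25, 1399.00, 1400.25, 1400.50, 1400.75],
        "low": [1400.25, 1399.50, 1399.75, 1399.25, 1399.50,
                1398.25, 1398.25, 1399.00, 1399.25, 1399.50],
    },
    index=idx,
)

# Same formula as in the question: where() keeps the existing value when the
# condition is True and takes the replacement otherwise.
trade["MFE_pr"] = np.nan
trade["MFE_pr"] = trade["MFE_pr"].where(trade["trend"] < 0, trade["high"].cummax())
trade["MFE_pr"] = trade["MFE_pr"].where(trade["trend"] > 0, trade["low"].cummin())

# Equivalent single step (not in the original): running high for trend > 0,
# running low for everything else.
alt = np.where(trade["trend"] > 0, trade["high"].cummax(), trade["low"].cummin())
```

With trend equal to -1 on every row, both forms reduce to the running minimum of low, which matches the MFE_pr column shown in the table.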

Now I would like to retrieve, for each row, the timestamp of the row where its cummax() value occurred.

I have been trying the following:

trade['timestamp'] = trade.index
trade['MFE_ts'] = trade.groupby('MFE_pr')['timestamp'].first()

But I get this result:

                     exit_price  trend  netgain     high      low   MFE_pr  \
exit_time                                                                    
2000-02-01 01:00:00     1400.25     -1     1.00  1401.50  1400.25  1400.25   
2000-02-01 01:30:00     1400.75     -1     0.50  1401.00  1399.50  1399.50   
2000-02-01 02:00:00     1400.00     -1     1.25  1401.00  1399.75  1399.50   
2000-02-01 02:30:00     1399.25     -1     2.00  1399.75  1399.25  1399.25   
2000-02-01 03:00:00     1399.50     -1     1.75  1400.00  1399.50  1399.25   
2000-02-01 03:30:00     1398.25     -1     3.00  1399.25  1398.25  1398.25   
2000-02-01 04:00:00     1398.75     -1     2.50  1399.00  1398.25  1398.25   
2000-02-01 04:30:00     1400.00     -1     1.25  1400.25  1399.00  1398.25   
2000-02-01 05:00:00     1400.25     -1     1.00  1400.50  1399.25  1398.25   
2000-02-01 05:30:00     1400.50     -1     0.75  1400.75  1399.50  1398.25   

                              timestamp MFE_ts  
exit_time                                       
2000-02-01 01:00:00 2000-02-01 01:00:00    NaT  
2000-02-01 01:30:00 2000-02-01 01:30:00    NaT  
2000-02-01 02:00:00 2000-02-01 02:00:00    NaT  
2000-02-01 02:30:00 2000-02-01 02:30:00    NaT  
2000-02-01 03:00:00 2000-02-01 03:00:00    NaT  
2000-02-01 03:30:00 2000-02-01 03:30:00    NaT  
2000-02-01 04:00:00 2000-02-01 04:00:00    NaT  
2000-02-01 04:30:00 2000-02-01 04:30:00    NaT  
2000-02-01 05:00:00 2000-02-01 05:00:00    NaT  
2000-02-01 05:30:00 2000-02-01 05:30:00    NaT 

What am I doing wrong?

2 Answers:

Answer 0: (score: 3)

As written, it computes and returns the first value within each group:

trade.groupby('MFE_pr')['timestamp'].first()
MFE_pr
1398.25   2000-02-01 03:30:00
1399.25   2000-02-01 02:30:00
1399.50   2000-02-01 01:30:00
1400.25   2000-02-01 01:00:00
Name: timestamp, dtype: datetime64[ns]

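So when you assign this to a new column of the original DF, pandas reindexes it against the DF's index. That produces NaTs, because the result is indexed by MFE_pr values and shares no index labels with the original DatetimeIndex: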

trade.groupby('MFE_pr')['timestamp'].first().reindex(trade.index)
exit_time
2000-02-01 01:00:00   NaT
2000-02-01 01:30:00   NaT
2000-02-01 02:00:00   NaT
2000-02-01 02:30:00   NaT
2000-02-01 03:00:00   NaT
2000-02-01 03:30:00   NaT
2000-02-01 04:00:00   NaT
2000-02-01 04:30:00   NaT
2000-02-01 05:00:00   NaT
2000-02-01 05:30:00   NaT
Name: timestamp, dtype: datetime64[ns]
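The mismatch can be checked directly. The following is a small sketch on a four-row excerpt of the data; the variable names common and aligned are mine. It shows that the aggregated result is labelled by float group keys, so none of its labels appear in the timestamp index.

```python
import pandas as pd

# Four-row excerpt of the question's data, enough to show the alignment issue.
idx = pd.date_range("2000-02-01 01:00", periods=4, freq="30min", name="exit_time")
trade = pd.DataFrame({"MFE_pr": [1400.25, 1399.50, 1399.50, 1399.25]}, index=idx)
trade["timestamp"] = trade.index

first_ts = trade.groupby("MFE_pr")["timestamp"].first()

# The aggregated index holds the group keys (floats), the frame's index holds
# timestamps, so the two have no labels in common.
common = set(first_ts.index) & set(trade.index)

# Column assignment aligns on index labels; with nothing to match, every
# row is filled with a missing value.
aligned = first_ts.reindex(trade.index)
```

Here common is empty and every entry of aligned is missing, which is the column of NaT from the question.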

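You need transform instead. It broadcasts each group's computed value to every row of that group, so the result keeps the shape and index of the original DF: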

trade['MFE_ts'] = trade.groupby('MFE_pr')['timestamp'].transform('first') 
trade
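To make the difference concrete, here is a minimal sketch comparing the two calls on the MFE_pr values from the question. The names agg and per_row are mine, and the frame is rebuilt from the posted table.

```python
import pandas as pd

idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min", name="exit_time")
mfe = [1400.25, 1399.50, 1399.50, 1399.25, 1399.25,
       1398.25, 1398.25, 1398.25, 1398.25, 1398.25]
trade = pd.DataFrame({"MFE_pr": mfe}, index=idx)
trade["timestamp"] = trade.index

# Aggregation: one row per distinct MFE_pr value, indexed by that value.
agg = trade.groupby("MFE_pr")["timestamp"].first()

# Transformation: one row per input row, indexed like the original frame,
# so it can be assigned as a column without any label mismatch.
per_row = trade.groupby("MFE_pr")["timestamp"].transform("first")
trade["MFE_ts"] = per_row
```

agg has 4 rows while per_row has 10 and carries the same exit_time index as trade, which is why the assignment no longer yields NaT.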


Answer 1: (score: 1)

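You need to assign the result to a new DataFrame, because first aggregates the data. If you assign it to a new column instead, the result's index is built from the MFE_pr column while the original index is a DatetimeIndex, so the labels do not match and you get NaT.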

trade1 = trade.groupby('MFE_pr', as_index=False)['timestamp'].first() 

print (trade1)
    MFE_pr           timestamp
0  1398.25 2000-02-01 03:30:00
1  1399.25 2000-02-01 02:30:00
2  1399.50 2000-02-01 01:30:00
3  1400.25 2000-02-01 01:00:00

You can also convert the index to a Series with to_series, then groupby the MFE_pr column:

trade1 = trade.index.to_series().groupby([trade['MFE_pr']]).first().reset_index()
print (trade1)
   MFE_pr           exit_time
0  1398.25 2000-02-01 03:30:00
1  1399.25 2000-02-01 02:30:00
2  1399.50 2000-02-01 01:30:00
3  1400.25 2000-02-01 01:00:00
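If you already hold the aggregated lookup, one way to bring it back onto the original rows is Series.map, which looks up each MFE_pr value in the lookup's index. This is an alternative that I am adding, not something from the original post, and the names first_ts, mapped and via_transform are mine.

```python
import pandas as pd

idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min", name="exit_time")
mfe = [1400.25, 1399.50, 1399.50, 1399.25, 1399.25,
       1398.25, 1398.25, 1398.25, 1398.25, 1398.25]
trade = pd.DataFrame({"MFE_pr": mfe}, index=idx)

# Lookup: first timestamp per MFE_pr value, indexed by MFE_pr.
first_ts = trade.index.to_series().groupby(trade["MFE_pr"]).first()

# map() replaces each MFE_pr value with the timestamp stored under that key,
# and the result keeps the index of trade["MFE_pr"].
mapped = trade["MFE_pr"].map(first_ts)

# Same values as the transform-based route, computed for comparison.
via_transform = trade.index.to_series().groupby(trade["MFE_pr"]).transform("first")
```

The lookup relies on exact matches of the float keys. That holds here because both sides come from the same column, but it would be fragile if the keys were recomputed separately.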

If you need a new column, a possible solution is transform. Its output is a Series with the same length as the original DataFrame:

trade['MFE_ts'] = trade.index.to_series().groupby([trade['MFE_pr']]).transform('first')

print (trade)
                     exit_price  trend  netgain     high      low   MFE_pr  \
exit_time                                                                    
2000-02-01 01:00:00     1400.25     -1     1.00  1401.50  1400.25  1400.25   
2000-02-01 01:30:00     1400.75     -1     0.50  1401.00  1399.50  1399.50   
2000-02-01 02:00:00     1400.00     -1     1.25  1401.00  1399.75  1399.50   
2000-02-01 02:30:00     1399.25     -1     2.00  1399.75  1399.25  1399.25   
2000-02-01 03:00:00     1399.50     -1     1.75  1400.00  1399.50  1399.25   
2000-02-01 03:30:00     1398.25     -1     3.00  1399.25  1398.25  1398.25   
2000-02-01 04:00:00     1398.75     -1     2.50  1399.00  1398.25  1398.25   
2000-02-01 04:30:00     1400.00     -1     1.25  1400.25  1399.00  1398.25   
2000-02-01 05:00:00     1400.25     -1     1.00  1400.50  1399.25  1398.25   
2000-02-01 05:30:00     1400.50     -1     0.75  1400.75  1399.50  1398.25   

                                 MFE_ts  
exit_time                                
2000-02-01 01:00:00 2000-02-01 01:00:00  
2000-02-01 01:30:00 2000-02-01 01:30:00  
2000-02-01 02:00:00 2000-02-01 01:30:00  
2000-02-01 02:30:00 2000-02-01 02:30:00  
2000-02-01 03:00:00 2000-02-01 02:30:00  
2000-02-01 03:30:00 2000-02-01 03:30:00  
2000-02-01 04:00:00 2000-02-01 03:30:00  
2000-02-01 04:30:00 2000-02-01 03:30:00  
2000-02-01 05:00:00 2000-02-01 03:30:00  
2000-02-01 05:30:00 2000-02-01 03:30:00