I am retrieving the cummax() values of the following DataFrame,
exit_price trend netgain high low MFE_pr
exit_time
2000-02-01 01:00:00 1400.25 -1 1.00 1401.50 1400.25 1400.25
2000-02-01 01:30:00 1400.75 -1 0.50 1401.00 1399.50 1399.50
2000-02-01 02:00:00 1400.00 -1 1.25 1401.00 1399.75 1399.50
2000-02-01 02:30:00 1399.25 -1 2.00 1399.75 1399.25 1399.25
2000-02-01 03:00:00 1399.50 -1 1.75 1400.00 1399.50 1399.25
2000-02-01 03:30:00 1398.25 -1 3.00 1399.25 1398.25 1398.25
2000-02-01 04:00:00 1398.75 -1 2.50 1399.00 1398.25 1398.25
2000-02-01 04:30:00 1400.00 -1 1.25 1400.25 1399.00 1398.25
2000-02-01 05:00:00 1400.25 -1 1.00 1400.50 1399.25 1398.25
2000-02-01 05:30:00 1400.50 -1 0.75 1400.75 1399.50 1398.25
using the following formula:
trade['MFE_pr'] = np.nan
trade['MFE_pr'] = trade['MFE_pr'].where(trade['trend'] < 0, trade.high.cummax())
trade['MFE_pr'] = trade['MFE_pr'].where(trade['trend'] > 0, trade.low.cummin())
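For anyone who wants to reproduce the MFE_pr column, here is a minimal sketch. It rebuilds only the trend, high and low columns from the table above, and it collapses the two where calls into a single np.where. That is a substitution for illustration, not the exact code from the question, although both should give the same values for this data.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table in the question (only the columns needed here).
idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min")
trade = pd.DataFrame(
    {
        "trend": -1,
        "high": [1401.50, 1401.00, 1401.00, 1399.75, 1400.00,
                 1399.25, 1399.00, 1400.25, 1400.50, 1400.75],
        "low": [1400.25, 1399.50, 1399.75, 1399.25, 1399.50,
                1398.25, 1398.25, 1399.00, 1399.25, 1399.50],
    },
    index=idx,
)
trade.index.name = "exit_time"

# Running high where trend > 0, running low otherwise.
trade["MFE_pr"] = np.where(
    trade["trend"] > 0, trade["high"].cummax(), trade["low"].cummin()
)
print(trade)
```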
Now I want to retrieve, for every row, the timestamp of the row where its cummax() value was set.
I have been trying the following:
trade['timestamp'] = trade.index
trade['MFE_ts'] = trade.groupby('MFE_pr')['timestamp'].first()
But I get this result:
exit_price trend netgain high low MFE_pr \
exit_time
2000-02-01 01:00:00 1400.25 -1 1.00 1401.50 1400.25 1400.25
2000-02-01 01:30:00 1400.75 -1 0.50 1401.00 1399.50 1399.50
2000-02-01 02:00:00 1400.00 -1 1.25 1401.00 1399.75 1399.50
2000-02-01 02:30:00 1399.25 -1 2.00 1399.75 1399.25 1399.25
2000-02-01 03:00:00 1399.50 -1 1.75 1400.00 1399.50 1399.25
2000-02-01 03:30:00 1398.25 -1 3.00 1399.25 1398.25 1398.25
2000-02-01 04:00:00 1398.75 -1 2.50 1399.00 1398.25 1398.25
2000-02-01 04:30:00 1400.00 -1 1.25 1400.25 1399.00 1398.25
2000-02-01 05:00:00 1400.25 -1 1.00 1400.50 1399.25 1398.25
2000-02-01 05:30:00 1400.50 -1 0.75 1400.75 1399.50 1398.25
timestamp MFE_ts
exit_time
2000-02-01 01:00:00 2000-02-01 01:00:00 NaT
2000-02-01 01:30:00 2000-02-01 01:30:00 NaT
2000-02-01 02:00:00 2000-02-01 02:00:00 NaT
2000-02-01 02:30:00 2000-02-01 02:30:00 NaT
2000-02-01 03:00:00 2000-02-01 03:00:00 NaT
2000-02-01 03:30:00 2000-02-01 03:30:00 NaT
2000-02-01 04:00:00 2000-02-01 04:00:00 NaT
2000-02-01 04:30:00 2000-02-01 04:30:00 NaT
2000-02-01 05:00:00 2000-02-01 05:00:00 NaT
2000-02-01 05:30:00 2000-02-01 05:30:00 NaT
What am I doing wrong?
Answer 0 (score: 3)
Right now, the groupby computes and returns the first value within each group:
trade.groupby('MFE_pr')['timestamp'].first()
MFE_pr
1398.25 2000-02-01 03:30:00
1399.25 2000-02-01 02:30:00
1399.50 2000-02-01 01:30:00
1400.25 2000-02-01 01:00:00
Name: timestamp, dtype: datetime64[ns]
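So when you try to reindex this back onto the original DF (by assigning it to a new column), the result is all NaT, because the two indexes share no labels to align on: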
trade.groupby('MFE_pr')['timestamp'].first().reindex(trade.index)
exit_time
2000-02-01 01:00:00 NaT
2000-02-01 01:30:00 NaT
2000-02-01 02:00:00 NaT
2000-02-01 02:30:00 NaT
2000-02-01 03:00:00 NaT
2000-02-01 03:30:00 NaT
2000-02-01 04:00:00 NaT
2000-02-01 04:30:00 NaT
2000-02-01 05:00:00 NaT
2000-02-01 05:30:00 NaT
Name: timestamp, dtype: datetime64[ns]
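The alignment behaviour described above can be checked on a small frame. This is a minimal sketch that uses only the first four MFE_pr values from the question; the variable name firsts is made up for illustration.

```python
import pandas as pd

# First four rows of the question's MFE_pr column.
idx = pd.date_range("2000-02-01 01:00", periods=4, freq="30min")
trade = pd.DataFrame({"MFE_pr": [1400.25, 1399.50, 1399.50, 1399.25]}, index=idx)
trade["timestamp"] = trade.index

# The aggregated result is indexed by MFE_pr values, not by timestamps.
firsts = trade.groupby("MFE_pr")["timestamp"].first()
print(firsts.index.dtype, trade.index.dtype)

# Column assignment aligns on index labels; none match, so every row is NaT.
trade["MFE_ts"] = firsts
print(trade["MFE_ts"].isna().all())
```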
You need transform instead. It broadcasts each group's computed value back to every row of that group, so the result keeps the shape of the original DF:
trade['MFE_ts'] = trade.groupby('MFE_pr')['timestamp'].transform('first')
trade
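To try the transform call end to end, a self-contained sketch follows. It assumes the MFE_pr values are typed in directly from the question's table rather than recomputed from high and low.

```python
import pandas as pd

# MFE_pr values copied from the question's table.
idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min")
trade = pd.DataFrame(
    {"MFE_pr": [1400.25, 1399.50, 1399.50, 1399.25, 1399.25,
                1398.25, 1398.25, 1398.25, 1398.25, 1398.25]},
    index=idx,
)
trade.index.name = "exit_time"
trade["timestamp"] = trade.index

# transform('first') returns one value per original row, so the index
# of the result matches trade.index and the assignment aligns cleanly.
trade["MFE_ts"] = trade.groupby("MFE_pr")["timestamp"].transform("first")
print(trade[["MFE_pr", "MFE_ts"]])
```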
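Answer 1 (score: 1)
You need to assign the result to a new DataFrame, because first aggregates the data. If you assign it to a new column instead, the index of the result is built from column MFE_pr while the original index is a DatetimeIndex, so the labels do not match and you get NaT: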
trade1 = trade.groupby('MFE_pr', as_index=False)['timestamp'].first()
print (trade1)
MFE_pr timestamp
0 1398.25 2000-02-01 03:30:00
1 1399.25 2000-02-01 02:30:00
2 1399.50 2000-02-01 01:30:00
3 1400.25 2000-02-01 01:00:00
You can also use to_series to convert the index to a Series, and then groupby column MFE_pr:
trade1 = trade.index.to_series().groupby([trade['MFE_pr']]).first().reset_index()
print (trade1)
MFE_pr exit_time
0 1398.25 2000-02-01 03:30:00
1 1399.25 2000-02-01 02:30:00
2 1399.50 2000-02-01 01:30:00
3 1400.25 2000-02-01 01:00:00
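If the aggregated lookup is wanted on its own and also per row, one way to connect the two is Series.map. The sketch below is an alternative that neither answer spells out; the name first_ts is made up, and the MFE_pr values are typed in from the question.

```python
import pandas as pd

# MFE_pr values copied from the question's table.
idx = pd.date_range("2000-02-01 01:00", periods=10, freq="30min")
trade = pd.DataFrame(
    {"MFE_pr": [1400.25, 1399.50, 1399.50, 1399.25, 1399.25,
                1398.25, 1398.25, 1398.25, 1398.25, 1398.25]},
    index=idx,
)
trade.index.name = "exit_time"

# Lookup Series: MFE_pr value -> first timestamp at which it appears.
first_ts = trade.index.to_series().groupby(trade["MFE_pr"]).first()

# map looks each row's MFE_pr up in first_ts and keeps trade's own index.
trade["MFE_ts"] = trade["MFE_pr"].map(first_ts)
print(trade)
```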
If you need the result as a new column, a possible solution is to use transform. Its output is a Series with the same length as the original DataFrame:
trade['MFE_ts'] = trade.index.to_series().groupby([trade['MFE_pr']]).transform('first')
print (trade)
exit_price trend netgain high low MFE_pr \
exit_time
2000-02-01 01:00:00 1400.25 -1 1.00 1401.50 1400.25 1400.25
2000-02-01 01:30:00 1400.75 -1 0.50 1401.00 1399.50 1399.50
2000-02-01 02:00:00 1400.00 -1 1.25 1401.00 1399.75 1399.50
2000-02-01 02:30:00 1399.25 -1 2.00 1399.75 1399.25 1399.25
2000-02-01 03:00:00 1399.50 -1 1.75 1400.00 1399.50 1399.25
2000-02-01 03:30:00 1398.25 -1 3.00 1399.25 1398.25 1398.25
2000-02-01 04:00:00 1398.75 -1 2.50 1399.00 1398.25 1398.25
2000-02-01 04:30:00 1400.00 -1 1.25 1400.25 1399.00 1398.25
2000-02-01 05:00:00 1400.25 -1 1.00 1400.50 1399.25 1398.25
2000-02-01 05:30:00 1400.50 -1 0.75 1400.75 1399.50 1398.25
MFE_ts
exit_time
2000-02-01 01:00:00 2000-02-01 01:00:00
2000-02-01 01:30:00 2000-02-01 01:30:00
2000-02-01 02:00:00 2000-02-01 01:30:00
2000-02-01 02:30:00 2000-02-01 02:30:00
2000-02-01 03:00:00 2000-02-01 02:30:00
2000-02-01 03:30:00 2000-02-01 03:30:00
2000-02-01 04:00:00 2000-02-01 03:30:00
2000-02-01 04:30:00 2000-02-01 03:30:00
2000-02-01 05:00:00 2000-02-01 03:30:00
2000-02-01 05:30:00 2000-02-01 03:30:00
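One caveat: grouping on MFE_pr groups by price value, so if the same price showed up in two separate runs (for instance after trend flips), both runs would share the earliest timestamp. A hedged sketch of one workaround is to group on consecutive runs instead. The data below is hypothetical, chosen so that 1400.25 recurs, and run_id is an illustrative name.

```python
import pandas as pd

# Hypothetical MFE_pr series in which 1400.25 appears in two separate runs.
idx = pd.date_range("2000-02-01 01:00", periods=6, freq="30min")
trade = pd.DataFrame(
    {"MFE_pr": [1400.25, 1399.50, 1399.50, 1400.25, 1400.25, 1399.25]},
    index=idx,
)

# A new run starts wherever MFE_pr differs from the previous row.
run_id = trade["MFE_pr"].ne(trade["MFE_pr"].shift()).cumsum()

# First timestamp of each consecutive run, broadcast back to every row.
trade["MFE_ts"] = trade.index.to_series().groupby(run_id).transform("first")
print(trade)
```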