使用groupby和transform来获得每周的idxmax

时间:2017-02-04 21:00:34

标签: python pandas

我想做什么?

获取每周最高价格的日期和时间。

我尝试了什么?

我正在尝试使用groupbytransform来获取idxmax。当transformidxmax一起使用时,我得到的输出是numpy.float64数据但是当我在df上使用idxmax时,它会返回pandas.tslib.Timestamp数据。

测试数据:

rng = pd.date_range('1/1/2011', periods=100, freq='D') 
np.random.seed(4)

stocks = pd.DataFrame({ 
    'price':(np.random.randn(100).cumsum() + 10) },index = rng)

stocks['week_id'] = pd.to_datetime(stocks.index).week #used for the groupby

因此,如果我使用groupby获得max价格,则会返回我期望的数据:

stocks.groupby(stocks['week_id'])['price'].transform('max')

2011-01-01    10.550513
2011-01-02    10.550513
2011-01-03    10.248203
2011-01-04    10.248203.....

如果我尝试获取idxmax所有数据:

stocks['price'].idxmax()

返回pandas.tslib.Timestamp数据:

Timestamp('2011-03-20 00:00:00', freq='D')

但如果我尝试使用via groupby和transform:

stocks.groupby(stocks['week_id'])['price'].transform('idxmax')

我得到numpy.float64数据:

stocks.groupby(stocks['week_id'])['price'].transform('idxmax').head()

    2011-01-01    1.293926e+18
2011-01-02    1.293926e+18
2011-01-03    1.294099e+18
2011-01-04    1.294099e+18
2011-01-05    1.294099e+18
Freq: D, Name: price, dtype: float64

所需的输出是一个新的week_max列,如下所示:

            price   week_id week_high   week_max
2011-01-01  10.050562   52  10.550513   2011-01-02
2011-01-02  10.550513   52  10.550513   2011-01-02
2011-01-03  9.554604    1   10.248203   2011-01-04
2011-01-04  10.248203   1   10.248203   2011-01-04

感谢您的帮助。

1 个答案:

答案 0 :(得分:1)

似乎是错误。

替代解决方案是join aggregate ed maxidxmax

rng = pd.date_range('1/1/2011', periods=15, freq='D') 
np.random.seed(4)

stocks = pd.DataFrame({ 
    'price':(np.random.randn(15).cumsum() + 10) },index = rng)

stocks['week_id'] = pd.to_datetime(stocks.index).week #used for the groupby
#print (stocks)

df1 = stocks.groupby('week_id')['price'].agg(['max', 'idxmax'])
df1.columns = ['week_high', 'week_max']
print (df1)
         week_high   week_max
week_id                      
1        10.248203 2011-01-04
2         8.668974 2011-01-14
52       10.550513 2011-01-02

print (stocks.join(df1, on='week_id'))
                price  week_id  week_high   week_max
2011-01-01  10.050562       52  10.550513 2011-01-02
2011-01-02  10.550513       52  10.550513 2011-01-02
2011-01-03   9.554604        1  10.248203 2011-01-04
2011-01-04  10.248203        1  10.248203 2011-01-04
2011-01-05   9.829901        1  10.248203 2011-01-04
2011-01-06   8.245324        1  10.248203 2011-01-04
2011-01-07   7.597617        1  10.248203 2011-01-04
2011-01-08   8.196192        1  10.248203 2011-01-04
2011-01-09   8.528442        1  10.248203 2011-01-04
2011-01-10   7.380966        2   8.668974 2011-01-14
2011-01-11   7.999635        2   8.668974 2011-01-14
2011-01-12   7.911648        2   8.668974 2011-01-14
2011-01-13   8.336721        2   8.668974 2011-01-14
2011-01-14   8.668974        2   8.668974 2011-01-14
2011-01-15   7.512158        2   8.668974 2011-01-14