Calculating price returns within groups in a pandas DataFrame

Time: 2018-07-12 22:57:44

Tags: pandas python-3.5

I have a DataFrame df containing the following data:

DateTime    MDate       Fwd    Type
1/4/2010    2/1/2010    61.17   A
1/5/2010    2/1/2010    59.73   A
1/6/2010    2/1/2010    62.2    A
1/7/2010    2/1/2010    61.1    A
1/8/2010    2/1/2010    60.25   A
1/11/2010   2/1/2010    57.12   A
1/12/2010   2/1/2010    57.35   A
1/13/2010   2/1/2010    58.12   B
1/14/2010   2/1/2010    57.12   B
1/15/2010   2/1/2010    59.38   B
8/1/2013    5/1/2014    57.67   B
8/2/2013    5/1/2014    57.25   B
8/3/2013    5/1/2014    57.9    B
8/4/2013    5/1/2014    59.25   B
8/5/2013    5/1/2014    57.67   B

I want to produce the following:

DateTime    MDate      Fwd    Type   pctChange 
1/4/2010    2/1/2010    61.17   A   
1/5/2010    2/1/2010    59.73   A    (0.02)
1/6/2010    2/1/2010    62.2    A    0.04 
1/7/2010    2/1/2010    61.1    A    (0.02)
1/8/2010    2/1/2010    60.25   A    (0.01)
1/11/2010   2/1/2010    57.12   A    (0.05)
1/12/2010   2/1/2010    57.35   A    0.00 
1/13/2010   2/1/2010    58.12   B   
1/14/2010   2/1/2010    57.12   B    (0.02)
1/15/2010   2/1/2010    59.38   B    0.04 
8/1/2013    5/1/2014    57.67   B   
8/2/2013    5/1/2014    57.25   B    (0.01)
8/3/2013    5/1/2014    57.9    B    0.01 
8/4/2013    5/1/2014    59.25   B    0.02 
8/5/2013    5/1/2014    57.67   B    (0.03)

I want to isolate each time series by (MDate, Type) group and compute pctChange within it.

So in the example above, the first group looks like this. All of its rows share the same MDate and Type:

DateTime    MDate      Fwd    Type   pctChange 
1/4/2010    2/1/2010    61.17   A   
1/5/2010    2/1/2010    59.73   A    (0.02)
1/6/2010    2/1/2010    62.2    A    0.04 
1/7/2010    2/1/2010    61.1    A    (0.02)
1/8/2010    2/1/2010    60.25   A    (0.01)
1/11/2010   2/1/2010    57.12   A    (0.05)
1/12/2010   2/1/2010    57.35   A    0.00 

I compute pctChange as 59.73/61.17 - 1 = (0.02), where parentheses denote a negative value.
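As a quick check of that arithmetic, here is a minimal Python sketch. The variable names are illustrative only and do not come from the DataFrame above.

```python
# First two Fwd values of the (MDate=2/1/2010, Type=A) group
prev_fwd = 61.17
curr_fwd = 59.73

# Simple return: current price over previous price, minus one
pct_change = curr_fwd / prev_fwd - 1

# Accounting-style display: negative values are wrapped in parentheses
if pct_change < 0:
    formatted = '({0:.2f})'.format(-pct_change)
else:
    formatted = '{0:.2f}'.format(pct_change)

print(formatted)
```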

I was thinking of implementing something along these lines:

import pandas as pd
df2 = pd.pivot_table(df, index=['MDate', 'Type'], values=['Fwd'], aggfunc=someFunction)

I can't work out what to implement for someFunction.

1 Answer:

Answer 0 (score: 1)

This should do it:

df[['MDate', 'DateTime']] = df[['MDate', 'DateTime']].apply(lambda x: pd.to_datetime(x, infer_datetime_format=True))

df['pctChange'] = df.groupby(['MDate', 'Type'])['Fwd'].transform(pd.Series.pct_change).apply(lambda x: '' if pd.isna(x) else '({0:.2f})'.format(-x) if x < 0 else '{0:.2f}'.format(x))

df

#     DateTime    Fwd      MDate Type pctChange
#0  2010-01-04  61.17 2010-02-01    A          
#1  2010-01-05  59.73 2010-02-01    A    (0.02)
#2  2010-01-06  62.20 2010-02-01    A      0.04
#3  2010-01-07  61.10 2010-02-01    A    (0.02)
#4  2010-01-08  60.25 2010-02-01    A    (0.01)
#5  2010-01-11  57.12 2010-02-01    A    (0.05)
#6  2010-01-12  57.35 2010-02-01    A      0.00
#7  2010-01-13  58.12 2010-02-01    B          
#8  2010-01-14  57.12 2010-02-01    B    (0.02)
#9  2010-01-15  59.38 2010-02-01    B      0.04
#10 2013-08-01  57.67 2014-05-01    B          
#11 2013-08-02  57.25 2014-05-01    B    (0.01)
#12 2013-08-03  57.90 2014-05-01    B      0.01
#13 2013-08-04  59.25 2014-05-01    B      0.02
#14 2013-08-05  57.67 2014-05-01    B    (0.03)

The first line converts MDate and DateTime to datetime, since I wasn't sure they were already in the right format.
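For anyone who wants to run this end to end, here is a self-contained sketch of the same groupby approach on a subset of the sample data. It makes a few assumptions that are not in the original answer: the dates are parsed with an explicit month/day/year format, the rows are sorted chronologically within each group before the change is computed, and the numeric result is kept in pctChange while the parenthesised text goes into a separate, hypothetical pctChangeText column (the helper fmt_return is likewise illustrative).

```python
import pandas as pd

# A subset of the sample data from the question
df = pd.DataFrame({
    'DateTime': ['1/4/2010', '1/5/2010', '1/6/2010',
                 '1/13/2010', '1/14/2010', '8/1/2013', '8/2/2013'],
    'MDate':    ['2/1/2010', '2/1/2010', '2/1/2010',
                 '2/1/2010', '2/1/2010', '5/1/2014', '5/1/2014'],
    'Fwd':      [61.17, 59.73, 62.2, 58.12, 57.12, 57.67, 57.25],
    'Type':     ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
})

# Assumption: dates are month/day/year, as the sample data suggests
for col in ['DateTime', 'MDate']:
    df[col] = pd.to_datetime(df[col], format='%m/%d/%Y')

# Assumption: make sure each group is in chronological order first
df = df.sort_values(['MDate', 'Type', 'DateTime']).reset_index(drop=True)

# Numeric return within each (MDate, Type) group; the first row of each group is NaN
df['pctChange'] = (
    df.groupby(['MDate', 'Type'])['Fwd'].transform(lambda s: s.pct_change())
)

def fmt_return(x):
    """Hypothetical helper: blank for missing, parentheses for negatives."""
    if pd.isna(x):
        return ''
    if x < 0:
        return '({0:.2f})'.format(-x)
    return '{0:.2f}'.format(x)

# Display-only column; pctChange stays numeric for further calculations
df['pctChangeText'] = df['pctChange'].map(fmt_return)

print(df)
```

Keeping the numeric column separate from the formatted one means later aggregations (means, cumulative returns and so on) can still operate on pctChange without parsing strings back into numbers.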