Pandas percent change based on a previous column value identified by datetime

Time: 2019-06-09 16:16:45

Tags: python pandas dataframe pandas-groupby

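I would like to calculate the 1-year, 2-year, and 3-year annualized dividend growth (geometric average) within each "ticker" group in the dataframe below, where the growth rate is always measured relative to the most recent time period in each group.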
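For reference, the annualized (geometric-mean) growth over n years from an earlier value to the latest one is (latest / earlier) ** (1 / n) - 1. The sketch below is only an illustration of that formula: the helper name `annualized_growth` is hypothetical (it is not a pandas function), and returning NaN for a zero or missing base is an assumption chosen to match the NaN output requested further down.

```python
import math

def annualized_growth(latest, earlier, years):
    """Geometric-mean annual growth from `earlier` to `latest` over `years` years."""
    # Assumption: a zero or missing base yields NaN rather than inf or an error.
    if years <= 0 or earlier == 0 or math.isnan(earlier) or math.isnan(latest):
        return math.nan
    return (latest / earlier) ** (1 / years) - 1

# Ticker A in the output tables: 0.626 (2019) versus 0.494 (2017), two years apart.
two_year = annualized_growth(0.626, 0.494, 2)

# A zero dividend in the base year gives NaN under the assumption above.
zero_base = annualized_growth(0.0, 0.0, 1)
```

For the ticker A figures this comes to roughly 12.6% per year, which is the square root of the two-year ratio minus one.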

I have:

   ticker        date  dividends
0       A   3/31/2019       0.63
1       A   3/31/2018       0.56
2       A   3/31/2017       0.49
3       A   3/31/2016       0.43
4       A   3/31/2015      16.13
5       A   3/31/2014       0.50
6     AAU  12/31/2018          0
7     AAU  12/31/2017          0
8     AAU  12/31/2016          0
9     AAU  12/31/2015          0
10    AAU  12/31/2014          0
11     AB   3/31/2019       2.68
12     AB   3/31/2018       2.30
13     AB   3/31/2017       1.92
14     AB   3/31/2016       1.86
15     AB   3/31/2015       1.86
16     AB   3/31/2014       1.79
17   ADIL   3/31/2019          0
18   ADIL   3/31/2018          0

Using the guidance given by @anky_91 in the comments below:

df2 = df1.assign(div_1yr_cagr=df1.sort_values(['ticker', 'date']).dividends.pct_change(periods=1),
                 div_2yr_cagr=pow(df1.sort_values(['ticker', 'date']).dividends.pct_change(periods=2) + 1, 0.5) - 1,
                 div_3yr_cagr=pow(df1.sort_values(['ticker', 'date']).dividends.pct_change(periods=3) + 1, 0.3333) - 1)

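With this, I get the following. The problem is that for ticker groups with fewer than 3 years of quotes, the code above fills the cells with -1.0, and I would like those values to be NaN instead (for example, when dividends are zero). Also, I only care about the growth for the most recent date in each group. Is there a Pythonic way to avoid calculating the growth statistics for dates older than the most recent date within each group?

I got: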

   ticker        date  dividends  div_1yr_cagr  div_2yr_cagr  div_3yr_cagr
0       A   3/31/2019      0.626      0.113879      0.267206      0.455814
1       A   3/31/2018      0.562      0.137652      0.306977     -0.965158
2       A   3/31/2017      0.494      0.148837     -0.969374     -0.019841
3       A   3/31/2016      0.430     -0.973342     -0.146825           NaN
4       A   3/31/2015     16.130     31.003968           NaN           NaN
5       A   3/31/2014      0.504           NaN           NaN           NaN
6     AAU  12/31/2018      0.000           NaN           NaN           NaN
7     AAU  12/31/2017      0.000           NaN           NaN           NaN
8     AAU  12/31/2016      0.000           NaN           NaN     -1.000000
9     AAU  12/31/2015      0.000           NaN     -1.000000     -1.000000
10    AAU  12/31/2014      0.000     -1.000000     -1.000000     -1.000000
11     AB   3/31/2019      2.680      0.165217      0.395833      0.440860
12     AB   3/31/2018      2.300      0.197917      0.236559      0.236559
13     AB   3/31/2017      1.920      0.032258      0.032258      0.072626
14     AB   3/31/2016      1.860      0.000000      0.039106           inf
15     AB   3/31/2015      1.860      0.039106           inf           inf
16     AB   3/31/2014      1.790           inf           inf           inf
17   ADIL   3/31/2019      0.000           NaN     -1.000000     -1.000000
18   ADIL   3/31/2018      0.000     -1.000000     -1.000000     -1.000000
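
The -1.0 and inf entries in the table above are consistent with `pct_change` running over the whole sorted column, so the oldest rows of one ticker get compared with the newest rows of the previous ticker (for example 0 / 0.626 - 1 = -1 and 1.79 / 0 - 1 = inf). One possible way to keep the comparison inside each ticker is to call `pct_change` on a groupby, as sketched below. The small frame is a cut-down copy of the question's data rather than the full `df1`, and mapping inf to NaN and masking everything except each ticker's latest row are assumptions about the desired output.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ticker": ["A", "A", "A", "AAU", "AAU", "AB", "AB", "AB"],
    "date": ["3/31/2019", "3/31/2018", "3/31/2017",
             "12/31/2018", "12/31/2017",
             "3/31/2019", "3/31/2018", "3/31/2017"],
    "dividends": [0.626, 0.562, 0.494, 0.0, 0.0, 2.68, 2.30, 1.92],
})
df1["date"] = pd.to_datetime(df1["date"], format="%m/%d/%Y")

# Sort oldest-first so that periods=n looks n rows (here: n years) back in time.
ordered = df1.sort_values(["ticker", "date"])
by_ticker = ordered.groupby("ticker")["dividends"]

# True only on the most recent row of each ticker.
is_latest = df1["date"].eq(df1.groupby("ticker")["date"].transform("max"))

for n in (1, 2):
    col = f"div_{n}yr_cagr"
    change = by_ticker.pct_change(periods=n)   # stays within each ticker
    cagr = (change + 1).pow(1 / n) - 1         # annualize the n-year change
    # 0/0 is already NaN; x/0 gives inf, which is mapped to NaN here.
    df1[col] = cagr.replace([np.inf, -np.inf], np.nan)
    df1[col] = df1[col].where(is_latest)       # keep only the latest row per ticker
```

This still computes the change for every row before masking, so it addresses the cross-ticker values but not the wish to skip the older rows entirely.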

But I want:

   ticker        date  dividends  div_1yr_cagr  div_2yr_cagr  div_3yr_cagr
0       A   3/31/2019      0.626      0.113879      0.267206      0.455814
1       A   3/31/2018      0.562           NaN           NaN           NaN
2       A   3/31/2017      0.494           NaN           NaN           NaN
3       A   3/31/2016      0.430           NaN           NaN           NaN
4       A   3/31/2015     16.130           NaN           NaN           NaN
5       A   3/31/2014      0.504           NaN           NaN           NaN
6     AAU  12/31/2018      0.000           NaN           NaN           NaN
7     AAU  12/31/2017      0.000           NaN           NaN           NaN
8     AAU  12/31/2016      0.000           NaN           NaN           NaN
9     AAU  12/31/2015      0.000           NaN           NaN           NaN
10    AAU  12/31/2014      0.000           NaN           NaN           NaN
11     AB   3/31/2019      2.680      0.165217      0.395833      0.440860
12     AB   3/31/2018      2.300           NaN           NaN           NaN
13     AB   3/31/2017      1.920           NaN           NaN           NaN
14     AB   3/31/2016      1.860           NaN           NaN           NaN
15     AB   3/31/2015      1.860           NaN           NaN           NaN
16     AB   3/31/2014      1.790           NaN           NaN           NaN
17   ADIL   3/31/2019      0.000           NaN           NaN           NaN
18   ADIL   3/31/2018      0.000           NaN           NaN           NaN

Thanks!

1 Answer:

Answer 0 (score: 1)

Here is a solution that computes only what you care about. It assumes that each ticker has roughly one row per year.

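import pandas as pd

# Assumption: "date" may still hold strings such as "3/31/2019"; the .dt accessor used below needs datetimes
df["date"] = pd.to_datetime(df["date"])
df.sort_values(by=["ticker", "date"], ascending=[True, False], inplace=True)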

# Find date, dividends, and index of the most recent record for each ticker
# and populate result to the entire dataframe
df["index"] = df.index
df[["rec_date", "rec_div", "rec_idx"]] = df.groupby("ticker").transform("first")
df["offset"] = df["rec_date"].dt.year - df["date"].dt.year   # Compute time offset by year

# Copy relevant rows and columns into a new dataframe for further computation
mdf = df.loc[df["offset"].between(1, 3), ["dividends", "rec_div", "offset", "rec_idx"]].copy()

# Compute annualized growth and organize result into desired format
mdf["cagr"] = (mdf["rec_div"] / mdf["dividends"]).pow(1 / mdf["offset"]) - 1
cagr_df = mdf.pivot(index="rec_idx", columns="offset", values="cagr")
cagr_df.columns = ["div_{}yr_cagr".format(i) for i in cagr_df.columns]

# Merge the calculated numbers with original df to get desired output
result_df = df[["ticker", "date", "dividends"]].join(cagr_df)
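
To try the approach above end to end, the sketch below rebuilds part of the question's data, parses `date` (the `.dt.year` calls need a datetime column), and wraps the same steps in a function. The function name `latest_row_cagr`, the `max_years` parameter, the cut-down sample frame, and the explicit column selection before `transform` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

def latest_row_cagr(df, max_years=3):
    """Annualized dividend growth, reported only on each ticker's latest row."""
    df = df.sort_values(["ticker", "date"], ascending=[True, False]).copy()
    df["index"] = df.index
    # Broadcast the latest date, dividend and row label to every row of the ticker.
    df[["rec_date", "rec_div", "rec_idx"]] = (
        df.groupby("ticker")[["date", "dividends", "index"]].transform("first")
    )
    df["offset"] = df["rec_date"].dt.year - df["date"].dt.year
    # Keep only the rows that lie 1..max_years calendar years before the latest one.
    mdf = df.loc[df["offset"].between(1, max_years)].copy()
    mdf["cagr"] = (mdf["rec_div"] / mdf["dividends"]).pow(1 / mdf["offset"]) - 1
    # One row per ticker (labelled by its latest row), one column per offset.
    # pivot raises if a ticker has two rows in the same year.
    cagr_df = mdf.pivot(index="rec_idx", columns="offset", values="cagr")
    cagr_df.columns = [f"div_{i}yr_cagr" for i in cagr_df.columns]
    return df[["ticker", "date", "dividends"]].join(cagr_df)

sample = pd.DataFrame({
    "ticker": ["A", "A", "A", "A", "AAU", "AAU"],
    "date": ["3/31/2019", "3/31/2018", "3/31/2017", "3/31/2016",
             "12/31/2018", "12/31/2017"],
    "dividends": [0.626, 0.562, 0.494, 0.430, 0.0, 0.0],
})
sample["date"] = pd.to_datetime(sample["date"], format="%m/%d/%Y")

result = latest_row_cagr(sample)
print(result)
```

For ticker A this gives about 0.1257 for the two-year figure. The tables in the question show 0.267206 in that cell, which equals the raw two-year change 0.626 / 0.494 - 1 rather than its annualized value, so the numbers are not expected to match those tables beyond the one-year column.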