我想在下面的数据框中的每组“行情指示器”中计算1年,2年和3年的年度股息增长(几何平均值),其中增长率始终是相对于最近时间每个组中的时间段。
我有:
ticker date dividends
0 A 3/31/2019 0.63
1 A 3/31/2018 0.56
2 A 3/31/2017 0.49
3 A 3/31/2016 0.43
4 A 3/31/2015 16.13
5 A 3/31/2014 0.50
6 AAU 12/31/2018 0
7 AAU 12/31/2017 0
8 AAU 12/31/2016 0
9 AAU 12/31/2015 0
10 AAU 12/31/2014 0
11 AB 3/31/2019 2.68
12 AB 3/31/2018 2.30
13 AB 3/31/2017 1.92
14 AB 3/31/2016 1.86
15 AB 3/31/2015 1.86
16 AB 3/31/2014 1.79
17 ADIL 3/31/2019 0
18 ADIL 3/31/2018 0
使用@ anky_91在以下注释中给出的指导:
df2 = df1.assign(div_1yr_cagr=df1.sort_values(['ticker', 'date']).dividends.pct_change(periods=1,
div_2yr_cagr=pow(df1.sort_values(['ticker', 'date']).dividends.pct_change(periods=2) + 1, 0.5) - 1,
div_3yr_cagr=pow(df1.sort_values(['ticker', 'date']).dividends.pct_change(periods=3) + 1, 0.3333) - 1)
有了这个,我得到了以下内容。问题在于报价不到3年的代码组,上面的代码用-1.0填充了单元格,我希望这些代码取值改为NaN(例如当股息为零时)。另外,我只关心每个组中最近日期的增长,是否有Python方式无法在每个组内不计算早于最近日期的增长统计信息?
我得到了:
ticker date dividends div_1yr_cagr div_2yr_cagr div_3yr_cagr
0 A 3/31/2019 0.626 0.113879 0.267206 0.455814
1 A 3/31/2018 0.562 0.137652 0.306977 -0.965158
2 A 3/31/2017 0.494 0.148837 -0.969374 -0.019841
3 A 3/31/2016 0.430 -0.973342 -0.146825 NaN
4 A 3/31/2015 16.130 31.003968 NaN NaN
5 A 3/31/2014 0.504 NaN NaN NaN
6 AAU 12/31/2018 0.000 NaN NaN NaN
7 AAU 12/31/2017 0.000 NaN NaN NaN
8 AAU 12/31/2016 0.000 NaN NaN -1.000000
9 AAU 12/31/2015 0.000 NaN -1.000000 -1.000000
10 AAU 12/31/2014 0.000 -1.000000 -1.000000 -1.000000
11 AB 3/31/2019 2.680 0.165217 0.395833 0.440860
12 AB 3/31/2018 2.300 0.197917 0.236559 0.236559
13 AB 3/31/2017 1.920 0.032258 0.032258 0.072626
14 AB 3/31/2016 1.860 0.000000 0.039106 inf
15 AB 3/31/2015 1.860 0.039106 inf inf
16 AB 3/31/2014 1.790 inf inf inf
17 ADIL 3/31/2019 0.000 NaN -1.000000 -1.000000
18 ADIL 3/31/2018 0.000 -1.000000 -1.000000 -1.000000
但想要:
ticker date dividends div_1yr_cagr div_2yr_cagr div_3yr_cagr
0 A 3/31/2019 0.626 0.113879 0.267206 0.455814
1 A 3/31/2018 0.562 NaN NaN NaN
2 A 3/31/2017 0.494 NaN NaN NaN
3 A 3/31/2016 0.430 NaN NaN NaN
4 A 3/31/2015 16.130 NaN NaN NaN
5 A 3/31/2014 0.504 NaN NaN NaN
6 AAU 12/31/2018 0.000 NaN NaN NaN
7 AAU 12/31/2017 0.000 NaN NaN NaN
8 AAU 12/31/2016 0.000 NaN NaN NaN
9 AAU 12/31/2015 0.000 NaN NaN NaN
10 AAU 12/31/2014 0.000 NaN NaN NaN
11 AB 3/31/2019 2.680 0.165217 0.395833 0.440860
12 AB 3/31/2018 2.300 NaN NaN NaN
13 AB 3/31/2017 1.920 NaN NaN NaN
14 AB 3/31/2016 1.860 NaN NaN NaN
15 AB 3/31/2015 1.860 NaN NaN NaN
16 AB 3/31/2014 1.790 NaN NaN NaN
17 ADIL 3/31/2019 0.000 NaN NaN NaN
18 ADIL 3/31/2018 0.000 NaN NaN NaN
谢谢!
答案 0 :(得分:1)
这是仅计算您关心的问题的解决方案。它基于这样的假设,即每个股票行情每年大约有一行。
df.sort_values(by=["ticker", "date"], ascending=[True, False], inplace=True)
# Find date, dividends, and index of the most recent record for each ticker
# and populate result to the entire dataframe
df["index"] = df.index
df[["rec_date", "rec_div", "rec_idx"]] = df.groupby("ticker").transform("first")
df["offset"] = df["rec_date"].dt.year - df["date"].dt.year # Compute time offset by year
# Copy relevant rows and columns into a new dataframe for further computation
mdf = df.loc[df["offset"].between(1, 3), ["dividends", "rec_div", "offset", "rec_idx"]].copy()
# Compute annualized growth and organize result into desired format
mdf["cagr"] = (mdf["rec_div"] / mdf["dividends"]).pow(1 / mdf["offset"]) - 1
cagr_df = mdf.pivot(index="rec_idx", columns="offset", values="cagr")
cagr_df.columns = ["div_{}yr_cagr".format(i) for i in cagr_df.columns]
# Merge the calculated numbers with original df to get desired output
result_df = df[["ticker", "date", "dividends"]].join(cagr_df)