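I have a dataframe that looks like this:
I appended an aggregated row at the end that computes the mean of each column, ignoring null values. Here is my code: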
repayments_amt_pivot.loc['Aggregated'] = repayments_amt_pivot.iloc[:, 3:].mean(skipna=True)
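However, what I actually need is a weighted sum of the percentages, where each percentage is multiplied by its row's share of principal_due_per_month (the '$ Amount Due' column in the data below).
For example, for Month 4, row 0 would be multiplied by 27,845 / (27,845 + 310,506 + 659,705 + 1,433,121).
For Month 3, row 4 would be multiplied by 1,941,036 / (27,845 + 310,506 + 659,705 + 1,433,121 + 1,941,036).
And so on: for each month, the denominator only includes the amounts due of the rows that have a value in that month.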
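To make the weighting concrete, here is a minimal check of the Month 4 value using only the four rows that have a Month 4 entry (the array names are illustrative). Because each weight is a row's share of the total, the weights sum to 1, so the weighted sum is a weighted mean; np.average with weights= should give the same number:

```python
import numpy as np

# Rows 0-3 are the only rows with a Month 4 value, so only their
# amounts due enter the denominator.
due = np.array([27845.312793586978, 310505.5597382864,
                659705.2173778547, 1433121.318250646])
month4 = np.array([100.00000000000003, 92.54673537743292,
                   97.85768670717538, 94.91544740629973])

# Row 0's weight is 27,845 / (27,845 + 310,506 + 659,705 + 1,433,121), etc.
weights = due / due.sum()

# Weighted sum of the percentages for Month 4.
weighted_sum = (month4 * weights).sum()

# np.average normalises the weights itself, so it yields the same value.
same_value = np.average(month4, weights=due)

print(weights, weighted_sum, same_value)
```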
Any help would be greatly appreciated, as I can't figure this out!
See below for a screenshot of how the calculation is done in Excel.
Data:
import numpy as np
import pandas as pd

df = pd.DataFrame([{'$ Amount Due': 27845.312793586978,
'Month 0': 56.479872661140476,
'Month 1': 92.94027983726657,
'Month 2': 100.00000000000003,
'Month 3': 100.00000000000003,
'Month 4': 100.00000000000003},
{'$ Amount Due': 310505.5597382864,
'Month 0': 78.34839385064039,
'Month 1': 79.58303224427453,
'Month 2': 79.58303224427453,
'Month 3': 81.43498983472573,
'Month 4': 92.54673537743292},
{'$ Amount Due': 659705.2173778547,
'Month 0': 90.79718901057414,
'Month 1': 97.8418387417451,
'Month 2': 97.85768670717538,
'Month 3': 97.85768670717538,
'Month 4': 97.85768670717538},
{'$ Amount Due': 1433121.318250646,
'Month 0': 91.7207168764003,
'Month 1': 94.34283888419282,
'Month 2': 94.51326381568556,
'Month 3': 94.8581612152927,
'Month 4': 94.91544740629973},
{'$ Amount Due': 1941036.1276433321,
'Month 0': 79.75029644420579,
'Month 1': 85.62252846197367,
'Month 2': 86.59251760542142,
'Month 3': 86.70920561577343,
'Month 4': np.nan},
{'$ Amount Due': 3448302.2801859295,
'Month 0': 75.83697471065258,
'Month 1': 83.6700011095642,
'Month 2': 86.16217213969533,
'Month 3': np.nan,
'Month 4': np.nan},
{'$ Amount Due': 3190042.0279137697,
'Month 0': 76.69574360823212,
'Month 1': 85.4625418697537,
'Month 2': np.nan,
'Month 3': np.nan,
'Month 4': np.nan},
{'$ Amount Due': 2614440.2956102462,
'Month 0': 74.87175589142862,
'Month 1': np.nan,
'Month 2': np.nan,
'Month 3': np.nan,
'Month 4': np.nan}])
Answer 0 (score: 2)
My approach would be:
months = df.iloc[:, 1:] # dataframe of the month columns only
due_row = months.where(months.isna(), df['$ Amount Due'], axis=0) # amount due per cell, NaN where the month value is missing
due_sum = due_row.sum() # per-month total of the amounts due that count
(months*due_row/due_sum).sum() # weighted sum per month, as requested
#Month 0 78.823057
#Month 1 86.680023
#Month 2 88.573969
#Month 3 90.772494
#Month 4 95.469538
#dtype: float64
And, if it should be appended to the dataframe as its last row:
df.loc['Aggregated', df.columns[1:]] = (months*due_row/due_sum).sum().values
# $ Amount Due Month 0 ... Month 3 Month 4
#0 2.784531e+04 56.479873 ... 100.000000 100.000000
#1 3.105056e+05 78.348394 ... 81.434990 92.546735
#2 6.597052e+05 90.797189 ... 97.857687 97.857687
#3 1.433121e+06 91.720717 ... 94.858161 94.915447
#4 1.941036e+06 79.750296 ... 86.709206 NaN
#5 3.448302e+06 75.836975 ... NaN NaN
#6 3.190042e+06 76.695744 ... NaN NaN
#7 2.614440e+06 74.871756 ... NaN NaN
#Aggregated NaN 78.823057 ... 90.772494 95.469538
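The same per-month values can also be computed without the intermediate due_row frame. This is a sketch rather than the answer's own code (num, den and weighted are illustrative names): multiply every month column by '$ Amount Due', then divide by the amount due summed only over the rows where that month is present.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "$ Amount Due": [27845.312793586978, 310505.5597382864, 659705.2173778547,
                     1433121.318250646, 1941036.1276433321, 3448302.2801859295,
                     3190042.0279137697, 2614440.2956102462],
    "Month 0": [56.479872661140476, 78.34839385064039, 90.79718901057414,
                91.7207168764003, 79.75029644420579, 75.83697471065258,
                76.69574360823212, 74.87175589142862],
    "Month 1": [92.94027983726657, 79.58303224427453, 97.8418387417451,
                94.34283888419282, 85.62252846197367, 83.6700011095642,
                85.4625418697537, nan],
    "Month 2": [100.00000000000003, 79.58303224427453, 97.85768670717538,
                94.51326381568556, 86.59251760542142, 86.16217213969533, nan, nan],
    "Month 3": [100.00000000000003, 81.43498983472573, 97.85768670717538,
                94.8581612152927, 86.70920561577343, nan, nan, nan],
    "Month 4": [100.00000000000003, 92.54673537743292, 97.85768670717538,
                94.91544740629973, nan, nan, nan, nan],
})

due = df["$ Amount Due"]
months = df.iloc[:, 1:]

# Numerator: percentage * amount due per cell; sum() skips the NaN cells.
num = months.mul(due, axis=0).sum()

# Denominator: amount due counted only where the month value is present.
den = months.notna().astype(float).mul(due, axis=0).sum()

weighted = num / den
print(weighted)
```

Both forms compute sum(p * d) / sum(d) over the non-missing rows of each month; this one just folds the masking into the denominator.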
Edit:
This code is a bit shorter, IMO almost self-explanatory, and cleaner.
timings:
AnanayMital : 0.0209
SpghttCd : 0.00538
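The answer does not say how the timings were measured. One way to compare the two approaches is the standard-library timeit module; in this sketch answer_0 and answer_1 are hypothetical wrappers around the two answers' expressions, and the printed figures will vary with hardware and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "$ Amount Due": [27845.312793586978, 310505.5597382864, 659705.2173778547,
                     1433121.318250646, 1941036.1276433321, 3448302.2801859295,
                     3190042.0279137697, 2614440.2956102462],
    "Month 0": [56.479872661140476, 78.34839385064039, 90.79718901057414,
                91.7207168764003, 79.75029644420579, 75.83697471065258,
                76.69574360823212, 74.87175589142862],
    "Month 1": [92.94027983726657, 79.58303224427453, 97.8418387417451,
                94.34283888419282, 85.62252846197367, 83.6700011095642,
                85.4625418697537, nan],
    "Month 2": [100.00000000000003, 79.58303224427453, 97.85768670717538,
                94.51326381568556, 86.59251760542142, 86.16217213969533, nan, nan],
    "Month 3": [100.00000000000003, 81.43498983472573, 97.85768670717538,
                94.8581612152927, 86.70920561577343, nan, nan, nan],
    "Month 4": [100.00000000000003, 92.54673537743292, 97.85768670717538,
                94.91544740629973, nan, nan, nan, nan],
})


def answer_0(df):
    # Answer 0: mask the amounts due where a month is missing, then weight and sum.
    months = df.iloc[:, 1:]
    due_row = months.where(months.isna(), df["$ Amount Due"], axis=0)
    return (months * due_row / due_row.sum()).sum()


def answer_1(df):
    # Answer 1: one weighted sum per month via a list comprehension.
    return [
        (df["Month " + str(i)]
         * df[df["Month " + str(i)].notnull()]["$ Amount Due"]
         / df[df["Month " + str(i)].notnull()]["$ Amount Due"].sum()).sum()
        for i in range(5)
    ]


# Both approaches should agree before their speed is compared.
assert np.allclose(answer_0(df).values, answer_1(df))

for name, fn in [("answer_0", answer_0), ("answer_1", answer_1)]:
    # Average wall-clock seconds per call over 100 calls.
    seconds = timeit.timeit(lambda: fn(df), number=100) / 100
    print(f"{name}: {seconds:.5f} s per call")
```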
Answer 1 (score: 1)
df.loc[8, "Month 0":] = [(df["Month "+str(i)]*df[df["Month "+str(i)].notnull()]["$ Amount Due"]/df[df["Month "+str(i)].notnull()]["$ Amount Due"].sum()).sum() for i in range(5)]
df
$ Amount Due Month 0 Month 1 Month 2 Month 3 Month 4
0 2.784531e+04 56.479873 92.940280 100.000000 100.000000 100.000000
1 3.105056e+05 78.348394 79.583032 79.583032 81.434990 92.546735
2 6.597052e+05 90.797189 97.841839 97.857687 97.857687 97.857687
3 1.433121e+06 91.720717 94.342839 94.513264 94.858161 94.915447
4 1.941036e+06 79.750296 85.622528 86.592518 86.709206 NaN
5 3.448302e+06 75.836975 83.670001 86.162172 NaN NaN
6 3.190042e+06 76.695744 85.462542 NaN NaN NaN
7 2.614440e+06 74.871756 NaN NaN NaN NaN
8 NaN 78.823057 86.680023 88.573969 90.772494 95.469538
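The list comprehension packs the masking, weighting and summing into one line. An equivalent loop, written out step by step as a sketch (the variable names are illustrative), produces the same aggregated row; here the row is added with a full-row assignment, leaving '$ Amount Due' empty:

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "$ Amount Due": [27845.312793586978, 310505.5597382864, 659705.2173778547,
                     1433121.318250646, 1941036.1276433321, 3448302.2801859295,
                     3190042.0279137697, 2614440.2956102462],
    "Month 0": [56.479872661140476, 78.34839385064039, 90.79718901057414,
                91.7207168764003, 79.75029644420579, 75.83697471065258,
                76.69574360823212, 74.87175589142862],
    "Month 1": [92.94027983726657, 79.58303224427453, 97.8418387417451,
                94.34283888419282, 85.62252846197367, 83.6700011095642,
                85.4625418697537, nan],
    "Month 2": [100.00000000000003, 79.58303224427453, 97.85768670717538,
                94.51326381568556, 86.59251760542142, 86.16217213969533, nan, nan],
    "Month 3": [100.00000000000003, 81.43498983472573, 97.85768670717538,
                94.8581612152927, 86.70920561577343, nan, nan, nan],
    "Month 4": [100.00000000000003, 92.54673537743292, 97.85768670717538,
                94.91544740629973, nan, nan, nan, nan],
})

row = []
for col in ["Month " + str(i) for i in range(5)]:
    present = df[col].notnull()               # rows that have a value this month
    due = df.loc[present, "$ Amount Due"]     # their amounts due
    # Each percentage times its row's share of the month's total amount due.
    row.append((df.loc[present, col] * due / due.sum()).sum())

# No amount due for the aggregated row, then the five weighted values.
df.loc[8] = [nan, *row]
print(df)
```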