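I have to perform the following calculation (or similar ones) many times in my code, and it takes a long time to run. I would like to know whether the code can be made more Pythonic (and faster).
I am calculating a weighting for each loan: its loan_size as a proportion of the total loan_size of all loans with the same origination month.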
loan_plans['weighting'] = loan_plans.loan_size / loan_plans.apply(lambda S: loan_plans.loc[(loan_plans.origination_month == S.origination_month), 'loan_size'].sum(), axis=1)
Here is a sample data set with the desired result:
loan_size origination_month weighting
1000 01-2018 0.25
2000 02-2018 0.2
3000 01-2018 0.75
8000 02-2018 0.8
Answer 0 (score: 1)
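Update (per the OP's update):
There is nothing wrong with your approach, but you can use groupby to get the loan_size total for each origination_month, merge it back onto the rows, and then compute the weighting: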
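# Attach each month's total loan_size to every row; reset_index() preserves the original row order.
loan_plans = loan_plans.reset_index().merge(
    loan_plans.groupby("origination_month").loan_size.sum().reset_index(),
    on="origination_month",
)
# The merge suffixes the clashing columns: loan_size_x (per loan) and loan_size_y (month total).
loan_plans["weighting"] = loan_plans.loan_size_x / loan_plans.loan_size_y
loan_plans.sort_values("index").set_index("index")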
loan_size_x origination_month loan_size_y weighting
index
0 1000 01-2018 4000 0.25
1 2000 02-2018 10000 0.20
2 3000 01-2018 4000 0.75
3 8000 02-2018 10000 0.80
Cosmetic cleanup:
(loan_plans
 .sort_values("index")
 .set_index("index")
 .rename(columns={"loan_size_x": "loan_size"})
 .drop(columns="loan_size_y"))
loan_size origination_month weighting
index
0 1000 01-2018 0.25
1 2000 02-2018 0.20
2 3000 01-2018 0.75
3 8000 02-2018 0.80
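As an aside that is not part of the original answer, the merge and the cleanup step can probably be avoided with groupby(...).transform("sum"), which returns the group total aligned to the original rows. The sketch below rebuilds the question's sample data; storing origination_month as plain strings and the helper name month_total are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question (origination_month kept as strings: an assumption).
loan_plans = pd.DataFrame({
    "loan_size": [1000, 2000, 3000, 8000],
    "origination_month": ["01-2018", "02-2018", "01-2018", "02-2018"],
})

# transform("sum") yields one value per original row: the total loan_size
# of that row's origination_month. month_total is a hypothetical helper name.
month_total = loan_plans.groupby("origination_month")["loan_size"].transform("sum")

# Element-wise division gives each loan's share of its month's total.
loan_plans["weighting"] = loan_plans["loan_size"] / month_total

print(loan_plans)
```

Because the result is already aligned with the original index, no reset_index, sort, rename, or drop is needed afterwards.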
Earlier answer
You can use div and sum, with no need for apply:
loan_plans.loan_size.div(
loan_plans.loc[loan_plans.loan_number.eq(1), "loan_size"].sum()
)
Output:
0 0.024714
1 0.053143
2 0.012143
3 0.010929
4 0.039643
...
Data:
import numpy as np
import pandas as pd

N = 100
data = {"loan_size": np.random.randint(100, 1000, size=N),
"loan_number": np.random.binomial(n=1, p=.3, size=N)}
loan_plans = pd.DataFrame(data)