Computing proportional weights within specific segments: making it more Pythonic

Date: 2018-09-25 17:26:10

Tags: python pandas

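I have to perform the following calculation (or similar ones) many times in my code, and it takes a long time to run. I would like to know whether the code can be made more Pythonic (and faster to run).

I am calculating a weighting for each "loan_size", proportional to the total of all loans with the same origination month: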

loan_plans['weighting'] = loan_plans.loan_size / loan_plans.apply(
    lambda S: loan_plans.loc[loan_plans.origination_month == S.origination_month, 'loan_size'].sum(),
    axis=1)

Here is a sample data set with the desired result:

loan_size   origination_month   weighting
1000        01-2018             0.25
2000        02-2018             0.2
3000        01-2018             0.75
8000        02-2018             0.8

1 answer:

Answer 0 (score: 1)

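Update (per the OP's update):
There is nothing wrong with your approach, but you can use groupby to get the loan_size sum for each origination_month, merge it back onto the rows, and then compute the weighting: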

loan_plans = loan_plans.reset_index().merge(
    loan_plans.groupby("origination_month").loan_size.sum().reset_index(), on="origination_month"
)
loan_plans["weighting"] = loan_plans.loan_size_x / loan_plans.loan_size_y
loan_plans.sort_values("index").set_index("index")

       loan_size_x origination_month  loan_size_y  weighting
index                                                       
0             1000           01-2018         4000       0.25
1             2000           02-2018        10000       0.20
2             3000           01-2018         4000       0.75
3             8000           02-2018        10000       0.80

Cosmetic cleanup:

(loan_plans
    .sort_values("index")
    .set_index("index")
    .rename(columns={"loan_size_x": "loan_size"})
    .drop(columns="loan_size_y"))

       loan_size origination_month  weighting
index                                        
0           1000           01-2018       0.25
1           2000           02-2018       0.20
2           3000           01-2018       0.75
3           8000           02-2018       0.80
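
As a side note, the same weighting can probably be reached without the merge by using groupby with transform, which is a different technique from the one shown above. The following is a minimal sketch, assuming the four-row sample data from the question; the variable name month_total is only illustrative.

```python
import pandas as pd

# Sample data taken from the question.
loan_plans = pd.DataFrame({
    "loan_size": [1000, 2000, 3000, 8000],
    "origination_month": ["01-2018", "02-2018", "01-2018", "02-2018"],
})

# transform("sum") returns each group's total aligned to the original rows,
# so no merge, column renaming, or re-sorting is needed afterwards.
month_total = loan_plans.groupby("origination_month")["loan_size"].transform("sum")
loan_plans["weighting"] = loan_plans["loan_size"] / month_total

print(loan_plans)
```

Because the result of transform keeps the original index, the row order and the loan_size column name stay as they were, which makes the cosmetic step unnecessary in this variant.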

Earlier answer:
You can use div and sum, with no need for apply:

loan_plans.loan_size.div(
    loan_plans.loc[loan_plans.loan_number.eq(1), "loan_size"].sum()
)

Output:

0     0.024714
1     0.053143
2     0.012143
3     0.010929
4     0.039643
           ...

Data:

import numpy as np
import pandas as pd

N = 100
data = {"loan_size": np.random.randint(100, 1000, size=N),
        "loan_number": np.random.binomial(n=1, p=.3, size=N)}
loan_plans = pd.DataFrame(data)