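I've managed to get this working with a for loop, but it is very slow on the large dataset I'm working with, so I'm trying to find a way to do it with pandas, groupby, apply and lambda functions.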
import pandas as pd
example_df = pd.DataFrame({
    "scen": [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
    "cusip": ['031162CF5', '031162CF5', '031162CF5', '031162CF5', '38141GWM2', '38141GWM2', '38141GWM2', '38141GWM2', '031162CF5', '031162CF5', '031162CF5', '031162CF5', '38141GWM2', '38141GWM2', '38141GWM2', '38141GWM2'],
    "wal": [50, 55, 60, 65, 40, 50, 60, 70, 40, 45, 50, 55, 30, 40, 50, 60],
    "par_val": [900000, 800000, 700000, 600000, 900000, 800000, 700000, 600000, 900000, 800000, 700000, 600000, 900000, 800000, 700000, 600000],
    "prin_cf": [0, 100000, 100000, 100000, 0, 100000, 100000, 100000, 0, 100000, 100000, 100000, 0, 100000, 100000, 100000],
    "amortization": [166.67, 0, 0, 0, 208.33, 0, 0, 0, 208.33, 0, 0, 0, 277.78, 0, 0, 0],
    "book_val": [1000000, 0, 0, 0, 1000000, 0, 0, 0, 1000000, 0, 0, 0, 1000000, 0, 0, 0]})
for x in range(1, len(example_df['scen'])):
    if (example_df['cusip'][x] == example_df['cusip'][x-1]):
        # If bond matures, don't report book value
        if(example_df['par_val'][x] == 0):
            example_df['book_val'][x] = 0
        else:
            example_df['book_val'][x] = example_df['book_val'][x-1] - example_df['amortization'][x-1] - example_df['prin_cf'][x-1]
        example_df['amortization'][x] = (example_df['book_val'][x] - example_df['par_val'][x]) / example_df['wal'][x] / 12
example_df
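Written as a recurrence, the loop above computes the following for consecutive rows i-1 and i that share the same cusip. The symbols are shorthand introduced here for readability, not names from the data: B is book_val, A is amortization, C is prin_cf, P is par_val and W is wal.

```latex
\begin{aligned}
B_i &= \begin{cases}
         0 & \text{if } P_i = 0 \\
         B_{i-1} - A_{i-1} - C_{i-1} & \text{otherwise}
       \end{cases} \\
A_i &= \frac{B_i - P_i}{12\, W_i}
\end{aligned}
```

The first row of each cusip block keeps the book_val and amortization it was given.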
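The tricky part is that each row's book value depends on the previous row's amortization, while each amortization value depends on the book value in the same row. Judging from answers to similar questions here, I think it might be possible to do this with global variables that keep track of the previous values.
Something like: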
def calc_bv(prin_cf, par_val, wal):
    # Carry the previous row's book value and amortization in globals
    global bvalue, amort
    bvalue = bvalue - amort - prin_cf
    amort = (bvalue - par_val)/wal/12
    return bvalue, amort
bvalue = example_df.loc[0, 'book_val']
amort = example_df.loc[0, 'amortization']
# Note: example_df2 and its prev_prin_cf column are not defined anywhere in this post
example_df[1:][['book_val','amortization']] = example_df2[1:].apply(lambda row: calc_bv(row['prev_prin_cf'],row['par_val'],row['wal']), axis=1, result_type="expand")
example_df
Answer 0 (score: 0)
No doubt there is a clever pandas solution based on groupby. But simply rewriting the loop with numba already gives a performance improvement of roughly 1000x.
# Python 3.6.0, Pandas 0.19.2
assert jpp(df).equals(original(df))
%timeit jpp(df) # 929 µs per loop
%timeit original(df) # 1.05 s per loop
Benchmarking code
Original:
def original(example_df):
    for x in range(1, len(example_df['scen'])):
        if (example_df['cusip'][x] == example_df['cusip'][x-1]):
            # If bond matures, don't report book value
            if(example_df['par_val'][x] == 0):
                example_df['book_val'][x] = 0
            else:
                example_df['book_val'][x] = example_df['book_val'][x-1] - example_df['amortization'][x-1] - example_df['prin_cf'][x-1]
            example_df['amortization'][x] = (example_df['book_val'][x] - example_df['par_val'][x]) / example_df['wal'][x] / 12
    return example_df
Numba:
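from numba import njit

@njit
def calculator(cusip, par, book, amort, prin_cf, wal):
    n = len(par)
    for i in range(1, n):
        # Only roll forward within a run of rows for the same bond
        if cusip[i] == cusip[i-1]:
            if par[i] == 0:
                # Bond has matured: report no book value
                book[i] = 0
            else:
                book[i] = book[i-1] - amort[i-1] - prin_cf[i-1]
            amort[i] = (book[i] - par[i]) / wal[i] / 12
    return book, amort

def jpp(df):
    # factorize turns the cusip strings into integer codes that numba can compare.
    # book_val and amortization are cast to float so fractional values written
    # inside calculator are kept rather than stored into an integer array.
    df['book_val'], df['amortization'] = calculator(
        pd.factorize(df['cusip'])[0], df['par_val'].values,
        df['book_val'].values.astype(float), df['amortization'].values.astype(float),
        df['prin_cf'].values, df['wal'].values)
    return df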
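To spot-check the numba output by hand, it can help to run the same recurrence on a single bond with no dependencies at all. The sketch below is only an illustration: `roll_book_values` is a hypothetical helper name, it handles one cusip block at a time (so it has no cusip comparison), and it assumes `wal` is never zero.

```python
def roll_book_values(par, wal, prin_cf, book0, amort0):
    """Apply the book-value / amortization recurrence to one bond's rows.

    book0 and amort0 are the values given for the bond's first row.
    """
    book = [float(book0)]
    amort = [float(amort0)]
    for i in range(1, len(par)):
        if par[i] == 0:
            # Bond has matured: report no book value
            b = 0.0
        else:
            b = book[i - 1] - amort[i - 1] - prin_cf[i - 1]
        book.append(b)
        amort.append((b - par[i]) / wal[i] / 12)
    return book, amort


# First bond of scenario 1 from the question's example data
book, amort = roll_book_values(
    par=[900000, 800000, 700000, 600000],
    wal=[50, 55, 60, 65],
    prin_cf=[0, 100000, 100000, 100000],
    book0=1000000,
    amort0=166.67,
)
```

The resulting lists can be compared against the first four rows of `book_val` and `amortization` returned by `jpp`.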