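I have a loop that takes too much time, and I would like to know whether there is a better way, or whether I am making a rookie mistake.
The reason I use a loop is that the first value is computed differently and every later value needs the previous one.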
# create the variable and set it to 0
df['amt_model'] = 0
# create the cashflow variable
df['cf'] = df['cash_in'] - df['cash_out'] + df['transfer']
Now I loop over the range of months to create the 'amt_model' values.
for i in range(len(df)):
    # adjust for the first month
    if i == 0:
        df['amt_model'].iloc[i] = df['contrib'].iloc[i]
    else:
        amt1 = df['amt_model'].iloc[i - 1] * (1 + df['pct_model'].iloc[i])
        amt2 = df['cf'][i] * (1 + df['pct_model'].iloc[i] / 2)
        df['amt_model'].iloc[i] = amt1 + amt2
This takes far too long for a loop over only 20 or 50 values.
index_values- start 19:28
index_values - end 19:42
Thanks!
Answer 0 (score: 1)
My solution, using this test data:
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['cf','cash_in','cash_out','transfer','contrib','pct_model'])
for c in df.columns:
    df[c] = np.random.rand(100)*100
print(df.head())
          cf    cash_in   cash_out   transfer    contrib  pct_model
0  18.478061  80.073920  19.041986   8.859406  85.695653  18.174608
1  96.172043  72.786434  54.215755  76.859253  87.934012  47.415420
2  79.026521  63.252437  29.094382  23.460806  30.547062  36.154976
3  64.630058  85.409417  98.469148  84.905463  32.859257  75.908211
4  54.121041   8.823944  48.835937   5.194054  17.004900  25.130477
Iterate over the rows to build a new list, then assign it to df:
#amt_model is your future column
amt_model = [df.loc[0,'contrib']] #init with first row
#Calling df[1:] will get all your df except first row, iterate over it
for i, row in df[1:].iterrows():
    _amt_model = amt_model[-1] * (1 + row.pct_model)
    amt_model.append( _amt_model + row.cf * (1 + row.pct_model/2))
df['amt_model'] = amt_model #assign to your df
print(df.amt_model.head())
0 8.569565e+01
1 6.525182e+03
2 2.439506e+05
3 1.876432e+07
4 4.903214e+08
Name: amt_model, dtype: float64
Performance: 100 loops, best of 3: 13.7 ms per loop
Is this what you were expecting?
Alternatives
If so, you can try doing it in a single statement:
Option 1:
amt_model = [df.loc[0,'contrib']]
[amt_model.append( amt_model[-1] * (1 + row.pct_model) + row.cf * (1 + row.pct_model/2) )
for (i,row) in df[1:].iterrows()]
df['amt_model'] = amt_model
#Performances:
100 loops, best of 3: 14.7 ms per loop
Option 2, using apply:
amt_model = [df.loc[0,'contrib']]
df[1:].apply(lambda row: amt_model.append( amt_model[-1] * (1 + row.pct_model) + row.cf * (1 + row.pct_model/2) ),
axis='columns')
df['amt_model'] = amt_model
#Performances:
100 loops, best of 3: 11.7 ms per loop
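One more variant, as a sketch rather than a benchmarked result: itertuples usually has less per-row overhead than iterrows, because it yields namedtuples instead of building a Series for every row. The helper name amt_model_itertuples and the three-row demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def amt_model_itertuples(df):
    """Same recurrence as the loops above, iterating with itertuples."""
    rows = df[["cf", "pct_model"]].itertuples(index=False)
    next(rows)  # skip the first row; it is seeded from 'contrib' instead
    amt_model = [df["contrib"].iloc[0]]
    for row in rows:
        amt_model.append(amt_model[-1] * (1 + row.pct_model)
                         + row.cf * (1 + row.pct_model / 2))
    return amt_model


# Made-up example data, only for illustration
demo = pd.DataFrame({
    "contrib": [100.0, 0.0, 0.0],
    "cf": [0.0, 10.0, 20.0],
    "pct_model": [0.0, 0.10, 0.20],
})
demo["amt_model"] = amt_model_itertuples(demo)
print(demo["amt_model"])
```

Whether this is actually faster on your data is worth timing yourself with %timeit.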
Answer 1 (score: 0)
You can improve it by pulling 'amt2' out of the loop. I would use something like this:
df['amt2'] = df['cf'] * (1 + df['pct_model'] / 2)
df['amt1_1'] = 1 + df['pct_model']
for i in range(len(df)):
    # adjust for the first month
    if i == 0:
        df['amt_model'].iloc[i] = df['contrib'].iloc[i]
    else:
        amt1 = df['amt_model'].iloc[i - 1] * df['amt1_1'].iloc[i]
        df['amt_model'].iloc[i] = amt1 + df['amt2'].iloc[i]
You need to update 'amt_model' on every iteration, so I don't see any other option.
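Since the row-by-row dependency stays, one option is to keep the loop but run it on plain NumPy arrays and write the result back once, which avoids a pandas indexing call on every step. A minimal sketch, assuming the column names from the question; the helper name amt_model_numpy and the demo data are made up for illustration.

```python
import numpy as np
import pandas as pd


def amt_model_numpy(df):
    """Run the recurrence on NumPy arrays instead of pandas indexers."""
    pct = df["pct_model"].to_numpy(dtype=float)
    growth = 1 + pct                                      # the 'amt1_1' factor
    amt2 = df["cf"].to_numpy(dtype=float) * (1 + pct / 2)
    out = np.empty(len(df), dtype=float)
    out[0] = df["contrib"].iloc[0]                        # first month is special
    for i in range(1, len(df)):
        out[i] = out[i - 1] * growth[i] + amt2[i]
    return out


# Made-up example data, only for illustration
demo = pd.DataFrame({
    "contrib": [100.0, 0.0, 0.0],
    "cf": [0.0, 10.0, 20.0],
    "pct_model": [0.0, 0.10, 0.20],
})
demo["amt_model"] = amt_model_numpy(demo)  # single assignment at the end
print(demo["amt_model"])
```

Assigning the whole column at the end also sidesteps the chained df['amt_model'].iloc[i] = ... assignment, which pandas may apply to a temporary copy.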
Answer 2 (score: 0)
Have you tried this?
df.loc[0,'amt_model' ] = df.loc[0,'contrib']
amt1 = (df.loc[:(len(df)-2),'amt_model']) * (1 + df.loc[1:, 'pct_model'].reset_index(drop=True))
amt2 = (df[ 'cf' ]) * (1 + df[ 'pct_model' ]/2)
df['amt_model'] = amt1 + amt2
Using len(df)-2 gives you the t-1 values, and df.loc[1:] gives you the t values. Both have the same length.
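Note that multiplying by a shifted column computes only one step, because each amt_model value depends on the previously computed one. The recurrence a[i] = a[i-1] * g[i] + b[i], with g = 1 + pct_model and b = cf * (1 + pct_model / 2), can instead be unrolled into cumulative products and sums. The sketch below does that under two assumptions: every growth factor is nonzero (pct_model is never -1), and the cumulative product stays within float range. The helper name amt_model_closed_form and the demo data are made up for illustration.

```python
import numpy as np
import pandas as pd


def amt_model_closed_form(df):
    """Unrolled recurrence: a[i] = G[i] * cumsum(b / G)[i], G = cumprod(g)."""
    pct = df["pct_model"].to_numpy(dtype=float)
    g = 1 + pct
    g[0] = 1.0                        # no growth step is applied to the first row
    b = df["cf"].to_numpy(dtype=float) * (1 + pct / 2)
    b[0] = df["contrib"].iloc[0]      # seed the series with the first contribution
    G = np.cumprod(g)                 # assumption: no factor is zero
    return G * np.cumsum(b / G)


# Made-up example data, only for illustration
demo = pd.DataFrame({
    "contrib": [100.0, 0.0, 0.0],
    "cf": [0.0, 10.0, 20.0],
    "pct_model": [0.0, 0.10, 0.20],
})
demo["amt_model"] = amt_model_closed_form(demo)
print(demo["amt_model"])
```

For long series or large percentages the cumulative product can overflow or lose precision, in which case an explicit loop is the safer choice.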