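Pandas loop takes a lot of time - is there a better way?

Time: 2018-02-08 04:35:32

Tags: python pandas

I have a loop that takes far too long, and I'd like to know whether there is a better way, or whether I'm making a rookie mistake.

The reason I use a loop is that the first value is handled differently and every later value needs the previous one.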

# create var and set to 0
df [ 'amt_model' ] = 0

# create the cashflow variable
df [ 'cf' ] = df [ 'cash_in' ] - df [ 'cash_out' ] + df [ 'transfer' ]

Now I loop over the range of months to create the 'amt_model' values.

for i in range ( len ( df ) ):

    # adjust for the first month
    if i == 0:
        df [ 'amt_model' ].iloc [ i ] = df [ 'contrib' ].iloc [ i ]

    else:

        amt1 = df [ 'amt_model' ].iloc [ i - 1 ] * (1 + df [ 'pct_model' ].iloc [ i ])
        amt2 = df [ 'cf' ] [ i ] * (1 + df [ 'pct_model' ].iloc [ i ] / 2)

        df [ 'amt_model' ].iloc [ i ] = amt1 + amt2

This takes far too long for a loop over only 20 or 50 values.

index_values - start 19:28
index_values - end 19:42

Thanks!

3 answers:

Answer 0 (score: 1)

My solution, using this sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['cf','cash_in','cash_out','transfer','contrib','pct_model'])
for c in df.columns:
    df[c] = np.random.rand(100)*100

print(df.head())

          cf    cash_in   cash_out   transfer    contrib  pct_model
0  18.478061  80.073920  19.041986   8.859406  85.695653  18.174608
1  96.172043  72.786434  54.215755  76.859253  87.934012  47.415420
2  79.026521  63.252437  29.094382  23.460806  30.547062  36.154976
3  64.630058  85.409417  98.469148  84.905463  32.859257  75.908211
4  54.121041   8.823944  48.835937   5.194054  17.004900  25.130477

Iterate over the rows to build a new list, then assign it to df:

#amt_model is your future column
amt_model = [df.loc[0,'contrib']] #init with first row

#Calling df[1:] will get all your df except first row, iterate over it
for i, row in df[1:].iterrows():
    _amt_model = amt_model[-1] * (1 + row.pct_model)
    amt_model.append( _amt_model + row.cf * (1 + row.pct_model/2))

df['amt_model'] = amt_model #assign to your df

print(df.amt_model.head())

0    8.569565e+01
1    6.525182e+03
2    2.439506e+05
3    1.876432e+07
4    4.903214e+08
Name: amt_model, dtype: float64

Performance: 100 loops, best of 3: 13.7 ms per loop

Is this what you were hoping for?

Alternatives

If so, you can try doing it in one line:

Option 1:

amt_model = [df.loc[0,'contrib']]
[amt_model.append( amt_model[-1] * (1 + row.pct_model) + row.cf * (1 + row.pct_model/2) ) 
for (i,row) in df[1:].iterrows()]

df['amt_model'] = amt_model

#Performances:   
100 loops, best of 3: 14.7 ms per loop

Option 2: using apply

amt_model = [df.loc[0,'contrib']]
df[1:].apply(lambda row: amt_model.append( amt_model[-1] * (1 + row.pct_model) + row.cf * (1 + row.pct_model/2) ),
             axis='columns')

df['amt_model'] = amt_model

#Performances:
100 loops, best of 3: 11.7 ms per loop
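
The same row-by-row idea can also be written with itertuples, which yields lightweight namedtuples instead of building a Series for every row and is generally cheaper than iterrows. This is only a minimal sketch: the sample frame and its values are made up for illustration, and no timing is claimed for it.

```python
import numpy as np
import pandas as pd

# Made-up sample data with the columns the loop needs (illustrative values only)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'contrib': rng.random(100) * 100,
    'cf': rng.random(100) * 100,
    'pct_model': rng.random(100) * 0.05,
})

amt_model = [df['contrib'].iloc[0]]  # the first month starts from contrib

# Walk over every row except the first; each row is a namedtuple
for row in df.iloc[1:].itertuples(index=False):
    prev = amt_model[-1]
    amt_model.append(prev * (1 + row.pct_model) + row.cf * (1 + row.pct_model / 2))

df['amt_model'] = amt_model  # assign the finished list in one step
```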

Answer 1 (score: 0)

You can improve it by pulling 'amt2' out of the loop. I would use something like this:

df['amt2'] = df [ 'cf' ] * (1 + df [ 'pct_model' ] / 2)
df['amt1_1'] = 1 + df[ 'pct_model' ]

for i in range(len( df)):
    # adjust for the first month
    if i == 0:
        df [ 'amt_model' ].iloc [ i ] = df [ 'contrib' ].iloc [ i ]
    else:
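        amt1 = df [ 'amt_model' ].iloc [ i - 1 ] * df['amt1_1'].iloc[i]
        # keep this assignment inside the else branch: amt1 is undefined when i == 0,
        # and the first month must keep its 'contrib' value
        df [ 'amt_model' ].iloc [ i ] = amt1 + df['amt2'].iloc[i]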

You need to update the 'amt_model' value on every iteration, so I don't see any other option.
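
If the sequential step has to stay, much of the remaining cost tends to come from the repeated .iloc reads and writes on the DataFrame. A minimal sketch, assuming the question's column names and using made-up sample values, runs the same recurrence over plain NumPy arrays and writes the result back once:

```python
import numpy as np
import pandas as pd

# Illustrative data only; the column names follow the question
df = pd.DataFrame({
    'contrib': [100.0, 0.0, 0.0, 0.0],
    'cf': [0.0, 10.0, -5.0, 20.0],
    'pct_model': [0.0, 0.01, 0.02, -0.01],
})

# Precompute the per-row terms as plain NumPy arrays
growth = (1 + df['pct_model']).to_numpy()
amt2 = (df['cf'] * (1 + df['pct_model'] / 2)).to_numpy()

out = np.empty(len(df))
out[0] = df['contrib'].iloc[0]  # first month
for i in range(1, len(df)):
    # each value depends on the one computed just before it
    out[i] = out[i - 1] * growth[i] + amt2[i]

df['amt_model'] = out  # single write back to the DataFrame
```

The loop is still sequential, but it only touches array elements, so pandas indexing overhead is paid once rather than on every iteration.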

Answer 2 (score: 0)

Have you tried this?

df.loc[0,'amt_model' ] = df.loc[0,'contrib']
amt1 = (df.loc[:(len(df)-2),'amt_model']) * (1 + df.loc[1:, 'pct_model'].reset_index(drop=True))
amt2 = (df[ 'cf' ]) * (1 + df[ 'pct_model' ]/2)
df['amt_model'] = amt1 + amt2

Slicing up to len(df)-2 gives you the t-1 values, and df.loc[1:, 'pct_model'] gives you the t values. Both have the same length.
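
Because each amt_model value depends on the one computed just before it, a single shifted multiplication only sees the values already stored in the column when that line runs. One way to remove the loop entirely is to rewrite the recurrence in closed form with cumulative products and sums. The sketch below is an assumption-laden illustration, not a drop-in answer: it requires every 1 + pct_model to be non-zero, it can overflow or lose precision for long series or large rates, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the column names follow the question
df = pd.DataFrame({
    'contrib': [100.0, 0.0, 0.0, 0.0],
    'cf': [0.0, 10.0, -5.0, 20.0],
    'pct_model': [0.0, 0.01, 0.02, -0.01],
})

# Recurrence: a[i] = a[i-1] * growth[i] + amt2[i], with a[0] = contrib[0]
growth = (1 + df['pct_model']).to_numpy().copy()
amt2 = (df['cf'] * (1 + df['pct_model'] / 2)).to_numpy().copy()

growth[0] = 1.0                   # row 0 has no previous value to grow
amt2[0] = df['contrib'].iloc[0]   # row 0 is seeded with contrib

# Closed form: a[i] = G[i] * sum_{k<=i} amt2[k] / G[k], where G is the running product
G = np.cumprod(growth)
df['amt_model'] = G * np.cumsum(amt2 / G)
```

For a short series with modest rates, this should agree with the explicit loop up to floating-point rounding; it is worth comparing both on real data before relying on it.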