Iterate over a column with a lambda and compute using values from another column

Time: 2020-03-05 10:47:44

Tags: python pandas lambda

Hi, I have the following dataframe:

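import pandas as pd
d = {'col1': [0.02, 0.12, -0.1, -0.07, 0.01]}
df = pd.DataFrame(data=d)

df['new'] = ''
# Set the starting value with a single .loc call; chained df['new'].iloc[0] = ...
# may write to a temporary copy instead of df
df.loc[0, 'new'] = 100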

df

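Starting from row 1, I am trying to compute the previous value in the "new" column divided by (the current row's "col1" value + 1).

For example, in row 1 of the new column: 100 / (0.12 + 1) = 89.285

For example, in row 2 of the new column: 89.285 / (-0.10 + 1) = 99.206, and so on.

I have tried using a lambda function, without success. Thanks for your help.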

3 Answers:

Answer 0: (score: 2)

Try this:

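df['new'] = 100.0  # float column; row 0 keeps the starting value

# Assumes the default 0..n-1 index; .loc writes directly into df
for i in range(1, df.shape[0]):
    prev = df.loc[i - 1, 'new']
    df.loc[i, 'new'] = prev / (df.loc[i, 'col1'] + 1)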

Output:

    col1    new
-------------------
0   0.02    100
1   0.12    89.2857
2   -0.10   99.2063
3   -0.07   106.673
4   0.01    105.617
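Since the question asked about a lambda, the same recurrence can also be written with `itertools.accumulate`, which passes the previous result into the lambda at each step. This is only a sketch: it assumes Python 3.8+ (for the `initial` argument), and the variable name `start` is introduced here for illustration.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'col1': [0.02, 0.12, -0.1, -0.07, 0.01]})
start = 100.0  # value for row 0 (assumed, as in the question)

# accumulate yields `start` first, then lambda(prev, c) for each later col1 value,
# so the result has exactly one entry per row of df
df['new'] = list(
    accumulate(df['col1'].iloc[1:], lambda prev, c: prev / (c + 1), initial=start)
)
print(df)
```

This still loops in Python, but it touches the DataFrame only once, when the finished list is assigned.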

Answer 1: (score: 2)

If performance matters, I think numba is the way to run the loop here.

Performance (for 5000 rows):

d = {'col1': [0.02,0.12,-0.1,-0.07,0.01]}
df = pd.DataFrame(data=d)
df = pd.concat([df] * 1000, ignore_index=True)

In [168]: %timeit df['new'] = f(df['new'].to_numpy(), df['col1'].to_numpy())
277 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [169]: %%timeit
     ...: for i in range(1,df.shape[0]):
     ...:     prev = df['new'].iloc[i-1]
     ...:     df['new'].iloc[i] = prev/(df['col1'].iloc[i]+1)
     ...:     
1.31 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [170]: %%timeit
     ...: for i_row, row in df.iloc[1:, ].iterrows():
     ...:     df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)
     ...:     
2.08 s ± 93.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
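The timing in In [168] calls a function `f` whose definition is not shown in this answer. A minimal sketch of what a numba-compiled loop for this recurrence might look like follows. The name `running_divide` and its signature are assumptions, not the answerer's code: it takes only the `col1` values and a starting value, and it allocates its own output array rather than writing into the input. The import fallback is there only so the sketch still runs without numba installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # compiles the decorated loop to machine code
except ImportError:
    def njit(func):
        # Fallback: plain Python, same result but without the speed-up
        return func


@njit
def running_divide(col, start):
    # Hypothetical helper: out[0] = start, out[i] = out[i-1] / (col[i] + 1)
    # Assumes col is a 1-D float array with at least one element
    out = np.empty(col.shape[0], dtype=np.float64)
    out[0] = start
    for i in range(1, col.shape[0]):
        out[i] = out[i - 1] / (col[i] + 1.0)
    return out


df = pd.DataFrame({'col1': [0.02, 0.12, -0.1, -0.07, 0.01]})
df['new'] = running_divide(df['col1'].to_numpy(), 100.0)
print(df)
```

Note that the first call includes numba's compilation time, so any timing comparison should measure a later call.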

Answer 2: (score: 0)

I don't see any vectorized solution. Here is a plain loop:

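df['new'] = 100.0  # float column, so the divided values fit without a dtype change
# Assumes the default 0..n-1 index, so i_row - 1 is the previous row's label
for i_row, row in df.iloc[1:].iterrows():
    df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)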
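That said, because each value is the previous one divided by (col1 + 1), the recurrence unrolls to new[i] = 100 / ((1 + col1[1]) * ... * (1 + col1[i])), which suggests a cumulative product may work as a vectorized form. The sketch below is offered under that assumption; the name `factors` is introduced here for illustration, and results can differ from the loop in the last floating-point digits.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0.02, 0.12, -0.1, -0.07, 0.01]})

# Divisor applied at each row; row 0 holds the starting value, so its divisor is 1
factors = 1 + df['col1']
factors.iloc[0] = 1.0

# Dividing the start value by the running product reproduces the row-by-row loop
df['new'] = 100.0 / factors.cumprod()
print(df)
```

This avoids the Python-level loop entirely, at the cost of being less obvious to read than the explicit recurrence.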