嗨,我有以下数据框
import pandas as pd
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)
df['new'] = ''
df['new'].iloc[0] = 100
df
我试图计算(从第1行开始)“新”列中的前一个值除以“ col1” +1的值。
例如在第一行的新列中:100 /(0.12 + 1)= 89,285
例如在第二行的新列中:89,285 /(-0.10 + 1)= 99,206 等等
我已经尝试使用lambda函数-没有成功。感谢您的帮助
答案 0 :(得分:2)
尝试一下:
df['new'].iloc[0] = 100
for i in range(1,df.shape[0]):
prev = df['new'].iloc[i-1]
df['new'].iloc[i] = prev/(df['col1'].iloc[i]+1)
输出:
col1 new
-------------------
0 0.02 100
1 0.12 89.2857
2 -0.10 99.2063
3 -0.07 106.673
4 0.01 105.617
答案 1 :(得分:2)
如果性能很重要,我认为numba是在此处使用循环的方式:
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01]}
df = pd.DataFrame(data=d)
df = pd.concat([df] * 1000, ignore_index=True)
In [168]: %timeit df['new'] = f(df['new'].to_numpy(), df['col1'].to_numpy())
277 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [169]: %%timeit
...: for i in range(1,df.shape[0]):
...: prev = df['new'].iloc[i-1]
...: df['new'].iloc[i] = prev/(df['col1'].iloc[i]+1)
...:
1.31 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [170]: %%timeit
...: for i_row, row in df.iloc[1:, ].iterrows():
...: df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)
...:
2.08 s ± 93.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
性能(用于5000行):
mutate_at
答案 2 :(得分:0)
我没有看到任何向量化解决方案。这是一个纯粹的循环:
df['new'] = 100
for i_row, row in df.iloc[1:, ].iterrows():
df.loc[i_row, 'new'] = df.loc[i_row - 1, 'new'] / (row['col1'] + 1)