I have a DataFrame:
import numpy as np
import pandas as pd
np.random.seed(18)
df = pd.DataFrame(np.random.randint(0,50,size=(10, 2)), columns=list('AB'))
df['Min'] = np.nan
n = 3 # can be changed
Currently I fill the Min column iteratively, taking the minimum of B over the next n rows:
for row in range(0, df.shape[0] - n):
    low = []
    for i in range(1, n + 1):
        low.append(df.loc[df.index[row + i], 'B'])
    df.loc[df.index[row], 'Min'] = min(low)
But this is very slow. Is there a more efficient way? Thanks.
Answer 0 (score: 4)
df['Min'] = df['B'].rolling(n).min().shift(-n)
print (df)
    A   B   Min
0  42  19   2.0
1   5  49   2.0
2  46   2  17.0
3   8  24  17.0
4  34  17  11.0
5   5  21   4.0
6  47  42   1.0
7  10  11   NaN
8  36   4   NaN
9  43   1   NaN
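To see why the one-liner matches the loop: rolling(n).min() at row i covers rows i-n+1 through i, so shifting the result up by n makes row i see rows i+1 through i+n. Here is a minimal sketch that checks this against a loop for a few window sizes. The helper names next_n_min_loop and next_n_min_rolling are made up for illustration, and the B values are copied from the printed output above.

```python
import numpy as np
import pandas as pd


def next_n_min_loop(s: pd.Series, n: int) -> pd.Series:
    """Reference version: minimum of the n values that follow each row."""
    out = pd.Series(np.nan, index=s.index)
    for row in range(len(s) - n):
        out.iloc[row] = s.iloc[row + 1 : row + n + 1].min()
    return out


def next_n_min_rolling(s: pd.Series, n: int) -> pd.Series:
    """Vectorized version: trailing window of size n, pulled up by n rows."""
    return s.rolling(n).min().shift(-n)


# Column B as shown in the printed DataFrame
b = pd.Series([19, 49, 2, 24, 17, 21, 42, 11, 4, 1], name="B")

for n in (1, 2, 3, 5):
    expected = next_n_min_loop(b, n).to_numpy()
    result = next_n_min_rolling(b, n).to_numpy()
    # assert_array_equal treats NaN in matching positions as equal
    np.testing.assert_array_equal(result, expected)
```

The last n rows stay NaN in both versions because fewer than n rows follow them.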
If performance matters, use this NumPy solution based on stride tricks:
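def rolling_window(a, window):
    # Build a 2-D view in which row k is a[k:k + window], without copying data
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# arr[k] is the minimum of B[k:k + n]; dropping arr[0] aligns each window with the row before it
arr = rolling_window(df['B'].values, n).min(axis=1)
df['Min'] = np.concatenate([arr[1:], [np.nan] * n])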
print (df)
    A   B   Min
0  42  19   2.0
1   5  49   2.0
2  46   2  17.0
3   8  24  17.0
4  34  17  11.0
5   5  21   4.0
6  47  42   1.0
7  10  11   NaN
8  36   4   NaN
9  43   1   NaN
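If you would rather not compute strides by hand, newer NumPy versions (1.20 and later) ship sliding_window_view, which builds the same kind of windowed view with bounds checking. This is a sketch of that swapped-in variant, not the code from the answer above. The helper name next_n_min_numpy is made up for illustration, and the data is copied from the printed DataFrame.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def next_n_min_numpy(values: np.ndarray, n: int) -> np.ndarray:
    """Minimum of the n values after each position; NaN where fewer than n remain."""
    out = np.full(len(values), np.nan)
    if len(values) > n:
        # Row k of the view is values[k:k + n]; dropping row 0 pairs
        # position k with window k + 1, so each row looks strictly ahead.
        windows = sliding_window_view(values, n)[1:]
        out[: len(windows)] = windows.min(axis=1)
    return out


df = pd.DataFrame({
    "A": [42, 5, 46, 8, 34, 5, 47, 10, 36, 43],
    "B": [19, 49, 2, 24, 17, 21, 42, 11, 4, 1],
})
n = 3
df["Min"] = next_n_min_numpy(df["B"].to_numpy(), n)
print(df)
```

The length guard means a column with n or fewer rows comes back as all NaN instead of raising an error.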
Answer 1 (score: 3)
Jez's got it. Just as another option, you can also roll forward over the series by reversing it (as Andy suggested here):
df.B[::-1].rolling(3).min()[::-1].shift(-1)
0     2.0
1     2.0
2    17.0
3    17.0
4    11.0
5     4.0
6     1.0
7     NaN
8     NaN
9     NaN
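The double reversal works because rolling over the reversed series makes each window look ahead in the original order. Recent pandas versions (1.1 and later) can express the same forward-looking window directly with FixedForwardWindowIndexer. This is a sketch of that alternative, assuming that pandas version is available; it is not part of the original answers, and the B values are copied from the question's output.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer  # pandas >= 1.1

s = pd.Series([19, 49, 2, 24, 17, 21, 42, 11, 4, 1], name="B")
n = 3

# Each window covers rows i .. i + n - 1; min_periods=n leaves NaN
# wherever the window would run past the end of the series.
indexer = FixedForwardWindowIndexer(window_size=n)
forward = s.rolling(indexer, min_periods=n).min()

# shift(-1) drops the current row from the window,
# so row i ends up with the minimum of rows i + 1 .. i + n.
next_min = forward.shift(-1)
print(next_min)
```

Replacing the hard-coded 3 with n in the reversed version should generalize it the same way.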