How to fill a DataFrame column with the minimum of the next n entries of another column

Time: 2018-07-29 18:33:49

Tags: python performance pandas dataframe

I have a DataFrame:

import numpy as np
import pandas as pd
np.random.seed(18)
df = pd.DataFrame(np.random.randint(0,50,size=(10, 2)), columns=list('AB'))
df['Min'] = np.nan
n = 3   # can be changed


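I need to fill column 'Min' with the minimum of the next n values of column 'B'. For row i, that is the minimum of B over rows i+1 through i+n; the last n rows stay NaN because fewer than n values follow them.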

Currently I do this with an iterative approach:

for row in range(0, df.shape[0] - n):
    low = []
    # collect the n values of B that follow the current row
    for i in range(1, n + 1):
        low.append(df.loc[df.index[row + i], 'B'])
    df.loc[df.index[row], 'Min'] = min(low)

But this is very slow. Is there a more efficient way? Thanks.

2 answers:

Answer 0: (score: 4)

Use rolling with min, then shift:

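# rolling(n).min() at row i covers rows i-n+1 .. i; shift(-n) moves it up so row i sees rows i+1 .. i+n
df['Min'] = df['B'].rolling(n).min().shift(-n)
print(df)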
    A   B   Min
0  42  19   2.0
1   5  49   2.0
2  46   2  17.0
3   8  24  17.0
4  34  17  11.0
5   5  21   4.0
6  47  42   1.0
7  10  11   NaN
8  36   4   NaN
9  43   1   NaN
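
To double-check that the one-liner really matches the loop from the question, here is a minimal sketch. It hard-codes the B values from the printed output above rather than relying on the random seed, and the names `expected` and `fast` are only illustrative.

```python
import numpy as np
import pandas as pd

# B values copied from the printed DataFrame above
df = pd.DataFrame({'B': [19, 49, 2, 24, 17, 21, 42, 11, 4, 1]})
n = 3

# Reference result: explicit loop, same logic as in the question
expected = pd.Series(np.nan, index=df.index)
for row in range(len(df) - n):
    expected.iloc[row] = df['B'].iloc[row + 1:row + n + 1].min()

# Vectorized result: trailing window of size n, moved up by n rows
fast = df['B'].rolling(n).min().shift(-n)

# Raises AssertionError if the two approaches disagree
pd.testing.assert_series_equal(fast, expected, check_names=False)
print(fast.tolist())
```

The same check can be repeated with other values of n to confirm that the shift amount follows the window size.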

If performance matters, use this solution based on NumPy strides:

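def rolling_window(a, window):
    # strided view of shape (len(a) - window + 1, window); no data is copied
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

# arr[i] is the minimum of B[i:i+n]; dropping arr[0] aligns row i with rows i+1 .. i+n
arr = rolling_window(df['B'].values, n).min(axis=1)
df['Min'] = np.concatenate([arr[1:], [np.nan] * n])
print(df)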
    A   B   Min
0  42  19   2.0
1   5  49   2.0
2  46   2  17.0
3   8  24  17.0
4  34  17  11.0
5   5  21   4.0
6  47  42   1.0
7  10  11   NaN
8  36   4   NaN
9  43   1   NaN
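
As a possible variation, not part of the original answer: on NumPy 1.20 or newer, `sliding_window_view` builds the same kind of window view without computing strides by hand. The sketch below assumes that NumPy version; the helper name `next_n_min` is made up for illustration, and the B values are copied from the output above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def next_n_min(values, n):
    """Minimum of the n values after each position; NaN where fewer than n remain."""
    out = np.full(len(values), np.nan)
    if len(values) > n:
        # windows[i] is values[i:i+n], shape (len(values) - n + 1, n)
        windows = sliding_window_view(values, n)
        # skip the first window so position i gets values[i+1 : i+n+1]
        out[:len(values) - n] = windows[1:].min(axis=1)
    return out

b = np.array([19, 49, 2, 24, 17, 21, 42, 11, 4, 1])
result = next_n_min(b, 3)
print(result)
```

The length guard is there so that inputs with n or fewer elements return all NaN instead of raising an error.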

Answer 1: (score: 3)

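Jez nailed it. Just as another option, you can also do a forward rolling over the whole series by reversing it, rolling, and reversing back (as Andy suggested):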

df.B[::-1].rolling(n).min()[::-1].shift(-1)

0     2.0
1     2.0
2    17.0
3    17.0
4    11.0
5     4.0
6     1.0
7     NaN
8     NaN
9     NaN
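
If your pandas is 1.1 or newer, a forward-looking window can also be written without reversing the series, using `FixedForwardWindowIndexer`. This is only a sketch under that version assumption, not something from the answers above; the B values are copied from the earlier output.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

s = pd.Series([19, 49, 2, 24, 17, 21, 42, 11, 4, 1], name='B')
n = 3

# window at row i covers rows i .. i+n-1
indexer = FixedForwardWindowIndexer(window_size=n)

# min_periods=n keeps truncated windows at the end as NaN;
# shift(-1) then makes row i see rows i+1 .. i+n
forward_min = s.rolling(indexer, min_periods=n).min().shift(-1)
print(forward_min)
```

Here the shift is by 1 rather than by n, because the forward window already starts at the current row.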