Rolling second highest in a pandas DataFrame

Time: 2017-12-10 13:43:28

Tags: pandas numpy dataframe

I am trying to find the highest and the second highest values over a rolling window. I can get the highest with:
df['B'] = df['A'].rolling(window=3).max()

But how can I get the second highest?

so that df['C'] ends up like this:

A    B    C
1
6
5    6    5
4    6    5
12   12   5
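For comparison, the rolling().max() call can be extended within pandas itself: Rolling.apply runs an arbitrary function on each window. This is a pandas-only sketch, separate from the strides-based answer. It assumes the column is named 'A' as in the table. With raw=True each window arrives as a NumPy array; because the lambda runs once per window in Python, expect it to be slower than the vectorized NumPy approaches on long columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 6, 5, 4, 12]})

# Rolling max, as in the question
df['B'] = df['A'].rolling(window=3).max()

# Second highest per window: sort the window ascending and take the
# second-to-last element; the first two rows have no full window -> NaN
df['C'] = df['A'].rolling(window=3).apply(lambda x: np.sort(x)[-2], raw=True)

print(df)
```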

1 Answer:

Answer 0 (score: 1)

Generic N-th highest value in rolling/sliding windows

Here's one approach that uses np.lib.stride_tricks.as_strided to create sliding windows, which lets us pick any generic N-th highest value in each window:

import numpy as np

# https://stackoverflow.com/a/40085052/ @Divakar
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

# Return N highest nums in rolling windows of length W of array ar
def N_highest(ar, W, N=1): 
    # ar : Input array (1-D NumPy array, e.g. df.A.values, not a Series)
    # W : Window length
    # N : Get us the N-highest in sliding windows 
    A2D = strided_app(ar,W,1)
    # After argpartition, column -N of each row holds the index of that
    # row's N-th highest element
    idx = np.argpartition(A2D, -N, axis=1)[:, -N]
    return A2D[np.arange(len(idx)), idx]
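To see what strided_app actually produces, here is a small check on the sample array. It also compares the result against numpy.lib.stride_tricks.sliding_window_view, which is not used in the original answer and assumes NumPy 1.20 or newer; it builds the same windows with bounds checking and returns a read-only view, so a wrong shape or stride cannot read past the end of the array.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

def strided_app(a, L, S):
    # Same helper as in the answer: windows of length L taken every S elements
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return as_strided(a, shape=(nrows, L), strides=(S * n, n))

a = np.array([1, 6, 5, 4, 12])
windows = strided_app(a, 3, 1)
print(windows)
# [[ 1  6  5]
#  [ 6  5  4]
#  [ 5  4 12]]

# Bounds-checked, read-only equivalent (NumPy >= 1.20)
safe = sliding_window_view(a, 3)
print(np.array_equal(windows, safe))  # True

# Both are views: no data is copied, they share memory with a
print(np.shares_memory(windows, a), np.shares_memory(safe, a))  # True True
```

Because both are views, writing into windows would silently modify a (and several windows at once), so treat the strided result as read-only.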

Sample run:

In [634]: a = np.array([1,6,5,4,12]) # input array

In [635]: N_highest(a, W=3, N=1)  # highest in W=3
Out[635]: array([ 6,  6, 12])

In [636]: N_highest(a, W=3, N=2)  # second highest
Out[636]: array([5, 5, 5])

In [637]: N_highest(a, W=3, N=3)  # third highest
Out[637]: array([1, 4, 4])
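The argpartition-plus-fancy-indexing step can also be written with np.partition, which returns the partitioned values directly instead of indices. This is a sketch: the name n_highest_partition is hypothetical, and it assumes sliding_window_view from NumPy 1.20 or newer in place of strided_app. It reproduces the three sample outputs above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def n_highest_partition(ar, W, N=1):
    # Hypothetical helper: N-th highest value in each length-W window of ar
    A2D = sliding_window_view(ar, W)  # shape (len(ar) - W + 1, W)
    # np.partition returns a copy in which column -N holds the value a full
    # sort would put there (the N-th highest); the other columns are only
    # split into smaller and larger groups, not fully ordered.
    return np.partition(A2D, -N, axis=1)[:, -N]

a = np.array([1, 6, 5, 4, 12])
print(n_highest_partition(a, W=3, N=1))  # [ 6  6 12]
print(n_highest_partition(a, W=3, N=2))  # [5 5 5]
print(n_highest_partition(a, W=3, N=3))  # [1 4 4]
```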

Another, shorter strides-based approach is to sort each window directly:

np.sort(strided_app(ar,W,1), axis=1)[:,-N]
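The sort-based version fully orders every window (O(W log W) per window, versus selection that is linear on average for partition), which is usually fine for small W. Wrapped as a function (hypothetical name n_highest_sort, again assuming sliding_window_view from NumPy 1.20 or newer), it also makes the handling of duplicates easy to check: equal values are counted separately, so the second highest of [6, 6, 5] is 6, not 5.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def n_highest_sort(ar, W, N=1):
    # Sort each window ascending along axis 1, then take column -N
    return np.sort(sliding_window_view(ar, W), axis=1)[:, -N]

print(n_highest_sort(np.array([1, 6, 5, 4, 12]), W=3, N=2))  # [5 5 5]

# Duplicates are not collapsed: the two 6s occupy the top two positions
print(n_highest_sort(np.array([6, 6, 5]), W=3, N=2))  # [6]
```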

Solving our case

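So, to solve our case, we need to prepend W-1 NaNs to the output of the function above (the first W-1 rows have no complete window), like so: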

W = 3
df['C'] = np.r_[ [np.nan]*(W-1), N_highest(df.A.values, W=W, N=2)]
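If this is needed for several columns, the NaN padding can be folded into a small helper. This is a sketch, not part of the original answer: the name rolling_nth_highest is hypothetical, it assumes NumPy 1.20 or newer and a NaN-free numeric column, and it returns a Series on the same index, laid out like s.rolling(W).max() output. It uses np.full instead of np.r_ and also copes with a column shorter than W.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_nth_highest(s, W, N=1):
    # Hypothetical helper: N-th highest value in each trailing window of
    # length W; the first W-1 positions stay NaN, like pandas rolling output.
    vals = s.to_numpy(dtype=float)  # assumes no NaNs in the column
    out = np.full(len(vals), np.nan)
    if len(vals) >= W:
        windows = sliding_window_view(vals, W)
        out[W - 1:] = np.partition(windows, -N, axis=1)[:, -N]
    return pd.Series(out, index=s.index)

df = pd.DataFrame({'A': [1, 6, 5, 4, 3]})
df['C'] = rolling_nth_highest(df['A'], W=3, N=2)
print(df)
```

With this input the C column should come out as NaN, NaN, 5.0, 5.0, 4.0, matching the sample run further down.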

With the direct sorting approach, we would have:

df['C'] = np.r_[ [np.nan]*(W-1), np.sort(strided_app(df.A.values,W,1), axis=1)[:,-2]]

Sample run:

In [578]: df
Out[578]: 
   A
0  1
1  6
2  5
3  4
4  3  # <== Different from given sample, for variety

In [619]: W = 3

In [620]: df['C'] = np.r_[ [np.nan]*(W-1), N_highest(df.A.values, W=W, N=2)]

In [621]: df
Out[621]: 
   A    C
0  1  NaN
1  6  NaN
2  5  5.0
3  4  5.0
4  3  4.0 # <== Second highest from the last group of: [5,4,3]