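I'm trying to find the highest and second-highest values in a rolling window. I can get the highest with
df['B'] = df['A'].rolling(window=3).max()
but how can I get the second highest?
So that df['C'] would look like this: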
 A   B   C
 1
 6
 5   6   5
 4   6   5
12  12   5
Answer 0 (score: 1)
Here's an approach that uses np.lib.stride_tricks.as_strided to create sliding windows, which lets us select the N-th highest value in each window for any N:
import numpy as np

# https://stackoverflow.com/a/40085052/ @Divakar
def strided_app(a, L, S):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))
# Return the N-th highest value in each rolling window of length W over array ar
def N_highest(ar, W, N=1):
    # ar : Input array
    # W  : Window length
    # N  : Rank to pick in each sliding window (1 = highest, 2 = second highest, ...)
    A2D = strided_app(ar, W, 1)
    # Partitioning on -N places the N-th highest value of each row in column -N
    return np.partition(A2D, -N, axis=1)[:, -N]
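To make the windowing step concrete, here is a small self-contained sketch of the 2D view that as_strided builds for the sample array. It inlines the same shape and stride arithmetic as strided_app; passing writeable=False is an extra precaution added here, not part of the answer's function.

```python
import numpy as np

a = np.array([1, 6, 5, 4, 12])
L, S = 3, 1                      # window length and step size
n = a.strides[0]                 # bytes between consecutive elements
nrows = (a.size - L) // S + 1    # number of complete windows

# Each row starts S elements after the previous one and spans L elements.
# writeable=False guards against accidental writes through the overlapping view.
windows = np.lib.stride_tricks.as_strided(
    a, shape=(nrows, L), strides=(S * n, n), writeable=False
)
print(windows)
# [[ 1  6  5]
#  [ 6  5  4]
#  [ 5  4 12]]
print(np.shares_memory(a, windows))  # True: the rows are views, not copies
```

Because the rows overlap in memory, no data is copied until a later step such as sorting or partitioning produces a new array.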
Sample run:
In [634]: a = np.array([1,6,5,4,12]) # input array
In [635]: N_highest(a, W=3, N=1) # highest in W=3
Out[635]: array([ 6, 6, 12])
In [636]: N_highest(a, W=3, N=2) # second highest
Out[636]: array([5, 5, 5])
In [637]: N_highest(a, W=3, N=3) # third highest
Out[637]: array([1, 4, 4])
Another, shorter approach based on strides is to sort each window directly, like so:
np.sort(strided_app(ar,W,1), axis=1)[:,-N]
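For readers on NumPy 1.20 or later, the same sort-based idea can be sketched with numpy.lib.stride_tricks.sliding_window_view, which builds the windows without hand-computed strides. This is an alternative to the answer's strided_app, not the author's code, and the helper name n_highest_sorted is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def n_highest_sorted(ar, W, N=1):
    # Hypothetical helper: N-th highest value in each length-W window.
    # sliding_window_view returns a read-only view of shape (len(ar) - W + 1, W).
    windows = sliding_window_view(np.asarray(ar), W)
    # np.sort returns a sorted copy; column -N holds the N-th highest value.
    return np.sort(windows, axis=1)[:, -N]

a = np.array([1, 6, 5, 4, 12])
print(n_highest_sorted(a, W=3, N=2))  # [5 5 5]
```

Sorting every window costs more than a partition when W is large, but for short windows the difference is unlikely to matter.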
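Solving our case
To solve our case, we need to prepend W-1 NaNs to the result of the function above, because the first W-1 rows have no complete window: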
W = 3
df['C'] = np.r_[ [np.nan]*(W-1), N_highest(df.A.values, W=W, N=2)]
With the direct sort, we would have:
df['C'] = np.r_[ [np.nan]*(W-1), np.sort(strided_app(df.A.values,W,1), axis=1)[:,-2]]
Sample run:
In [578]: df
Out[578]:
A
0 1
1 6
2 5
3 4
4 3 # <== Different from given sample, for variety
In [619]: W = 3
In [620]: df['C'] = np.r_[ [np.nan]*(W-1), N_highest(df.A.values, W=W, N=2)]
In [621]: df
Out[621]:
A C
0 1 NaN
1 6 NaN
2 5 5.0
3 4 5.0
4 3 4.0 # <== Second highest from the last group off : [5,4,3]
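If the stride tricks are not needed, a pandas-only variant is also possible. The sketch below is an alternative to the approach above and assumes a pandas version where rolling(...).apply accepts raw=True; it leaves the first W-1 rows as NaN on its own, so no manual padding is required.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 6, 5, 4, 3]})
W, N = 3, 2  # window length and rank (2 = second highest)

# raw=True hands each window to the function as a plain NumPy array.
# Incomplete windows (the first W-1 rows) are left as NaN by rolling().
df["C"] = df["A"].rolling(window=W).apply(lambda w: np.sort(w)[-N], raw=True)
print(df)
# Column C holds NaN, NaN, 5.0, 5.0, 4.0
```

This calls a Python function once per window, so it will generally be slower than the vectorized strided version on long columns.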