Question

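Hi, I am trying to count distinct pairs of PORT and ADDRESS values.

Basically it is this exact situation, where I would like rolling_count to be the number of times the row's PORT and ADDRESS values have appeared in the window at the moment the row enters it.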

Count distinct strings in rolling window using pandas + python (with a condition)

However, the answer to that question is incorrect, and there is no follow-up solution.

Say I have the table

ID  PORT    ADDRESS  
1    21     ad3  
2    22     ad1       
3    23     ad2      
4    23     ad2            
5    21     ad4            
6    22     ad1            
7    22     ad1            
8    21     ad4

For example, if my window size is 3, the desired output is

 ID  PORT    ADDRESS  rolling_count
 1    21     ad3            1
 2    22     ad1            1
 3    23     ad2            1
 4    23     ad2            2
 5    21     ad4            1
 6    22     ad1            1
 7    22     ad1            2
 8    21     ad4            1

The answer in the linked post does not seem to count only within the window.

df['rolling_count']=df.groupby('ADDRESS').PORT.apply(lambda x : pd.Series(x).rolling(3,min_periods=1).apply(lambda y: len(set(y))))

is what I tried to use, and it is not correct. This is its output:

 ID  PORT    ADDRESS  rolling_count
 1    21     ad3            1
 2    22     ad1            1
 3    23     ad2            1
 4    23     ad2            1
 5    21     ad4            1
 6    22     ad1            1
 7    22     ad1            1
 8    21     ad4            1

Any feedback is helpful.

Answer 1

For your application, you can instead count the consecutively repeated PORT and ADDRESS values.

Sample df:

ID  PORT    ADDRESS
0   1   21  ad3
1   2   22  ad1
2   3   23  ad2
3   4   23  ad2
4   5   21  ad4
5   6   22  ad1
6   7   22  ad1
7   8   22  ad1

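import numpy as np

# One key per row combining PORT and ADDRESS
x = df.PORT.astype(str) + df.ADDRESS
# 1 where a row equals its previous or next neighbour, else 0
x = (x.eq(x.shift()) | x.eq(x.shift(-1))).astype(int)
a = x == 1
b = a.cumsum()
# Position within each run of repeats; rows outside a run get 1
arr = np.where(a, b - b.mask(a).ffill().fillna(0).astype(int), 1)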

Out:

array([1, 1, 1, 2, 1, 1, 2, 3])
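As an alternative sketch of the same idea (not part of the original answer), the position of each row within its run of consecutive identical PORT/ADDRESS pairs can also be obtained by labelling the runs with a cumulative sum and calling `groupby(...).cumcount()`. The DataFrame below simply rebuilds the sample df above, and the names `key` and `run_id` are introduced here for illustration only. One difference worth noting: this version restarts the count whenever the pair changes, even if two different runs sit directly next to each other.

```python
import pandas as pd

# Rebuild the sample df shown above (column names assumed: ID, PORT, ADDRESS)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "PORT": [21, 22, 23, 23, 21, 22, 22, 22],
    "ADDRESS": ["ad3", "ad1", "ad2", "ad2", "ad4", "ad1", "ad1", "ad1"],
})

# One key per row combining PORT and ADDRESS
key = df["PORT"].astype(str) + df["ADDRESS"]

# A new run starts wherever the key differs from the previous row
run_id = key.ne(key.shift()).cumsum()

# 1-based position of each row inside its run
consecutive_count = (key.groupby(run_id).cumcount() + 1).tolist()
print(consecutive_count)
```

On the sample df this should give the same values as the array above.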

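A rolling window actually leaves (window length - 1) partial windows at the start of the sequence, which are shorter than the window length you chose, and this results in counts appearing at different positions.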

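from collections import Counter
import numpy as np

def unique_values(x, window):
    # Index matrix: row i holds positions i-window+1 .. i of its window
    a = (np.arange(window)[None, :] + np.arange(len(x))[:, None]) - (window - 1)
    # Positions before the start are negative; keep them as distinct placeholders
    vals = np.where(a < 0, a, x.values.take(a))
    # Highest frequency of any single value within each window
    return [max(Counter(i).values()) for i in vals]

unique_values(df.PORT.astype(str) + df.ADDRESS, 3)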

Out:

[1, 1, 1, 2, 2, 1, 2, 2]
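The function above reports the highest frequency of any pair inside each window. If what is wanted is the count of the current row's own pair within the trailing window, as in the question's desired output, a minimal sketch could compare each row with the `window - 1` rows before it using `shift`. This is an illustration rather than the original answer's method; `rolling_pair_count` is a hypothetical helper name, and the DataFrame rebuilds the table from the question.

```python
import pandas as pd

# The table from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "PORT": [21, 22, 23, 23, 21, 22, 22, 21],
    "ADDRESS": ["ad3", "ad1", "ad2", "ad2", "ad4", "ad1", "ad1", "ad4"],
})

def rolling_pair_count(frame, window):
    """Count how often each row's (PORT, ADDRESS) pair occurs in the
    trailing window of `window` rows that ends at that row."""
    key = frame["PORT"].astype(str) + "_" + frame["ADDRESS"]
    count = pd.Series(1, index=frame.index)  # every row matches itself
    for k in range(1, window):
        # Add 1 where the row k steps back holds the same pair
        count += key.eq(key.shift(k)).astype(int)
    return count

df["rolling_count"] = rolling_pair_count(df, 3)
print(df["rolling_count"].tolist())
```

The loop runs `window - 1` times over vectorized comparisons, so it is best suited to small windows.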

Distinct pairs in rolling window pandas
