Question

Here is what my dataframe looks like. The Expected_Output column is the desired/target column.

   Group  Value  Expected_Output
0      1      2                1
1      1      3                1
2      1      6                1
3      1     11                0
4      1      7                0
5      2      3                1
6      2     13                1
7      2     14                0

For a given Group, starting from a given row, I look at the next 5 rows and check whether any Value is > 10. If so, I want Expected_Output to be 1; otherwise 0.

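For example, in the first row of Group 1, a Value of 11 (greater than 10) appears 3 rows further down. That is inside the "next 5 rows" window, so the condition holds and Expected_Output is 1. Similarly, starting from row 6 in Group 2, a Value of 14 (greater than 10) appears 1 row further down. That is also inside the "next 5 rows" window, so Expected_Output is 1.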
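The rule can be written as a small plain-Python reference, which is handy for checking any pandas solution against the expected column. This is a sketch: the rows list copies the example data, the helper name expected_output is hypothetical, and it assumes each group's rows are adjacent, as they are in the example.

```python
from itertools import groupby

# (Group, Value) pairs copied from the example dataframe
rows = [(1, 2), (1, 3), (1, 6), (1, 11), (1, 7), (2, 3), (2, 13), (2, 14)]

def expected_output(rows, window=5, threshold=10):
    out = []
    # itertools.groupby only merges consecutive rows, so each group's rows must be adjacent
    for _, grp in groupby(rows, key=lambda r: r[0]):
        values = [v for _, v in grp]
        for i in range(len(values)):
            following = values[i + 1:i + 1 + window]  # at most `window` rows after row i
            out.append(int(any(v > threshold for v in following)))
    return out

print(expected_output(rows))  # [1, 1, 1, 0, 0, 1, 1, 0]
```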

I tried df.groupby('Group')['Value'].rolling(-5).max() > 10, without success.
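pandas does not accept a negative window size, so rolling(-5) cannot be used to look forward. One alternative, used by neither answer below, is a forward-looking window indexer. A minimal sketch, assuming pandas 1.1 or later (where pandas.api.indexers.FixedForwardWindowIndexer is available); the helper name next5_max is hypothetical:

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 2, 2, 2],
    "Value": [2, 3, 6, 11, 7, 3, 13, 14],
})

# A window covering the current row and the 4 rows after it
indexer = FixedForwardWindowIndexer(window_size=5)

def next5_max(s: pd.Series) -> pd.Series:
    # shift(-1) drops the current row, so each forward window spans rows i+1..i+5;
    # the shift happens inside one group, so values never leak across groups
    return s.shift(-1).rolling(indexer, min_periods=1).max()

# NaN (no following rows) compares as False, which becomes 0
df["Result"] = df.groupby("Group")["Value"].transform(next5_max).gt(10).astype(int)
print(df)
```

The last row of each group gets an empty window (NaN, hence 0) rather than looking into the next group.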

Answer 1

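pd.Series.rolling looks backward by default. To look forward, you can reverse the dataframe, apply the rolling computation per group, and then reverse the GroupBy result. You also need a shift, because you are looking for the next 5 values, not counting the current row.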

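import pandas as pd

def roller(x):
    # x is one group in reverse order: the max over 5 rows, shifted by one, is the max of the next 5 rows
    return x.rolling(window=5, min_periods=1)['Value'].max().shift().gt(10).astype(int)

# Reverse, apply per group, reverse back; .values assigns by position, so each group's rows must be contiguous
df['Result'] = df.iloc[::-1].groupby('Group', sort=False).apply(roller).iloc[::-1].values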

print(df)

   Group  Value  Result
0      1      2       1
1      1      3       1
2      1      6       1
3      1     11       0
4      1      7       0
5      2      3       1
6      2     13       1
7      2     14       0
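Because .values assigns the result by position, Answer 1 relies on each group's rows being contiguous, as they are in this example. A sketch of the same reverse, roll, shift, reverse idea applied per group with transform, which aligns the result on the index labels instead; the helper name next5_flag is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 2, 2, 2],
    "Value": [2, 3, 6, 11, 7, 3, 13, 14],
})

def next5_flag(s: pd.Series) -> pd.Series:
    rev = s.iloc[::-1]  # last row of the group first
    # backward max over 5 rows in reversed order, shifted by one = max of the 5 rows after each row
    nxt = rev.rolling(window=5, min_periods=1).max().shift()
    return nxt.iloc[::-1].gt(10).astype(int)  # restore original order; NaN becomes 0

# transform aligns each group's result on the original index labels,
# so rows of different groups may be interleaved
df["Result"] = df.groupby("Group")["Value"].transform(next5_flag)
print(df)
```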

Answer 2

You can group the dataframe and use the dataframe index to fetch the next (up to) 5 values, then check whether any of them is greater than 10:

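# Assumes a consecutive integer index (e.g. the default RangeIndex); the labels are passed as a list because newer pandas versions reject a set as an indexer
df['Expected_Output'] = df.groupby(['Group'])['Value'].transform(lambda y: [1 if any(y.loc[sorted(set(range(x + 1, x + 6)) & set(y.index))] > 10) else 0 for x in y.index])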

Out:

   Group  Value  Expected_Output
0      1      2                1
1      1      3                1
2      1      6                1
3      1     11                0
4      1      7                0
5      2      3                1
6      2     13                1
7      2     14                0
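Answer 2 looks rows up by label, so it depends on a consecutive integer index. A sketch that works by position within each group instead, so any index (strings, gaps, a shuffled RangeIndex) behaves the same; the helper name next5_any_gt10 is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 2, 2, 2],
    "Value": [2, 3, 6, 11, 7, 3, 13, 14],
})

def next5_any_gt10(s: pd.Series) -> list:
    vals = s.to_numpy()
    # vals[i + 1:i + 6] holds at most the 5 values after position i in this group;
    # slicing past the end just returns fewer (or zero) values, and an empty .any() is False
    return [int((vals[i + 1:i + 6] > 10).any()) for i in range(len(vals))]

df["Expected_Output"] = df.groupby("Group")["Value"].transform(next5_any_gt10)
print(df)
```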

Checking a condition in a negative (forward-looking) rolling window within a pandas GroupBy
