Question

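I'm trying to figure out how to highlight the nth subsequent equal value (and the ones after it) in a Pandas dataframe, to get something like this: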

 Example 1: highlight 3rd subsequent equal value in Column A:

 Column A | Desired_output
 1        | 0
 1        | 0
 1        | 1
 1        | 1
 1        | 1
 1        | 1
 0        | 0
 0        | 0

 Example 2: highlight 5th subsequent equal value in Column A:

 Column A | Desired_output
 1        | 0
 1        | 0
 1        | 0
 1        | 0
 1        | 1
 1        | 1
 0        | 0
 0        | 0

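This should work not only when Column A equals 1 but also when it equals 0. The main idea is that if there are not enough consecutive equal values, my code should not flag them.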

I was thinking of using pd.rolling_sum with a dynamic window, but I'm struggling with how to apply it. Do you have any ideas on how to go about this? Thanks

Answer 1

Starting from your snippet:

import pandas as pd
df = pd.DataFrame({'A': [1,1,1,1,1,1,0,0]})

# set n to the number of repetitions to highlight:
n = 3  # or n = 5

There are two different ways to handle this:

Special case

It solves your specific problem (it requires the column to contain only 1s and 0s) and needs numpy:

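import numpy as np

# with only 0s and 1s, a full window of n rows sums to n (all ones) or 0 (all zeros)
# exactly when it is a run of n equal values; the first n-1 windows are NaN and give False
df['Desired Output'] = np.where(df['A'].rolling(n).sum() % n == 0, True, False)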
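To make the special-case arithmetic concrete, here is a minimal sketch that wraps it in a helper. The name `flag_binary_runs` is hypothetical, and it assumes the series holds only 0s and 1s, so that `% n == 0` is true only for window sums of 0 or n.

```python
import pandas as pd


def flag_binary_runs(s: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: flag the nth and later members of each run.

    Assumes s contains only 0s and 1s.
    """
    window_sum = s.rolling(n).sum()  # NaN for the first n-1 rows
    # NaN % n == 0 evaluates to False, so incomplete windows are not flagged
    return window_sum % n == 0


df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 0, 0]})
print(flag_binary_runs(df['A'], 3).astype(int).tolist())  # [0, 0, 1, 1, 1, 1, 0, 0]
print(flag_binary_runs(df['A'], 5).astype(int).tolist())  # [0, 0, 0, 0, 1, 1, 0, 0]
```

Because a window of n zeros sums to 0, a long enough run of zeros is flagged as well, which matches the requirement in the question.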

General case

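It lets you express other kinds of comparisons between rows (not only equality checks), and it works for any values, not just 0 and 1. For example: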

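# AND together "row equals the row i positions above" for i = 0..n-1;
# shift() pads with NaN, so the first n-1 rows compare as False
comparison = True
for i in range(n):
    comparison &= df['A'] == df['A'].shift(i)

df['Desired Output'] = comparison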
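The point that the general case is not limited to equality can be illustrated by passing the pairwise test in as a function. This is a minimal sketch: the helper name `flag_runs` and the tolerance test `near` (values within 0.1 of each other count as "equal") are illustrative assumptions, not part of the original answer.

```python
import operator

import pandas as pd


def flag_runs(s: pd.Series, n: int, same=operator.eq) -> pd.Series:
    """Hypothetical helper: True where same(current, earlier) holds for
    the current row and each of the n-1 rows above it."""
    result = pd.Series(True, index=s.index)
    for i in range(n):
        # shift(i) pads the top i rows with NaN; comparisons with NaN are False
        result &= same(s, s.shift(i))
    return result


def near(a: pd.Series, b: pd.Series) -> pd.Series:
    # illustrative non-equality test: within 0.1 counts as "equal"
    return (a - b).abs() <= 0.1


df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 0, 0]})
print(flag_runs(df['A'], 3).astype(int).tolist())  # [0, 0, 1, 1, 1, 1, 0, 0]

readings = pd.Series([1.00, 1.05, 0.98, 2.00, 2.02])
print(flag_runs(readings, 3, same=near).tolist())  # [False, False, True, False, False]
```

With the default `operator.eq` the loop is the same as the answer's code; only the comparison changes.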

Results for both cases

For n = 3 you get:

    A   Desired Output
0   1   False
1   1   False
2   1   True
3   1   True
4   1   True
5   1   True
6   0   False
7   0   False

For n = 5 you get:

    A   Desired Output
0   1   False
1   1   False
2   1   False
3   1   False
4   1   True
5   1   True
6   0   False
7   0   False
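As a cross-check of these tables, here is a different technique from either case above (swapped in, not the answer's method): label each run of consecutive equal values, number the rows within each run with `groupby(...).cumcount()`, and flag rows at position n or later. It works for any values, and its cost does not grow with n. The helper name `flag_nth_and_later` is hypothetical.

```python
import pandas as pd


def flag_nth_and_later(s: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: True for the nth and later rows of every run
    of consecutive equal values."""
    # A new run starts wherever the value differs from the previous row;
    # cumsum turns those run starts into run labels 1, 1, ..., 2, 2, ...
    run_id = (s != s.shift()).cumsum()
    # 1-based position of each row within its run
    position = s.groupby(run_id).cumcount() + 1
    return position >= n


df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 0, 0]})
for n in (3, 5):
    df[f'Desired Output (n={n})'] = flag_nth_and_later(df['A'], n)
print(df)
```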

Formatting:

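If you need the new column to hold 1s and 0s, with the special-case method you can use 1 and 0 instead of True and False when creating the column, like this: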

# note that in this scenario a run of n zeros is also flagged with 1
df['Desired Output'] = np.where(df['A'].rolling(n).sum() % n == 0, 1, 0)

If you go with the general case, just add astype(int) when creating the column, as follows:

df['Desired Output'] = comparison.astype(int)

Pandas - highlight the nth subsequent equal value in a column
