Question

Essentially, I want to convert consecutive duplicate Trues to False, as the title says.

For example, say I have an array of 0s and 1s:

x = pd.Series([1,0,0,1,1])

should become:

y = pd.Series([0,0,0,0,1])
# the 1st element of x becomes 0 because it is not part of a consecutive run,
# the 4th element becomes 0 because it is the first instance of the consecutive duplicates,
# and everything else remains the same.

This should also work for runs of more than two consecutive values. For example, with a longer array:

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])

becomes:

y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])

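The posts I found mostly deal with dropping consecutive duplicates and do not preserve the original length. In this case the original length should be preserved.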

It would be something like the following code:

for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
       x[i] = False

But this never finishes running for me, and it cannot handle runs of more than two consecutive values.

Answer 1

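A pandas solution: create a Series, build groups of consecutive values with shift and cumsum, then use Series.duplicated to keep only the last value in each run of duplicated 1s: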

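import pandas as pd

s = pd.Series(x)
# give each run of equal consecutive values its own group id
g = s.ne(s.shift()).cumsum()
# keep a 1 only where it is the last element of a run longer than one
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)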

print (s1.tolist())
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
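
To see why the three masks combine to the desired result, here is a minimal sketch that prints each intermediate step on the short example from the question. The variable names (is_last_in_run, in_long_run, is_one) are only illustrative and are not part of the answer above.

```python
import pandas as pd

s = pd.Series([1, 0, 0, 1, 1])

# A new group starts wherever the value differs from the previous one.
g = s.ne(s.shift()).cumsum()

# True for the last row of each group (including groups of size one).
is_last_in_run = ~g.duplicated(keep='last')
# True for every row whose group has more than one member.
in_long_run = g.duplicated(keep=False)
# True where the original value is 1.
is_one = s.eq(1)

result = (is_last_in_run & in_long_run & is_one).astype(int)

print(g.tolist())               # [1, 2, 2, 3, 3]
print(is_last_in_run.tolist())  # [True, False, True, False, True]
print(in_long_run.tolist())     # [False, True, True, True, True]
print(result.tolist())          # [0, 0, 0, 0, 1]
```

The in_long_run mask is what zeroes out the isolated 1 at position 0: it is the last element of its group, but the group has only one member.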

Edit:

For multiple columns, use a function:

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})

def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)

df = df.apply(f)
print (df)
    a  b
0   0  0
1   0  0
2   0  0
3   0  0
4   0  0
5   1  1
6   0  0
7   0  0
8   1  1
9   0  0
10  0  0
11  0  0
12  0  0
13  1  1
14  0  0
15  0  0
16  0  0
17  0  0
18  0  0
19  0  0
20  1  1
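
As a possible alternative to apply (a different technique from the one above, offered as a sketch), the same rule can be written as a neighbor comparison: a cell stays 1 only if it is 1, the cell above it is 1, and the cell below it is not 1. Since shift works column by column, this should handle the whole DataFrame at once. The variable names here are illustrative.

```python
import pandas as pd

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a': x, 'b': x})

is_one = df.eq(1)
# The first row has no predecessor; the NaN from shift() compares as False.
prev_is_one = df.shift().eq(1)
# The last row has no successor; the NaN from shift(-1) compares as False.
next_is_one = df.shift(-1).eq(1)

# Last element of a run of at least two 1s: is 1, preceded by 1, not followed by 1.
out = (is_one & prev_is_one & ~next_is_one).astype(int)

print(out['a'].tolist())
# [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```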

Answer 2

Vanilla Python:

x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0
for i, e in enumerate(x):
    if not e:
        counter = 0
        continue
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)

Prints:

[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
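
Note that the loop above modifies x in place. If the original list needs to stay intact, a variant built on itertools.groupby could look like the sketch below. The function name keep_last_of_runs is hypothetical, and this is a different approach from the counter-based loop, not a rewrite of it.

```python
from itertools import groupby

def keep_last_of_runs(values):
    """Return a new list where only the last 1 of each run of 2+ ones is kept."""
    out = []
    # groupby yields one (key, iterator) pair per run of equal consecutive values.
    for key, run in groupby(values):
        n = len(list(run))
        if key and n > 1:
            # Run of ones longer than one: zero everything except the last.
            out.extend([0] * (n - 1) + [1])
        else:
            # Runs of zeros and isolated ones become all zeros.
            out.extend([0] * n)
    return out

x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
print(keep_last_of_runs(x))
# [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```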

Convert consecutive True values to False in Python
