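Convert consecutive Trues to False in Python

Time: 2019-05-28 11:06:20

Tags: python pandas conditional-statements

Essentially, I want to convert consecutive duplicate Trues to False, as the title says.

For example, say I have an array of 0s and 1s: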

x = pd.Series([1,0,0,1,1])

should become:

y = pd.Series([0,0,0,0,1])
# the 1st element of x becomes 0 because it is not part of a consecutive run
# the 4th element becomes 0 because it is the first instance of the consecutive duplicate
# everything else should remain the same

This should also work for runs of more than two consecutive values. Say my array is longer, for example:

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])

becomes:

y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])

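The posts I found mostly deal with removing consecutive duplicates and do not preserve the original length. In this case, the original length should be preserved.

It would be something like the following code: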

for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
        x[i] = False

But this never finishes running, and it does not handle runs of more than two consecutive values.

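2 Answers:

Answer 0: (score: 2)

Pandas solution: create a Series, build groups of consecutive values with shift and cumsum, then use Series.duplicated to keep only the last 1 in each run of duplicates: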

import pandas as pd

s = pd.Series(x)
g = s.ne(s.shift()).cumsum()
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)

print (s1.tolist())
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
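
To see why the three-part mask works, it may help to look at each piece separately on the short example from the question. This is only an illustrative breakdown; the column names (run_id, is_last_in_run, run_len_ge_2, is_one) are made up here and are not part of the answer.

```python
import pandas as pd

x = pd.Series([1, 0, 0, 1, 1])

# A new run id starts wherever the value differs from the previous one.
g = x.ne(x.shift()).cumsum()

# Column names below are hypothetical labels chosen for this illustration.
steps = pd.DataFrame({
    "x": x,
    "run_id": g,
    # True only on the last row of each run
    "is_last_in_run": ~g.duplicated(keep="last"),
    # True on every row of a run that has at least two rows
    "run_len_ge_2": g.duplicated(keep=False),
    # True where the original value is 1
    "is_one": x.eq(1),
})

# A row survives only if all three conditions hold.
steps["result"] = (
    steps["is_last_in_run"] & steps["run_len_ge_2"] & steps["is_one"]
).astype(int)

print(steps)
```

The first element fails the run_len_ge_2 test because its run has a single row, and the 4th element fails is_last_in_run, so only the final 1 is kept.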

Edit:

For multiple columns, use a function:

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})

def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)

df = df.apply(f)
print (df)
    a  b
0   0  0
1   0  0
2   0  0
3   0  0
4   0  0
5   1  1
6   0  0
7   0  0
8   1  1
9   0  0
10  0  0
11  0  0
12  0  0
13  1  1
14  0  0
15  0  0
16  0  0
17  0  0
18  0  0
19  0  0
20  1  1
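
As a possible alternative (not the method used in this answer), the same result can probably be reached without grouping, by comparing each value with its neighbours: a 1 is kept only if the previous value is 1 and the next value is not 1. This is a sketch that assumes the data holds only 0/1 (or booleans); the helper name last_of_run is hypothetical.

```python
import pandas as pd

def last_of_run(obj):
    """Hypothetical helper: keep a 1 only at the end of a run of two or more 1s.

    Works on a Series or a DataFrame, since shift() moves values along the rows.
    """
    is_one = obj.eq(1)
    prev_is_one = obj.shift(1).eq(1)     # NaN at the first row compares as False
    next_not_one = obj.shift(-1).ne(1)   # NaN at the last row compares as True
    return (is_one & prev_is_one & next_not_one).astype(int)

x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a': x, 'b': x})

result_series = last_of_run(x)
result_frame = last_of_run(df)

print(result_series.tolist())
print(result_frame)
```

Because shift works on whole DataFrames, this version does not need apply, but it relies on the values being exactly 0 and 1.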

Answer 1: (score: 2)

Vanilla Python:

x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0
for i, e in enumerate(x):
    if not e:
        counter = 0
        continue
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)

Prints:

[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
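
Note that the loop above modifies x in place. If the original list needs to stay untouched, one option is to build a new list run by run with itertools.groupby. This is a minimal sketch, assuming the input holds only 0/1 (or booleans); the function name keep_last_of_runs is hypothetical.

```python
from itertools import groupby

def keep_last_of_runs(values):
    """Hypothetical helper: return a new list where only the last 1 of each
    run of two or more consecutive 1s is kept; everything else becomes 0."""
    out = []
    for key, group in groupby(values):
        n = len(list(group))  # length of the current run of equal values
        if key and n >= 2:
            # run of 1s with at least two items: zero all but the last
            out.extend([0] * (n - 1) + [1])
        else:
            # run of 0s, or a single isolated 1
            out.extend([0] * n)
    return out

x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
y = keep_last_of_runs(x)
print(y)
```

The output has the same length as the input, and x itself is left unchanged.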