Question

Suppose we have a DataFrame:

   num  line    
0   1    56
1   1    90  
2   2    66  
3   3    4  
4   3    55  
5   3    104
6   1    23  
7   5    22  
8   3    144

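I want to delete the rows where the value 3 repeats consecutively in the num column, keeping only the first one. The two consecutive rows with 1 in the num column should stay in the resulting DataFrame, together with all other columns.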

What I have so far removes every consecutive duplicate, not just the repeated 3s:

data.groupby((data['num'] != data['num'].shift()).cumsum().values).first()
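To see why this drops too much: the `cumsum` of the "value changed" mask gives every run of consecutive equal values its own group label, and `.first()` then keeps one row per run, whatever the value. The sketch below types the sample data in by hand (the name `data` follows the question) and shows that the repeated 1 at index 1 disappears along with the repeated 3s.

```python
import pandas as pd

# Sample data from the question
data = pd.DataFrame({
    "num": [1, 1, 2, 3, 3, 3, 1, 5, 3],
    "line": [56, 90, 66, 4, 55, 104, 23, 22, 144],
})

# True whenever num differs from the previous row, so cumsum labels each run
run_id = (data["num"] != data["num"].shift()).cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 3, 4, 5, 6]

# One row per run: every consecutive repeat collapses, not just the 3s
collapsed = data.groupby(run_id.values).first()
print(collapsed)
```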

Expected result (or the code that produces it):

   num  line    
0   1    56
1   1    90  
2   2    66  
3   3    4  
4   1    23  
5   5    22  
6   3    144

Answer 1

You can build the following conditions and use them for boolean indexing on the DataFrame:

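# df is the DataFrame from the question
# True where num is 3
c1 = df['num'].eq(3)
# True where num equals the value in the previous row (a consecutive repeat)
c2 = df['num'].eq(df['num'].shift(1))
# keep a row unless it is a 3 that repeats the row above
df[(c1 & ~c2) | ~c1]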

    num  line
0    1    56
1    1    90
2    2    66
3    3     4
6    1    23
7    5    22
8    3   144
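By De Morgan's law, `(c1 & ~c2) | ~c1` reduces to `~(c1 & c2)`: drop a row only when it is a 3 *and* repeats the row above. A self-contained version of this answer, with the question's sample data typed in by hand, might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "num": [1, 1, 2, 3, 3, 3, 1, 5, 3],
    "line": [56, 90, 66, 4, 55, 104, 23, 22, 144],
})

c1 = df["num"].eq(3)                      # row holds a 3
c2 = df["num"].eq(df["num"].shift(1))     # row equals the previous row (NaN on row 0 -> False)

original = df[(c1 & ~c2) | ~c1]
simplified = df[~(c1 & c2)]               # same mask, rewritten with De Morgan's law

print(simplified)
```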

Details

df.assign(is_3=c1, is_repeated=c2, filtered=(c1 & ~c2) | ~(c1))

   num  line   is_3  is_repeated  filtered
0    1    56  False        False      True
1    1    90  False         True      True
2    2    66  False        False      True
3    3     4   True        False      True
4    3    55   True         True     False
5    3   104   True         True     False
6    1    23  False        False      True
7    5    22  False        False      True
8    3   144   True        False      True

Answer 2

Use:

df = data[data['num'].ne(3) | data['num'].ne(data['num'].shift())]
print(df)
   num  line
0    1    56
1    1    90
2    2    66
3    3     4
6    1    23
7    5    22
8    3   144
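The expected output in the question is renumbered 0 to 6, whereas both answers keep the original index labels. If that matters, `reset_index(drop=True)` renumbers the result. The sketch below wraps this answer's mask in a small helper; the name `drop_repeated_value` and its `value` parameter are hypothetical, added here only to make the target value configurable.

```python
import pandas as pd

def drop_repeated_value(frame: pd.DataFrame, col: str, value) -> pd.DataFrame:
    """Drop rows where `col` equals `value` and repeats the previous row.

    Hypothetical helper: the first row of each run of `value` is kept,
    and consecutive repeats of any other value are left alone.
    """
    s = frame[col]
    keep = s.ne(value) | s.ne(s.shift())
    # reset_index renumbers 0..n-1 to match the question's expected output
    return frame[keep].reset_index(drop=True)

data = pd.DataFrame({
    "num": [1, 1, 2, 3, 3, 3, 1, 5, 3],
    "line": [56, 90, 66, 4, 55, 104, 23, 22, 144],
})
result = drop_repeated_value(data, "num", 3)
print(result)
```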

Details:

Compare for inequality with 3 (`ne` means "not equal"):

print(data['num'].ne(3))
0     True
1     True
2     True
3    False
4    False
5    False
6     True
7     True
8    False
Name: num, dtype: bool

Compare with the shifted column, which is True for the first row of each run of consecutive values:

print(data['num'].ne(data['num'].shift()))
0     True
1    False
2     True
3     True
4    False
5    False
6     True
7     True
8     True
Name: num, dtype: bool
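`shift()` moves each value down one row, so the first row has nothing to compare against and becomes `NaN`. Since `NaN` never equals anything, `ne` returns `True` there, and the first row is always treated as the start of a run. A short check on the question's `num` column:

```python
import pandas as pd

data = pd.DataFrame({"num": [1, 1, 2, 3, 3, 3, 1, 5, 3]})

shifted = data["num"].shift()
# First element is NaN; the rest are the previous row's values (as floats)
print(shifted.tolist())

# True where the value differs from the row above, including row 0
starts_run = data["num"].ne(shifted)
print(starts_run.tolist())
```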

Chain both conditions with bitwise OR (`|`):

print(data['num'].ne(3) | data['num'].ne(data['num'].shift()))
0     True
1     True
2     True
3     True
4    False
5    False
6     True
7     True
8     True
Name: num, dtype: bool
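This OR mask is the negation of "num is 3 and repeats the previous row", so it should match the mask from Answer 1 row for row. A quick check, assuming the question's sample data:

```python
import pandas as pd

data = pd.DataFrame({
    "num": [1, 1, 2, 3, 3, 3, 1, 5, 3],
    "line": [56, 90, 66, 4, 55, 104, 23, 22, 144],
})
s = data["num"]

# Answer 2: keep if not 3, or if different from the previous row
mask_ne = s.ne(3) | s.ne(s.shift())
# Answer 1: keep unless the row is a 3 that repeats the previous row
c1, c2 = s.eq(3), s.eq(s.shift(1))
mask_eq = (c1 & ~c2) | ~c1

print(mask_ne.equals(mask_eq))
```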

Pandas - delete a row if a specified value repeats in a column, keeping the first occurrence