I have a data frame such as:
Cluster sequence_name
1 specie1
1 specie2
1 specie3
1 sequence1
1 sequence2
2 specie8
3 specie2
4 sequence1
4 sequence3
4 specie56
...
I would like to remove every cluster that contains only one sequence. Here I should get:
Cluster sequence_name
1 specie1
1 specie2
1 specie3
1 sequence1
1 sequence2
4 sequence1
4 sequence3
4 specie56
...
Thanks for your help.
Answer 0 (score: 1)
Use Boolean indexing with groupby and transform:
df[df.groupby('Cluster')['sequence_name'].transform('size') > 1]
Cluster sequence_name
0 1 specie1
1 1 specie2
2 1 specie3
3 1 sequence1
4 1 sequence2
7 4 sequence1
8 4 sequence3
9 4 specie56
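For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds only the ten rows shown in the question (the rows hidden behind "..." are not included), and the names sizes and filtered are just illustrative.

```python
import pandas as pd

# Only the rows visible in the question; the elided "..." rows are left out.
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1, 1, 2, 3, 4, 4, 4],
    "sequence_name": [
        "specie1", "specie2", "specie3", "sequence1", "sequence2",
        "specie8", "specie2", "sequence1", "sequence3", "specie56",
    ],
})

# transform('size') returns one value per row: the row count of that row's cluster.
sizes = df.groupby("Cluster")["sequence_name"].transform("size")

# Keep only the rows whose cluster has more than one row.
filtered = df[sizes > 1]
print(filtered)
```

Because transform returns a Series aligned with the original index, the mask can be applied directly and the original row labels are kept.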
Answer 1 (score: 1)
Groupby.filter works well here:
df = df.groupby('Cluster').filter(lambda x: x.sequence_name.nunique() > 1)
Cluster sequence_name
0 1 specie1
1 1 specie2
2 1 specie3
3 1 sequence1
4 1 sequence2
7 4 sequence1
8 4 sequence3
9 4 specie56
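One thing that may be worth checking: nunique() counts distinct names, while size counts rows, so the two answers can differ if a cluster repeats the same sequence_name. The sketch below uses a made-up cluster 5 with a duplicated name (not part of the question's data) to show the difference; by_rows and by_distinct are illustrative names.

```python
import pandas as pd

# Hypothetical data: cluster 5 is invented to show a duplicated name.
df = pd.DataFrame({
    "Cluster": [1, 1, 2, 5, 5],
    "sequence_name": ["specie1", "sequence1", "specie8", "specie9", "specie9"],
})

# Keep clusters with more than one row (same criterion as transform('size') > 1).
by_rows = df.groupby("Cluster").filter(lambda g: len(g) > 1)

# Keep clusters with more than one distinct sequence name.
by_distinct = df.groupby("Cluster").filter(
    lambda g: g["sequence_name"].nunique() > 1
)

print(by_rows)      # clusters 1 and 5
print(by_distinct)  # cluster 1 only
```

On the data in the question both criteria give the same result, since no cluster repeats a name; which one you want depends on whether duplicates should count as separate sequences.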