Remove rows based on how often each element occurs in a pandas column

Time: 2019-12-01 01:35:09

Tags: python pandas numpy

Given a DataFrame with two columns, A and B:

df = 

A      B
cat    3
cat    4
cat    2
bird   1
bird   3
bird   2
bird   5
bird   3

I want to delete the rows of any value in column A that appears 3 times or fewer. cat appears 3 times (delete); bird appears 5 times (keep).

Desired output:

df = 

A      B
bird   1
bird   3
bird   2
bird   5
bird   3
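
To make the condition concrete, here is a minimal sketch that rebuilds the example frame and counts the rows for each value of A. The variable name group_sizes is only illustrative and does not come from the question.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# Number of rows for each distinct value in column A.
group_sizes = df.groupby("A").size()
print(group_sizes)
```

Here cat has 3 rows and bird has 5, so only the bird rows should survive a "more than 3" cutoff.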

2 answers:

Answer 0: (score: 2)

Use filter

result = df.groupby('A').filter(lambda x: len(x) > 3)
print(result)

Output

      A  B
3  bird  1
4  bird  3
5  bird  2
6  bird  5
7  bird  3
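
The same condition can probably also be written as a boolean mask instead of a Python lambda. The sketch below is an alternative not taken from the answer: it uses groupby with transform("size") to attach each row's group size, and it rebuilds the example frame so that it runs on its own.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# Size of the group each row belongs to, aligned with df's index.
group_size = df.groupby("A")["A"].transform("size")

# Keep only the rows whose group has more than 3 rows.
result = df[group_size > 3]
print(result)
```

Because the mask is aligned with the original index, the surviving rows keep their labels 3 through 7, as in the filter output above.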

You can also use value_counts

# count the rows for each value of A
counts = df['A'].value_counts()

# keep the values whose count is above 3
keep = counts[counts > 3].index

# filter
result = df[df.A.isin(keep)]
print(result)
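
If the cutoff or the column changes often, the value_counts idea can be wrapped in a small helper. This is only a sketch: the function name keep_frequent and its parameters are hypothetical, and it uses Series.map to look up each row's count rather than isin.

```python
import pandas as pd


def keep_frequent(frame, column, threshold):
    """Keep rows whose value in `column` occurs more than `threshold` times."""
    # Count how many rows carry each distinct value.
    counts = frame[column].value_counts()
    # Map every row to the count of its own value, then compare.
    return frame[frame[column].map(counts) > threshold]


# Example frame from the question.
df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

result = keep_frequent(df, "A", 3)
print(result)
```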

Answer 1: (score: 0)

This question is a duplicate of Python: Removing Rows on Count condition.

I'm sure there is a better way; I just haven't found it yet. I'll keep looking.

from io import StringIO

import pandas as pd

raw_str = \
'''
A      B
cat    3
cat    4
cat    2
bird   1
bird   3
bird   2
bird   5
bird   3'''

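# parse the whitespace-separated text into a DataFrame
df_1 = pd.read_csv(StringIO(raw_str), sep=r'\s+', header=0, dtype={'A': str, 'B': int})

# number of rows for each value of A
val_counts = df_1['A'].value_counts()

# look up each row's count, then keep the rows whose count is above 3
df_1 = df_1[(val_counts[df_1['A']] > 3).reset_index(drop=True)]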