Given a DataFrame with two columns, A and B:
df =
A B
cat 3
cat 4
cat 2
bird 1
bird 3
bird 2
bird 5
bird 3
I want to delete the rows whose value in column A occurs 3 times or fewer:
cat occurs 3 times (delete)
bird occurs 5 times (keep)
Desired output:
df =
A B
bird 1
bird 3
bird 2
bird 5
bird 3
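The rule depends on how many rows each value of A has, not on how many distinct values A contains. The following is a minimal sketch that shows these per-value counts. It assumes the example data is built by hand with pd.DataFrame instead of being read from a file.

```python
import pandas as pd

# Rebuild the example DataFrame from the question (assumed construction)
df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# value_counts() returns the number of rows for each distinct value of A
counts = df["A"].value_counts()
print(counts["cat"], counts["bird"])  # 3 5
```

Only values with a count above 3 (here just bird) should survive.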
Answer 0 (score: 2)
Use groupby with filter:
result = df.groupby('A').filter(lambda x: len(x) > 3)
print(result)
Output:
A B
3 bird 1
4 bird 3
5 bird 2
6 bird 5
7 bird 3
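filter calls the lambda once per group and keeps the rows of every group for which it returns True. A common alternative, which is not part of this answer, uses transform("size") to broadcast each group's size back onto its rows and then applies the result as a boolean mask. The following is a minimal sketch on the same hand-built example data:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# transform("size") returns, for every row, the number of rows in its A group,
# as a Series aligned with df's index
group_sizes = df.groupby("A")["B"].transform("size")

# Keep only rows whose group has more than 3 members
result = df[group_sizes > 3]
print(result)
```

Because this approach avoids calling a Python lambda per group, it is often preferred when there are many groups. Both versions keep the original index labels.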
You can also use value_counts:
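# count how many times each value of A occurs (a Series indexed by the values of A)
counts = df['A'].value_counts()
# keep the values that occur more than 3 times
# (comparing the Series directly avoids relying on the column name produced by
# to_frame(), which is 'count' rather than 'A' since pandas 2.0)
keep = counts[counts > 3].index
# filter: keep rows whose A value is one of those values
result = df[df['A'].isin(keep)]
print(result)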
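If the threshold needs to change, the value_counts + isin approach can be wrapped in a small function. This is a sketch only. keep_frequent is a hypothetical name, and "more than threshold" matches the `> 3` test used in this answer.

```python
import pandas as pd


def keep_frequent(df: pd.DataFrame, column: str, threshold: int) -> pd.DataFrame:
    """Return the rows of df whose value in `column` occurs more than `threshold` times.

    Hypothetical helper wrapping the value_counts + isin approach.
    """
    counts = df[column].value_counts()
    frequent = counts[counts > threshold].index
    return df[df[column].isin(frequent)]


df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

print(keep_frequent(df, "A", 3))  # only the 5 bird rows
print(keep_frequent(df, "A", 2))  # all 8 rows: cat (3) and bird (5) both exceed 2
```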
Answer 1 (score: 0)
This question is a duplicate of Python: Removing Rows on Count condition
I'm sure there is a better way; I just haven't found it yet. I'll keep searching.
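import pandas as pd
from io import StringIO

raw_str = '''
A B
cat 3
cat 4
cat 2
bird 1
bird 3
bird 2
bird 5
bird 3'''
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df_1 = pd.read_csv(StringIO(raw_str), sep=r'\s+', header=0, dtype={'A': str, 'B': int})
# number of rows for each value of A
val_counts = df_1['A'].value_counts()
# look up each row's count via its A value and keep rows whose count is above 3
df_1 = df_1[df_1['A'].map(val_counts) > 3]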
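The following is a minimal sketch on the hand-built example data. It checks that the approaches in this thread select the same rows. It then renumbers the index so the result matches the layout of the desired output in the question. The reset_index(drop=True) step is an addition and is not part of either answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# Answer 0, first approach: groupby + filter
by_filter = df.groupby("A").filter(lambda g: len(g) > 3)

# Answer 0, second approach: value_counts + isin
counts = df["A"].value_counts()
by_isin = df[df["A"].isin(counts[counts > 3].index)]

# Per-row count lookup, equivalent to the lookup in answer 1
by_map = df[df["A"].map(counts) > 3]

# All three keep the same rows, with the original index labels 3..7
assert by_filter.equals(by_isin) and by_isin.equals(by_map)

# Renumber from 0 so the result matches the desired output in the question
result = by_filter.reset_index(drop=True)
print(result)
```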