My df looks like this:
foo bar baz
aaa 0 Laos
aaa 45 Nigeria
aaa 123 Panama
bbb 12 Panama
bbb 826 Nigeria
ccc 0 Laos
ccc 15 Laos
ccc 72 Panama
ddd 4 Panama
ddd 9 Laos
ddd 987 Panama
ddd 25 Nigeria
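I also have a set: {"Laos", "Panama", "Nigeria"}
I want to group with groupby("foo") and keep only the groups whose "baz" column contains every value in the set.
So the resulting df should contain only these rows (bbb is dropped because it lacks Laos, and ccc because it lacks Nigeria):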
foo bar baz
aaa 0 Laos
aaa 45 Nigeria
aaa 123 Panama
ddd 4 Panama
ddd 9 Laos
ddd 987 Panama
ddd 25 Nigeria
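To make the example reproducible, here is a minimal sketch that rebuilds the frame above and lists which groups cover the whole set. The names `required`, `present`, and `complete` are only illustrative and do not come from the question.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama", "Panama", "Nigeria", "Laos",
            "Laos", "Panama", "Panama", "Laos", "Panama", "Nigeria"],
})

# Illustrative name for the set of values every group must contain.
required = {"Laos", "Panama", "Nigeria"}

# Collect the distinct baz values seen in each foo group.
present = df.groupby("foo")["baz"].apply(set)

# Keep the group labels whose values cover the required set.
complete = present[present.apply(required.issubset)].index.tolist()
print(complete)
```

On this data only aaa and ddd cover the set, which matches the expected output shown above.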
Answer 0 (score: 2)
Try:
s = df.groupby('foo').filter(
    lambda x: pd.Series(["laos", "panama", "nigeria"]).isin(x['baz'].str.lower()).all())
s
Out[21]:
foo bar baz
0 aaa 0 Laos
1 aaa 45 Nigeria
2 aaa 123 Panama
8 ddd 4 Panama
9 ddd 9 Laos
10 ddd 987 Panama
11 ddd 25 Nigeria
Answer 1 (score: 1)
IIUC, use Series.str.lower with Series.isin and GroupBy.transform:
l = ["laos", "panama", "nigeria"]
s = df['baz'].str.lower()
m = (s.isin(l)
      .mask(df.duplicated(['baz', 'foo']), False)
      .groupby(df['foo'])
      .transform('sum').eq(len(l)))
df_filtered = df.loc[m]
print(df_filtered)
foo bar baz
0 aaa 0 Laos
1 aaa 45 Nigeria
2 aaa 123 Panama
8 ddd 4 Panama
9 ddd 9 Laos
10 ddd 987 Panama
11 ddd 25 Nigeria
This is equivalent to:
m = ((s.isin(l) & (~df.duplicated(['baz', 'foo'])))
      .groupby(df['foo'])
      .transform('sum').eq(len(l)))
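The role of the duplicated() mask may be easier to see by comparing per-group counts with and without it. The following is only a sketch on the question's data; `raw` and `dedup` are illustrative names.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama", "Panama", "Nigeria", "Laos",
            "Laos", "Panama", "Panama", "Laos", "Panama", "Nigeria"],
})

l = ["laos", "panama", "nigeria"]
s = df["baz"].str.lower()

# Count matching rows per group without removing repeats.
raw = s.isin(l).groupby(df["foo"]).sum()

# Count each (baz, foo) pair only once per group.
dedup = (s.isin(l) & ~df.duplicated(["baz", "foo"])).groupby(df["foo"]).sum()

print(raw.to_dict())
print(dedup.to_dict())
```

Without the mask, ccc reaches a count of 3 because Laos appears twice, and ddd reaches 4 because Panama appears twice, so comparing against len(l) would keep ccc and drop ddd. With the mask, the counts are the numbers of distinct matching values.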
Answer 2 (score: 1)
df1 = df[df.groupby('foo')['baz'].transform('nunique').eq(3)]
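Note that comparing nunique with 3 only works when "baz" contains no values outside the set. The sketch below adds a hypothetical group eee with a hypothetical value Chile (neither is in the question) to show the difference, and then counts only the required values. `required`, `naive`, and `robust` are illustrative names.

```python
import pandas as pd

# Question data plus a hypothetical group "eee" containing an outside value.
df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4 + ["eee"] * 3,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25, 1, 2, 3],
    "baz": ["Laos", "Nigeria", "Panama", "Panama", "Nigeria", "Laos",
            "Laos", "Panama", "Panama", "Laos", "Panama", "Nigeria",
            "Laos", "Panama", "Chile"],
})
required = {"Laos", "Panama", "Nigeria"}

# The one-liner from the answer: any three distinct values pass.
naive = df[df.groupby("foo")["baz"].transform("nunique").eq(3)]

# Variant: blank out values outside the set (NaN is ignored by nunique),
# then require as many distinct values as the set has members.
in_set = df["baz"].where(df["baz"].isin(list(required)))
mask = in_set.groupby(df["foo"]).transform("nunique").eq(len(required))
robust = df[mask]

print(sorted(naive["foo"].unique()))
print(sorted(robust["foo"].unique()))
```

On the original data from the question, both versions select the same rows; the variant only matters if other values can appear in "baz".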