Pandas: group by one column and keep only the groups that contain all values in another column

Time: 2020-08-05 14:08:19

Tags: python pandas

My df looks like this:

foo bar baz
aaa 0   Laos
aaa 45  Nigeria
aaa 123 Panama
bbb 12  Panama
bbb 826 Nigeria
ccc 0   Laos
ccc 15  Laos
ccc 72  Panama
ddd 4   Panama
ddd 9   Laos
ddd 987 Panama
ddd 25  Nigeria

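I also have a set: {"Laos", "Panama", "Nigeria"}

I want to group with groupby("foo") and keep only the groups whose "baz" column contains every value in that set.

So the resulting df contains only these rows (because bbb is missing Laos and ccc is missing Nigeria):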

foo bar baz
aaa 0   Laos
aaa 45  Nigeria
aaa 123 Panama
ddd 4   Panama
ddd 9   Laos
ddd 987 Panama
ddd 25  Nigeria
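
For anyone who wants to reproduce this, here is a minimal sketch that builds the frame above. The variable name `wanted` is only an illustrative label for the set and is not taken from the original code.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama",
            "Panama", "Nigeria",
            "Laos", "Laos", "Panama",
            "Panama", "Laos", "Panama", "Nigeria"],
})

# Hypothetical name for the set of values every kept group must contain.
wanted = {"Laos", "Panama", "Nigeria"}
```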

3 Answers:

Answer 0: (score: 2)

Try

s = df.groupby('foo').filter(
    lambda x: pd.Series(["laos", "panama", "nigeria"]).isin(x['baz'].str.lower()).all())
s
Out[21]:
    foo  bar      baz
0   aaa    0     Laos
1   aaa   45  Nigeria
2   aaa  123   Panama
8   ddd    4   Panama
9   ddd    9     Laos
10  ddd  987   Panama
11  ddd   25  Nigeria
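
The same idea can probably be written with a plain Python set, which reads closer to the wording of the question. This is a sketch of a variation, not the code from the answer above; the names `wanted` and `result` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama",
            "Panama", "Nigeria",
            "Laos", "Laos", "Panama",
            "Panama", "Laos", "Panama", "Nigeria"],
})

# Lower-cased so the comparison is case-insensitive, as in the answer above.
wanted = {"laos", "panama", "nigeria"}

# Keep a group only if every wanted value occurs in its 'baz' column.
result = df.groupby("foo").filter(
    lambda g: wanted.issubset(set(g["baz"].str.lower()))
)
print(result)
```

Extra values in a group (for example a fourth country) do not disqualify it here, because only the presence of the wanted values is checked.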

Answer 1: (score: 1)

IIUC, use Series.str.lower, Series.isin and GroupBy.transform:

l = ["laos", "panama", "nigeria"]
s = df['baz'].str.lower()

m = (s.isin(l)
      .mask(df.duplicated(['baz', 'foo']), False)
      .groupby(df['foo'])
      .transform('sum').eq(len(l)))

df_filtered = df.loc[m]
print(df_filtered)


    foo  bar      baz
0   aaa    0     Laos
1   aaa   45  Nigeria
2   aaa  123   Panama
8   ddd    4   Panama
9   ddd    9     Laos
10  ddd  987   Panama
11  ddd   25  Nigeria

This is equivalent to:

m = ((s.isin(l) & (~df.duplicated(['baz', 'foo'])))
       .groupby(df['foo'])
       .transform('sum').eq(len(l)))
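
To see why the duplicated mask matters, the sketch below compares the per-group counts with and without it on the question's data. The names `counts_raw` and `counts_dedup` are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["aaa"] * 3 + ["bbb"] * 2 + ["ccc"] * 3 + ["ddd"] * 4,
    "bar": [0, 45, 123, 12, 826, 0, 15, 72, 4, 9, 987, 25],
    "baz": ["Laos", "Nigeria", "Panama",
            "Panama", "Nigeria",
            "Laos", "Laos", "Panama",
            "Panama", "Laos", "Panama", "Nigeria"],
})

l = ["laos", "panama", "nigeria"]
s = df["baz"].str.lower()

# Without de-duplication, repeated values inflate the count:
# ccc (Laos, Laos, Panama) reaches 3 and ddd (Panama twice) reaches 4.
counts_raw = s.isin(l).groupby(df["foo"]).sum()

# Counting each (baz, foo) pair once gives the number of distinct matches.
counts_dedup = (s.isin(l) & ~df.duplicated(["baz", "foo"])).groupby(df["foo"]).sum()

print(counts_raw.to_dict())    # {'aaa': 3, 'bbb': 2, 'ccc': 3, 'ddd': 4}
print(counts_dedup.to_dict())  # {'aaa': 3, 'bbb': 2, 'ccc': 2, 'ddd': 3}
```

Comparing the raw counts with len(l) would wrongly keep ccc and drop ddd, which is why duplicates are masked out before summing.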

Answer 2: (score: 1)

df1 = df[df.groupby('foo')['baz'].transform('nunique').eq(3)]
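
Note that counting unique values only works when "baz" never contains anything outside the set. A sketch of a more defensive variant follows, under the assumption that other values may appear; the group names, the value "Chile", and the variables `wanted`, `naive`, `hits` and `mask` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: group 'b' has three distinct values but no Nigeria.
df = pd.DataFrame({
    "foo": ["a", "a", "a", "b", "b", "b"],
    "baz": ["Laos", "Panama", "Nigeria", "Laos", "Panama", "Chile"],
})
wanted = ["Laos", "Panama", "Nigeria"]

# Plain nunique accepts both groups, because 'b' also has 3 distinct values.
naive = df.groupby("foo")["baz"].transform("nunique").eq(3)

# Blank out values that are not in the set (they become NaN), then count
# distinct values per group; nunique ignores NaN by default.
hits = df["baz"].where(df["baz"].isin(wanted))
mask = hits.groupby(df["foo"]).transform("nunique").eq(len(wanted))

df1 = df[mask]
print(df1)
```

On the data from the question both versions give the same rows, since every "baz" value there belongs to the set.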