Question

I have a DataFrame:

I want to get a new DataFrame that contains only the ids for which both sex values (0 and 1) occur. The input looks like this:

import pandas as pd
df = pd.DataFrame({'id':[1,1,1,1,2,2,2,3,3,3,4,4],
              'sex': [0,0,0,1,0,0,0,1,1,0,1,1]})
print(df)
    id  sex
0   1   0
1   1   0
2   1   0
3   1   1
4   2   0
5   2   0
6   2   0
7   3   1
8   3   1
9   3   0
10  4   1
11  4   1

Answer 1

Use groupby and filter with the required condition: keep a group only when its set of sex values equals {0, 1}.

In [2952]: df.groupby('id').filter(lambda x: set(x.sex) == set([0,1]))
Out[2952]:
   id  sex
0   1    0
1   1    0
2   1    0
3   1    1
7   3    1
8   3    1
9   3    0

Alternatively, check that each of the values 0 and 1 appears at least once in the group:

In [2953]: df.groupby('id').filter(lambda x: all([any(x.sex == v) for v in [0,1]]))
Out[2953]:
   id  sex
0   1    0
1   1    0
2   1    0
3   1    1
7   3    1
8   3    1
9   3    0
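
The two conditions above behave differently if a third value ever shows up in sex: the set equality rejects such a group, while the any/all check still accepts it. A minimal sketch of the second behaviour, assuming a hypothetical helper named keep_ids_with_all_values (it is not part of pandas):

```python
import pandas as pd

df = pd.DataFrame({'id':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                   'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})

def keep_ids_with_all_values(frame, required, key='id', col='sex'):
    """Hypothetical helper: keep groups whose `col` contains every value in `required`."""
    required = set(required)
    # filter() keeps all rows of a group when the lambda returns True
    return frame.groupby(key).filter(lambda g: required.issubset(set(g[col])))

result = keep_ids_with_all_values(df, [0, 1])
print(result['id'].unique())  # only ids 1 and 3 remain
```

Swapping `issubset` for `==` would reproduce the stricter set-equality condition.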

Answer 2

Use drop_duplicates on both columns, then count how many distinct (id, sex) pairs each id has with value_counts.

Then filter the original rows by boolean indexing with isin:

s = df.drop_duplicates()['id'].value_counts()
print (s)
3    2
1    2
4    1
2    1
Name: id, dtype: int64

df = df[df['id'].isin(s.index[s == 2])]
print (df)
   id  sex
0   1    0
1   1    0
2   1    0
3   1    1
7   3    1
8   3    1
9   3    0

Answer 3

One more :)

df.groupby('id').filter(lambda x: x['sex'].nunique()>1)

    id  sex
0   1   0
1   1   0
2   1   0
3   1   1
7   3   1
8   3   1
9   3   0
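
The same nunique idea can be written without a lambda by broadcasting the per-group count back onto the rows with transform. This is a sketch of an alternative, not the answerer's own code; the name n_distinct is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                   'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})

# Number of distinct sex values per id, repeated for every row of that id
n_distinct = df.groupby('id')['sex'].transform('nunique')

# Keep rows whose id has more than one distinct sex value
result = df[n_distinct > 1]
print(result)
```

Because transform returns a Series aligned with the original index, it can be used directly as a boolean mask.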

Answer 4

Use isin()

Something like this:

df = pd.DataFrame({'id':[1,1,1,1,2,2,2,3,3,3,4,4],
              'sex': [0,0,0,1,0,0,0,1,1,0,1,1]})

male = df[df['sex'] == 0]
male = male['id']
female = df[df['sex'] == 1]
female = female['id']

df = df[(df['id'].isin(male)) & (df['id'].isin(female))]

print(df)

Output:

   id  sex
0   1    0
1   1    0
2   1    0
3   1    1
7   3    1
8   3    1
9   3    0
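
The two isin masks amount to intersecting the set of ids that have sex 0 with the set of ids that have sex 1. A sketch that makes the intersection explicit, with variable names chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                   'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})

ids_with_0 = set(df.loc[df['sex'] == 0, 'id'])  # ids that have at least one 0
ids_with_1 = set(df.loc[df['sex'] == 1, 'id'])  # ids that have at least one 1

# ids present in both sets have both sex values
both = sorted(ids_with_0 & ids_with_1)

result = df[df['id'].isin(both)]
print(both)
```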

Answer 5

Or you can try this:

m=df.groupby('id')['sex'].nunique().eq(2)

df.loc[df.id.isin(m[m].index)]


Out[112]: 
   id  sex
0   1    0
1   1    0
2   1    0
3   1    1
7   3    1
8   3    1
9   3    0
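
As a sanity check, the approaches from the answers can be compared on the sample data. This is only a sketch; the variable names (by_set, by_nunique, by_counts, by_mask) are made up for the comparison:

```python
import pandas as pd

df = pd.DataFrame({'id':  [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
                   'sex': [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1]})

# groupby + filter with set equality
by_set = df.groupby('id').filter(lambda x: set(x['sex']) == {0, 1})

# groupby + filter with nunique
by_nunique = df.groupby('id').filter(lambda x: x['sex'].nunique() > 1)

# drop_duplicates + value_counts + isin
counts = df.drop_duplicates()['id'].value_counts()
by_counts = df[df['id'].isin(counts.index[counts == 2])]

# per-id nunique mask + isin
m = df.groupby('id')['sex'].nunique().eq(2)
by_mask = df[df['id'].isin(m[m].index)]

# All four selections should contain exactly the same rows
same = all(by_set.equals(other) for other in (by_nunique, by_counts, by_mask))
print(same)
```

Note that the nunique and value_counts variants rely on sex having only two possible values; with more values, "more than one distinct value" no longer means "both 0 and 1".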

pandas: best way to create a new DataFrame by specific criteria

5 answers: