df <- data.frame(id = c(1, 2, 3, 3, 3, 4), gender = c("Male", "Female", "Both", "Male", "Female", "Female"))
ids <- unique(df$id)
> df
id gender
1 1 Male
2 2 Female
3 3 Both
4 3 Male
5 3 Female
6 4 Female
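For each unique id, I want to make sure that if the corresponding gender values include all of Both, Male, and Female, then the row corresponding to Both is removed. In other words, my desired output is: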
> df
id gender
1 1 Male
2 2 Female
3 3 Male
4 3 Female
5 4 Female
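I tried writing a loop that:
1. Subsets df by id and stores each subset in a list called sub
2. Within each sub, checks whether gender contains "Both", "Male", and "Female"
3. If so, removes the row where gender == "Both"
4. Recombines the pieces into a data.frame
However, the code below doesn't really work and is very clunky. Is there a simpler way to do this with group_by in dplyr?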
sub <- list()
for(i in 1:length(ids)){
sub[[i]] <- subset(df, id %in% ids[i])
if(all(grepl(sub[[i]]$gender, c("Both", "Male", "Female")))){
sub[[i]] <- sub[[i]][-which(sub[[i]]$gender == "Both"), ]
}else sub[[i]] = sub[[i]]
}
Answer 0 (score: 2)
Using dplyr:
library(dplyr)

df %>%
  group_by(id) %>%
  # A is FALSE only for "Both" rows in groups with at least 3 distinct gender values
  mutate(A = !(length(unique(gender)) >= 3 & gender == "Both")) %>%
  filter(A) %>%
  select(-A)
# A tibble: 5 x 2
# Groups: id [4]
id gender
<dbl> <chr>
1 1 Male
2 2 Female
3 3 Male
4 3 Female
5 4 Female
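The same idea can probably be written without the helper column. The following is only a sketch, assuming dplyr is installed; `res` is an illustrative name that does not appear in the question. It tests explicitly that a group contains all three values, rather than counting distinct values, which may be safer if gender can ever take other values.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c(1, 2, 3, 3, 3, 4),
  gender = c("Male", "Female", "Both", "Male", "Female", "Female")
)

res <- df %>%
  group_by(id) %>%
  # Drop a "Both" row only when its id group contains all three values
  filter(!(gender == "Both" & all(c("Both", "Male", "Female") %in% gender))) %>%
  ungroup()

res
```

Calling `ungroup()` at the end returns an ungrouped tibble, which avoids surprises in later steps; leave it out if the grouping is still needed.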
Answer 1 (score: 0)
Besides the tidyverse solution, here is one using lapply:
result <- lapply(ids,function(x){
tmp <- df[df$id == x,]
if(all(c("Both","Male", "Female") %in% tmp$gender)){
tmp <- tmp[tmp$gender != "Both",]
}
return(tmp)
})
do.call("rbind",result)
# id gender
# 1 1 Male
# 2 2 Female
# 4 3 Male
# 5 3 Female
# 6 4 Female
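If splitting and re-binding feels heavy, a vectorised base R variant might look like the sketch below. It swaps in `tapply` for the per-id check; `has_all`, `drop`, and `res` are illustrative names that are not part of the original answer.

```r
# Example data from the question
df <- data.frame(
  id = c(1, 2, 3, 3, 3, 4),
  gender = c("Male", "Female", "Both", "Male", "Female", "Female")
)

# One logical per id: does that id's group contain all three values?
has_all <- tapply(df$gender, df$id,
                  function(g) all(c("Both", "Male", "Female") %in% g))

# Map the per-id flag back onto the rows, then mark "Both" rows in flagged groups
drop <- df$gender == "Both" & as.vector(has_all[as.character(df$id)])

res <- df[!drop, ]
res
```

As with the lapply version, the original row names are kept, so the row with id 3 and gender Both simply disappears from the result.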