I really need help with a problem. I have a dataset that looks like this:
Name Sex Total
Anna F 10
Jamie M 2
Jamie F 7
Mike M 13
Sam F 6
Sam M 3
structure(list(Name = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"),
Sex = c("F", "M", "F", "M", "F", "M"), Total = c(10L, 2L, 7L, 13L, 6L, 3L)),
.Names = c("Name", "Sex", "Total"), class = "data.frame", row.names = c(NA, -6L))
What I want is to get the names that are used by both males and females, so that the result looks like this:
Name Sex Total
Jamie M 2
Jamie F 7
Sam M 3
Sam F 6
But I'm really struggling to figure out how to approach it.
Answer 0 (score: 5)
Assuming the data is stored in d:
# get a vector (set) of names that are used by both M and F
dual.names <- intersect(d$Name[d$Sex=='M'], d$Name[d$Sex=='F'])
# use set of dual names to filter data
d[d$Name %in% dual.names, ]
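To make the two intermediate steps visible, here is a self-contained sketch that rebuilds the question's sample data as d (with data.frame() instead of the dput() output) and prints what intersect() and %in% return along the way:

```r
# Rebuild the question's sample data; the answer assumes it is stored in d
d <- data.frame(
  Name  = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"),
  Sex   = c("F", "M", "F", "M", "F", "M"),
  Total = c(10L, 2L, 7L, 13L, 6L, 3L),
  stringsAsFactors = FALSE
)

male.names   <- d$Name[d$Sex == "M"]  # "Jamie" "Mike" "Sam"
female.names <- d$Name[d$Sex == "F"]  # "Anna" "Jamie" "Sam"

# intersect() keeps the unique values that occur in both vectors
dual.names <- intersect(male.names, female.names)
print(dual.names)  # "Jamie" "Sam"

# %in% returns one TRUE/FALSE per row of d, which is then used to pick rows
keep <- d$Name %in% dual.names
print(keep)  # FALSE TRUE TRUE FALSE TRUE TRUE

result <- d[keep, ]
print(result)  # rows 2, 3, 5 and 6 of d
```

Because the filter is a plain logical index, the original row order and row names of d are preserved.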
Answer 1 (score: 4)
Obligatory Hadleyverse (dplyr & tidyr) answer:
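library(tidyr)
library(dplyr)
# dat is the question's data frame
dat %>%
  spread(Sex, Total) %>%            # one row per Name, columns F and M (NA if missing)
  filter(!is.na(M), !is.na(F)) %>%  # keep names that have both sexes
  gather(Sex, Total, M, F) %>%      # back to long format
  arrange(Name)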
## Name Sex Total
## 1 Jamie M 2
## 2 Jamie F 7
## 3 Sam M 3
## 4 Sam F 6
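The spread() step works because widening the data leaves an NA in the M or F column for any name used by only one sex. As a rough base-R analogue (swapped in here so the idea can be tried without tidyr; it is not part of the answer itself), table() builds the same name-by-sex grid, holding row counts instead of totals:

```r
d <- data.frame(
  Name  = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"),
  Sex   = c("F", "M", "F", "M", "F", "M"),
  Total = c(10L, 2L, 7L, 13L, 6L, 3L),
  stringsAsFactors = FALSE
)

# Name-by-Sex grid of row counts; a 0 marks a missing combination,
# playing the role of the NA that spread() leaves behind
counts <- table(d$Name, d$Sex)
print(counts)

# Keep names with a nonzero count in every Sex column
both <- rownames(counts)[rowSums(counts > 0) == ncol(counts)]
print(both)  # "Jamie" "Sam"

result <- d[d$Name %in% both, ]
print(result)
```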
Edit: a much better dplyr approach, from a comment by @konvas:
dat %>% group_by(Name) %>% filter(length(unique(Sex)) == 2)
Edit: further refined by a comment from @David:
dat %>% group_by(Name) %>% filter(n_distinct(Sex) == 2)
(Can I transfer the points to @konvas & @David? :-)
Answer 2 (score: 2)
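You can use ave to count the number of distinct sexes for each name, and then keep only the rows whose name has both. For example, with the sample data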
dd<-structure(list(Name = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"),
Sex = c("F", "M", "F", "M", "F", "M"), Total = c(10L, 2L, 7L, 13L, 6L, 3L)),
.Names = c("Name", "Sex", "Total"), class = "data.frame", row.names = c(NA, -6L))
you can do
both<-with(dd, ave(Sex, Name, FUN=function(x) length(unique(x))))=="2"
dd[both, ]
to get
Name Sex Total
2 Jamie M 2
3 Jamie F 7
5 Sam F 6
6 Sam M 3
as desired.
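To show why the ave() answer compares against the string "2", here is a sketch that prints the per-row counts. The second half is a variant that is not from the answer: it passes row indices as the first argument so the counts stay integer.

```r
dd <- data.frame(
  Name  = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"),
  Sex   = c("F", "M", "F", "M", "F", "M"),
  Total = c(10L, 2L, 7L, 13L, 6L, 3L),
  stringsAsFactors = FALSE
)

# ave() splits Sex by Name, applies FUN to each piece, and writes each
# result back into every row of that group. Because x (Sex) is character,
# the counts come back as strings, hence the comparison with "2".
n.sex <- with(dd, ave(Sex, Name, FUN = function(x) length(unique(x))))
print(n.sex)  # "1" "2" "2" "1" "2" "2"

# Variant: use row indices as x so the result stays integer,
# and look up Sex for those rows inside FUN
n.sex2 <- with(dd, ave(seq_along(Sex), Name,
                       FUN = function(i) length(unique(Sex[i]))))
print(n.sex2)  # 1 2 2 1 2 2

result <- dd[n.sex2 == 2, ]
print(result)
```

If Sex were a factor rather than a character column, ave() would try to write the integer counts back into the factor and produce NA values with warnings, so the index-based variant is the more robust of the two.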
Answer 3 (score: 2)
A bit late to the party, but here is a data.table approach:
library(data.table)
setDT(df)[ , .SD[length(unique(Sex)) == 2], by = Name]
## Name Sex Total
## 1: Jamie M 2
## 2: Jamie F 7
## 3: Sam F 6
## 4: Sam M 3
Or, if no name has duplicate rows for the same sex, here is a faster solution:
setDT(df)[ , .SD[.N == 2], by = Name]
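The .N == 2 shortcut counts rows per name rather than distinct sexes, so it only agrees with the length(unique(Sex)) == 2 version when no name has two rows for the same sex. A base-R sketch with a small hypothetical data set (not from the question) shows where the two rules diverge:

```r
# Hypothetical data (not from the question): Anna has two "F" rows
dup <- data.frame(
  Name  = c("Anna", "Anna", "Jamie", "Jamie"),
  Sex   = c("F", "F", "M", "F"),
  Total = c(10L, 4L, 2L, 7L),
  stringsAsFactors = FALSE
)

by.name <- split(dup$Sex, dup$Name)

# Rows per name: what .N counts in data.table (Anna 2, Jamie 2)
rows.per.name <- sapply(by.name, length)
# Distinct sexes per name: what length(unique(Sex)) counts (Anna 1, Jamie 2)
sexes.per.name <- sapply(by.name, function(x) length(unique(x)))

kept.by.rows  <- names(rows.per.name)[rows.per.name == 2]    # Anna wrongly kept
kept.by.sexes <- names(sexes.per.name)[sexes.per.name == 2]
print(kept.by.rows)   # "Anna" "Jamie"
print(kept.by.sexes)  # "Jamie"
```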