TL;DR: I need to filter individuals by two different conditions.
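Basically, given the example below, I need to find which individuals have eaten both cheese and olives, and return the rows that match those foods. In the example these are alibaba, mary and steve.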
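Multiple filter conditions in dplyr are usually straightforward, but this one spans different rows, which is why I find it difficult. I did come up with a long solution, but I am sure there is a more efficient way.
I am working with a large dataset, so speed matters.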
library(dplyr)
set.seed(1111)
df = data.frame(ID = sample(c("bob","steve","mary","alibaba"), 20, replace = TRUE))
set.seed(1311)
df$food = sample(c("cheese","bread","olives"), 20, replace = TRUE)
# find which individuals have both cheese and olives
index = df %>% distinct(ID, food, .keep_all = TRUE) %>%
  filter(food == "cheese" | food == "olives") %>%
  group_by(ID) %>%
  summarise(freq = n()) %>%
  filter(freq > 1) %>% {as.vector(.$ID)}
# return the cheese and olives rows for those individuals
df %>% filter(ID %in% index, food == "cheese" | food == "olives")
Answer 0 (score: 0)
After grouping by 'ID', filter the groups that contain both 'cheese' and 'olives' by wrapping the check in all, and at the same time apply a second expression (food %in% c('cheese', 'olives')) so that only the rows with those two foods are kept.
library(dplyr)
df %>%
    group_by(ID) %>%
    filter(all(c('cheese', 'olives') %in% food), food %in% c('cheese', 'olives'))
-output
# A tibble: 13 x 2
# Groups:   ID [3]
#   ID      food
#   <chr>   <chr>
# 1 alibaba olives
# 2 steve   olives
# 3 steve   olives
# 4 steve   olives
# 5 alibaba cheese
# 6 steve   olives
# 7 steve   olives
# 8 mary    cheese
# 9 alibaba olives
#10 mary    olives
#11 steve   cheese
#12 alibaba olives
#13 steve   olives
Or another option that may be faster is to filter first, then group and keep only the groups that have 2 distinct values in 'food'
df %>%
    filter(food %in% c('cheese', 'olives')) %>%
    group_by(ID) %>%
    filter(n_distinct(food) == 2)
Or with data.table
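A minimal sketch of the filter-first idea in data.table, assuming a small made-up table dt rather than the df from the question. The names dt, wanted and res are illustrative, and this is one possible translation of the dplyr code rather than the answer's own code.

```r
library(data.table)

# Hypothetical toy data (not the df from the question), small enough to check by hand
dt <- data.table(
  ID   = c("bob", "bob", "mary", "mary", "mary", "steve", "steve"),
  food = c("cheese", "bread", "cheese", "olives", "bread", "olives", "olives")
)

# The two foods every kept individual must have (assumed for illustration)
wanted <- c("cheese", "olives")

# Step 1: keep only rows with the foods of interest.
# Step 2: per ID, return the group's rows only if both foods are still present;
#         an `if` without `else` yields NULL, so other groups are dropped.
res <- dt[food %in% wanted][, if (uniqueN(food) == length(wanted)) .SD, by = ID]
print(res)
```

In this toy table only mary has both foods, so only her cheese and olives rows remain. To try this on an existing data frame, it would first need converting, for example with as.data.table(df) or setDT(df). Whether it is actually faster than the dplyr versions on a given dataset is worth benchmarking rather than assuming.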