Given the input data frame:
library(dplyr)
( df <- data_frame(id = c(1,1,1,2,2,3), y = letters[1:6]) )
# # A tibble: 6 × 2
# id y
# <dbl> <chr>
# 1 1 a
# 2 1 b
# 3 1 c
# 4 2 d
# 5 2 e
# 6 3 f
Suppose we want the subset of df that keeps only the two most frequent id values, 1 and 2:
df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id %>% print -> ids #*
# [1] 1 2
Question: is there a way to use a pipe inside the predicate function within filter? My attempts:
df %>% filter(
id %in% df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id )
# Error: no applicable method for 'group_by_' applied to an object of class "logical"
df %>% filter(
id %in% (df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id) )
# Error: cannot handle
df %>% filter(
id %in% {df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id} )
# Error: cannot handle
I ask because the last two predicates seem to work as expected outside filter:
df$id %in% (df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id)
# [1] TRUE TRUE TRUE TRUE TRUE FALSE
df$id %in% {df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% .$id}
# [1] TRUE TRUE TRUE TRUE TRUE FALSE
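A note on the first error (the unparenthesised attempt): in R, all `%op%` operators share one precedence level and associate left to right, so `id %in% df %>% ...` is read as `(id %in% df) %>% ...`. That would explain why `group_by_` received a logical vector. The sketch below only parses the expression, without evaluating it, to show the grouping; `f()` is a hypothetical stand-in for the rest of the pipeline.

```r
# Parse (do not evaluate) a shortened form of the first predicate.
# `f()` is a hypothetical placeholder for group_by(...) %>% tally %>% ...
expr <- quote(id %in% df %>% f())

# The outermost call is the pipe, not %in%.
outer_op <- as.character(expr[[1]])

# Its left-hand side is the whole `id %in% df` comparison.
lhs <- deparse(expr[[2]])

print(outer_op)
print(lhs)
```

This says nothing about the "cannot handle" errors from the parenthesised and braced attempts, which group correctly.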
Side note: I know I can use a temporary variable ids:
df %>% filter(id %in% ids) # *ids <- c(1,2)
or I can use a *_join:
df %>% inner_join(
df %>% group_by(id) %>% tally %>% arrange(desc(n)) %>% head(2) %>% select(-n))
Both produce the expected output:
# # A tibble: 5 × 2
# id y
# <dbl> <chr>
# 1 1 a
# 2 1 b
# 3 1 c
# 4 2 d
# 5 2 e
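As a variation on the join workaround, a filtering join states the intent a little more directly. This is a minimal sketch, assuming a dplyr version that provides `tibble()`, `count(sort = TRUE)` and `semi_join()`; the name `frequent` is made up for illustration.

```r
library(dplyr)

df <- tibble(id = c(1, 1, 1, 2, 2, 3), y = letters[1:6])

# The two most frequent ids, kept as a one-column data frame.
# `frequent` is a hypothetical name used only in this sketch.
frequent <- df %>%
  count(id, sort = TRUE) %>%
  head(2) %>%
  select(id)

# semi_join() keeps the rows of df that have a match in `frequent`
# and does not add any columns from it.
result <- semi_join(df, frequent, by = "id")
print(result)
```

If several ids tie on frequency, `head(2)` simply takes the first two rows after sorting, so which of the tied ids survive depends on row order.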
Answer 0 (score: 3)
Don't complicate things for the sake of it.
ids <- (df %>% count(id) %>% arrange(n) %>% tail(2))$id
filter(df, id %in% ids)
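If the temporary variable is the objection, one option is to move the counting into a small helper, so the predicate inside filter is an ordinary function call with no nested pipe. This is only a sketch: `top_ids()` is a hypothetical helper, not part of dplyr, and it assumes the column is called `id`.

```r
library(dplyr)

df <- tibble(id = c(1, 1, 1, 2, 2, 3), y = letters[1:6])

# Hypothetical helper: the k most frequent values of the `id` column.
top_ids <- function(data, k = 2) {
  counts <- arrange(count(data, id), desc(n))
  head(counts, k)$id
}

# The predicate stays a plain function call inside filter().
result <- df %>% filter(id %in% top_ids(df))
print(result)
```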
Answer 1 (score: 1)
Continuous chaining is possible without leaning so heavily on dplyr / filter, because other solutions exist that are still compatible with %>%:
df %>%
group_by(id) %>%
tally %>%
arrange(desc(n)) %>%
head(2) %>%
.$id %>%
is.element(df$id, .) %>%
subset(df, .)
Source: local data frame [5 x 2]
id y
(dbl) (chr)
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
Once a chain gets long, building it up and then untangling it again becomes cumbersome.
For a one-off like this, I would rather use a base R one-liner:
df[df$id %in% as.integer(names(tail(sort(table(df$id)),2))),]
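For readers who find the one-liner dense, here is the same computation split into steps. It is a sketch that rebuilds the example data with base `data.frame()` and assumes the ids are whole numbers, since they are recovered from the character names of the table.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 3), y = letters[1:6])

freq <- table(df$id)             # number of rows per id
top2 <- tail(sort(freq), 2)      # sort() is ascending, so the tail holds the two largest counts
ids  <- as.integer(names(top2))  # table names are character strings, so convert back

result <- df[df$id %in% ids, ]   # keep the rows whose id is among the top two
print(result)
```

For ids that are not whole numbers, `as.integer()` would need to be replaced by a conversion that matches the column type.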