I have a data frame like this:
df.in <-structure(list(id = c(1, 1, 2, 3), x1 = c(0, 1, NA, 0), x2 = c("Lorem ipsum dolor sit amet",
"dolore eu fugiat nulla pariatur", "Sed ut perspiciatis unde omnis",
"Nemo enim ipsam voluptatem"), x3 = c("Donec ullamcorper elit quis risus",
"Donec ullamcorper elit quis risus", "Curabitur euismod", "Mauris felis orci"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
> df.in
# A tibble: 4 x 4
     id    x1 x2                              x3
  <dbl> <dbl> <chr>                           <chr>
1     1     0 Lorem ipsum dolor sit amet      Donec ullamcorper elit quis risus
2     1     1 dolore eu fugiat nulla pariatur Donec ullamcorper elit quis risus
3     2    NA Sed ut perspiciatis unde omnis  Curabitur euismod
4     3     0 Nemo enim ipsam voluptatem      Mauris felis orci
I am trying to use dplyr::group_by() to get this:
df.out <- structure(list(id = c(1, 2, 3), x1 = c(1, NA, 0), x2 = c("dolore eu fugiat nulla pariatur",
"Sed ut perspiciatis unde omnis", "Nemo enim ipsam voluptatem"
), x3 = c("Donec ullamcorper elit quis risus", "Curabitur euismod",
"Mauris felis orci")), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
> df.out
# A tibble: 3 x 4
     id    x1 x2                              x3
  <dbl> <dbl> <chr>                           <chr>
1     1     1 dolore eu fugiat nulla pariatur Donec ullamcorper elit quis risus
2     2    NA Sed ut perspiciatis unde omnis  Curabitur euismod
3     3     0 Nemo enim ipsam voluptatem      Mauris felis orci
I can do this:
df.in %>%
  group_by(id) %>%
  summarise(x1 = max(x1))
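But what should I do with x2 and x3 so that they keep the values from the row where max(x1) occurs? All of the x columns need the same logic. Is there a way to do this with summarize_all?

Answer 0 (score: 1)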
We can create a condition based on max inside summarise_at:
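library(dplyr)
df.in %>%
  group_by(id) %>%
  # Columns are selected by name; funs() is deprecated since dplyr 0.8.0,
  # so a ~ lambda is used instead. Single-row groups are returned as-is.
  summarise_at(vars(x2, x3), ~ if (n() == 1) . else .[x1 == max(x1, na.rm = TRUE)])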
Instead of summarise_at, we can also use filter, which keeps all columns, including x1:
df.in %>%
  group_by(id) %>%
  filter((n() == 1) | (x1 == max(x1, na.rm = TRUE)))
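The n() == 1 check matters for the group whose only x1 value is NA (id 2 here). A minimal base R sketch of what happens in that group; the variable names x1_group, group_max, and without_guard are only illustrative:

```r
# x1 values for a group that contains a single NA row (like id == 2)
x1_group <- NA_real_

# With na.rm = TRUE nothing is left, so max() returns -Inf (and warns)
group_max <- suppressWarnings(max(x1_group, na.rm = TRUE))

# Comparing NA with -Inf gives NA, and filter() drops rows whose condition is NA
without_guard <- x1_group == group_max

# TRUE | NA evaluates to TRUE, so the single-row group is kept
with_guard <- (length(x1_group) == 1) | without_guard
```

Without the guard, the row for id 2 would be dropped rather than kept with x1 = NA.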
Or use slice:
df.in %>%
  group_by(id) %>%
  slice(which(n() == 1 | (x1 == max(x1, na.rm = TRUE)))[1])
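A possible alternative, not taken from the answer above, is to sort first and then keep the first row of each group. This sketch relies on arrange() always placing NA values last, so no explicit n() == 1 check is needed. The data is the question's df.in, rebuilt with tibble() for brevity, and the name result is only illustrative:

```r
library(dplyr)

# Same data as df.in in the question
df.in <- tibble(
  id = c(1, 1, 2, 3),
  x1 = c(0, 1, NA, 0),
  x2 = c("Lorem ipsum dolor sit amet", "dolore eu fugiat nulla pariatur",
         "Sed ut perspiciatis unde omnis", "Nemo enim ipsam voluptatem"),
  x3 = c("Donec ullamcorper elit quis risus", "Donec ullamcorper elit quis risus",
         "Curabitur euismod", "Mauris felis orci")
)

result <- df.in %>%
  arrange(id, desc(x1)) %>%  # largest x1 first within each id; NA rows sort last
  group_by(id) %>%
  slice(1) %>%               # keep the first row of each group
  ungroup()
```

If several rows in a group tie on the maximum x1, this keeps only the first of them after sorting, similar to the [1] in the slice version above.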