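I regularly want to perform tidyr::spread and dplyr::summarise in a "single step" to aggregate data by group. What I want is shown in expected below. I can get expected by running summarise and spread separately and combining the results with dplyr::full_join, but I'm looking for alternative approaches that avoid the full_join. A bona fide single-step approach is not necessary.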
df <- data.frame(
  id = rep(letters[1], 2),
  val1 = c(10, 20),
  val2 = c(100, 200),
  key = c("A", "B"),
  value = c(1, 2))
library(tidyverse)
result1 <- df %>%
  group_by(id) %>%
  summarise(
    val1 = min(val1),
    val2 = max(val2)
  )
# A tibble: 1 x 3
# id val1 val2
# <fctr> <dbl> <dbl>
# 1 a 10.0 200
result2 <- df %>%
  select(id, key, value) %>%
  group_by(id) %>%
  spread(key, value)
# A tibble: 1 x 3
# Groups: id [1]
# id A B
# * <fctr> <dbl> <dbl>
# 1 a 1.00 2.00
expected <- full_join(result1, result2, by="id")
# A tibble: 1 x 5
# id val1 val2 A B
# <fctr> <dbl> <dbl> <dbl> <dbl>
# 1 a 10.0 200 1.00 2.00
Answer 0 (score: 5)
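I suspect your real data has more edge cases that will require some modifications, but why not just spread first and then summarise? You can specify a summary function for each variable individually, so for A and B, where (I assume) you don't actually need to compute anything, you can simply drop the NA values that spread introduces: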
df %>%
  spread("key", "value") %>%
  group_by(id) %>%
  summarise(
    val1 = min(val1),
    val2 = max(val2),
    A = mean(A, na.rm = TRUE),
    B = mean(B, na.rm = TRUE)
  )
# A tibble: 1 x 5
# id val1 val2 A B
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 a 10.0 200 1.00 2.00
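To make it clearer why na.rm = TRUE is needed, here is a minimal sketch of the intermediate wide table, followed by the same idea written with tidyr::pivot_wider, which supersedes spread in tidyr 1.0.0 and later. The names wide and result are only illustrative, and the across() call assumes dplyr 1.0.0 or newer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = rep(letters[1], 2),
  val1 = c(10, 20),
  val2 = c(100, 200),
  key = c("A", "B"),
  value = c(1, 2)
)

# Spreading keeps one row per original row, because val1 and val2 still
# distinguish the rows. Each key column is therefore padded with NA:
# A is c(1, NA) and B is c(NA, 2).
wide <- df %>% spread(key, value)

# Same approach with pivot_wider (assumes tidyr >= 1.0.0) and
# across (assumes dplyr >= 1.0.0) to collapse the NA-padded columns.
result <- df %>%
  pivot_wider(names_from = key, values_from = value) %>%
  group_by(id) %>%
  summarise(
    val1 = min(val1),
    val2 = max(val2),
    across(c(A, B), ~ mean(.x, na.rm = TRUE))
  )
```

Using mean with na.rm = TRUE only works as a "pick the single non-missing value" trick when each id has at most one value per key; with several values per key it would average them, so a different summary function may be more appropriate there.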
Answer 1 (score: 0)
Self-answer: here is an approach that works with tidyr::nest, but it seems "messy" and is not really an improvement.
df %>%
  group_by(id) %>%
  nest() %>%
  mutate(
    min_vals = map(data, ~.x %>% summarise(min_val = min(val1), max_val = max(val2))),
    data = map(data, ~select(.x, key, value) %>% spread(key, value))
  ) %>%
  unnest()
# A tibble: 1 x 5
# id A B min_val max_val
# <fctr> <dbl> <dbl> <dbl> <dbl>
# 1 a 1.00 2.00 10.0 200
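On tidyr 1.0.0 and later, calling unnest() without naming the columns produces a warning that cols is required. A hedged sketch of the same nest-based idea with the list-columns named explicitly is below; the column names stats and wide and the object nested_result are my own choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(
  id = rep(letters[1], 2),
  val1 = c(10, 20),
  val2 = c(100, 200),
  key = c("A", "B"),
  value = c(1, 2)
)

nested_result <- df %>%
  group_by(id) %>%
  nest() %>%
  mutate(
    # One-row summary per group
    stats = map(data, ~ summarise(.x, min_val = min(val1), max_val = max(val2))),
    # One-row wide table per group, built from key/value only
    wide = map(data, ~ pivot_wider(select(.x, key, value),
                                   names_from = key, values_from = value))
  ) %>%
  select(-data) %>%
  # Name the list-columns to unnest explicitly (assumes tidyr >= 1.0.0)
  unnest(cols = c(wide, stats)) %>%
  ungroup()
```

This still pays the cost of two map() passes per group, so it is no less "messy" than the original; it only avoids the warning about the missing cols argument.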
Answer 2 (score: 0)
Another approach, using do:
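res <- df %>%
  group_by(id) %>%
  summarise(
    val1 = min(val1),
    val2 = max(val2),
    key = list(key),
    value = list(value)
  ) %>%
  group_by(id, val1, val2) %>%
  # No inner %>% here: inside a pipe, `.` would refer to the piped value
  # rather than the current group, so `.$key` would not be found.
  do(setNames(
    as.data.frame(matrix(.$value[[1]], nrow = 1)),
    as.character(.$key[[1]])
  ))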