dplyr / tidyr: summarise two columns into a single named-list column

Time: 2019-06-18 19:34:47

Tags: r dplyr tidyr

Imagine this data frame:

library(dplyr)  # provides tibble(), %>%, group_by() and summarise()

df <- tibble(
  key = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  date = rep(Sys.Date(), 9),
  hour = rep(c('00', '01', '02'), 3),
  value = rep(c(8, 9, 10), 3)
)

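I want output in which the per-group summary column is a named list of the values, keyed by hour. For each group, it should be as if I had done this: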

as.list(setNames(df$value[df$key == 1], df$hour[df$key == 1]))
$`00`
[1] 8

$`01`
[1] 9

$`02`
[1] 10

Something along these lines, but that actually works:

df %>%
  group_by(key, date) %>%
  summarise(
    daily_value = sum(value),
    hourly_values = as.list(setNames(value, hour))
    )

A solution using nest or something similar from tidyr would also be fine.

Edit: the output should be identical to what is produced here:

outputDf <- df %>%
  group_by(key, date) %>%
  summarise(daily_value = sum(value))

outputDf$hourly_value <- list(
  as.list(setNames(df$value[df$key == 1], df$hour[df$key == 1])),
  as.list(setNames(df$value[df$key == 2], df$hour[df$key == 2])),
  as.list(setNames(df$value[df$key == 3], df$hour[df$key == 3]))
  )

outputDf
# A tibble: 3 x 4
# Groups:   key [?]
    key       date daily_value hourly_value
  <dbl>     <date>       <dbl>       <list>
1     1 2019-06-18          27   <list [3]>
2     2 2019-06-18          27   <list [3]>
3     3 2019-06-18          27   <list [3]>

outputDf$hourly_value
[[1]]
[[1]]$`00`
[1] 8

[[1]]$`01`
[1] 9

[[1]]$`02`
[1] 10


[[2]]
[[2]]$`00`
[1] 8

[[2]]$`01`
[1] 9

[[2]]$`02`
[1] 10


[[3]]
[[3]]$`00`
[1] 8

[[3]]$`01`
[1] 9

[[3]]$`02`
[1] 10

2 answers:

Answer 0: (score: 2)

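We need to wrap the result in list(), because summarise expects each group to return a single row. On its own, as.list() returns a list whose length equals the number of rows in the group. Wrapping that in list() ensures each group hands summarise a value of length 1.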

library(dplyr)  
df %>% 
   group_by(key, date) %>% 
   summarise(daily_value = sum(value), 
              hourly_values = list(as.list(setNames(value, hour))))
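The length difference is easy to check in base R. The sketch below uses the values of a single group from the question; the names per_hour and wrapped are only illustrative and do not appear in the original code.

```r
# Values and hours for one group, as in the question's data
value <- c(8, 9, 10)
hour <- c("00", "01", "02")

# as.list() alone: one element per row of the group
per_hour <- as.list(setNames(value, hour))
length(per_hour)  # 3

# Wrapped in list(): a single element that holds the whole named list
wrapped <- list(per_hour)
length(wrapped)   # 1
```

Because wrapped has length 1, it fits into one cell of a list column, which is the shape summarise needs for each group.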

Answer 1: (score: 0)

df <- tibble(
  key = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  date = rep(Sys.Date(), 9),
  hour = rep(c('00', '01', '02'), 3),
  value = rep(c(8, 9, 10), 3)
)

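df2 <- df %>% 
  group_by(key, date) %>% 
  mutate(
    daily_value = sum(value),
    hourly_value = as.list(value)  # list column with one single-value element per row
  )

# Name each element of the list column by its hour.
# Note: mutate() keeps all nine rows, so this does not collapse to one row per group.
names(df2$hourly_value) <- df$hour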
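For the nest-based route the question mentions as acceptable, a minimal sketch might look like the following. It is not taken from either answer. It assumes tidyr is installed and that nest() names the nested column data (its default); the name nested is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  key = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  date = rep(Sys.Date(), 9),
  hour = rep(c("00", "01", "02"), 3),
  value = rep(c(8, 9, 10), 3)
)

nested <- df %>%
  group_by(key, date) %>%
  nest() %>%  # one row per group; hour and value move into the `data` list column
  mutate(
    # total per group, computed from each nested tibble
    daily_value = vapply(data, function(d) sum(d$value), numeric(1)),
    # named list of values keyed by hour, one per group
    hourly_values = lapply(data, function(d) as.list(setNames(d$value, d$hour)))
  ) %>%
  ungroup() %>%
  select(-data)

nested$hourly_values[[1]]
```

This should give one row per key/date pair, with hourly_values holding the same named lists as the hand-built outputDf$hourly_value in the question.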