Collapse and aggregate multiple row values by date

Time: 2016-06-30 16:35:36

Tags: r

I have a dataset that looks like this:

date, location, value, tally, score
2016-06-30T09:30Z, home, foo, 1,
2016-06-30T12:30Z, work, foo, 2,
2016-06-30T19:30Z, home, bar, , 5

I need to aggregate these rows to get a result like this:

date, location, value, tally, score
2016-06-30, [home, work], [foo, bar], 3, 5

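There are a few challenges for me:

  • The result row (a daily summary) must cover all rows for that day (2016-06-30 in the example above)
  • Some columns (strings) should produce an array containing all of that day's values
  • Other columns (integers) should produce a sum

I have looked at dplyr, and I would like to do this in R if possible.

Thanks for your help!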

Edit:

Here is the dput of the data:

structure(list(date = structure(1:3, .Label = c("2016-06-30T09:30Z", 
"2016-06-30T12:30Z", "2016-06-30T19:30Z"), class = "factor"), 
    location = structure(c(1L, 2L, 1L), .Label = c("home", "work"
    ), class = "factor"), value = structure(c(2L, 2L, 1L), .Label = c("bar", 
    "foo"), class = "factor"), tally = c(1L, 2L, NA), score = c(NA, 
    NA, 5L)), .Names = c("date", "location", "value", "tally", 
"score"), class = "data.frame", row.names = c(NA, -3L))

1 answer:

Answer 0 (score: 1)

mydat <- structure(list(
    date = structure(1:3, .Label = c("2016-06-30T09:30Z",
        "2016-06-30T12:30Z", "2016-06-30T19:30Z"), class = "factor"),
    location = structure(c(1L, 2L, 1L), .Label = c("home", "work"),
        class = "factor"),
    value = structure(c(2L, 2L, 1L), .Label = c("bar", "foo"),
        class = "factor"),
    tally = c(1L, 2L, NA),
    score = c(NA, NA, 5L)),
    .Names = c("date", "location", "value", "tally", "score"),
    class = "data.frame", row.names = c(NA, -3L))

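# The date column is a factor; keep only the calendar day
mydat$date <- as.Date(as.character(mydat$date))

library(data.table)
mydat.dt <- data.table(mydat)
# Paste only the string columns; .SDcols keeps tally and score out of this step
mydat.dt <- mydat.dt[, lapply(.SD, paste0, collapse = " "), by = date,
                     .SDcols = c("location", "value")]

# Sum the integer columns per day and bind them on
cbind(mydat.dt, aggregate(mydat[, c("tally", "score")], by = list(mydat$date),
                          FUN = sum, na.rm = TRUE)[2:3])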

This gives you:

         date       location       value tally score
1: 2016-06-30 home work home foo foo bar     3     5

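Note that you could do all of this in a single step while reshaping the data.table if you wanted to, but I found it quicker and simpler to get the same result in two steps.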
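
For reference, here is a minimal sketch of what a single-step data.table version might look like. It is not the code from the answer above. It assumes the string columns should become list columns of unique values (closer to the [home, work] arrays in the question), and that the calendar day can be taken from the first 10 characters of the UTC timestamp. The names dt, day and daily are made up for this illustration.

```r
library(data.table)

# Same sample data as in the question, written out as plain columns
mydat <- data.frame(
  date     = c("2016-06-30T09:30Z", "2016-06-30T12:30Z", "2016-06-30T19:30Z"),
  location = c("home", "work", "home"),
  value    = c("foo", "foo", "bar"),
  tally    = c(1L, 2L, NA),
  score    = c(NA, NA, 5L),
  stringsAsFactors = FALSE
)

dt <- as.data.table(mydat)

# Assumption: the day is the date part of the timestamp exactly as written (UTC)
dt[, day := as.Date(substr(date, 1, 10))]

# One grouped call: string columns become list columns of the unique values,
# integer columns are summed with missing values ignored
daily <- dt[, .(
  location = list(unique(location)),
  value    = list(unique(value)),
  tally    = sum(tally, na.rm = TRUE),
  score    = sum(score, na.rm = TRUE)
), by = day]

print(daily)
```

Whether list columns or pasted strings are the better fit depends on what happens to the result next: list columns keep the individual values accessible (for example daily$location[[1]]), while pasted strings are easier to write out to CSV.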