Aggregating a POSIXct vector with plyr

Asked: 2014-06-25 18:55:18

Tags: r plyr

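I am trying to write a ddply summarize statement that works on a vector of POSIXct times. For each user.nm, I just want the maximum and minimum timestamp associated with that name. The data looks like this: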

    test.data=structure(list(user.nm = structure(c(1L, 1L, 2L, 3L, 4L, 4L), .Label = c("a", 
"b", "c", "d"), class = "factor"), ip.addr.txt = structure(c(1L, 
2L, 3L, 4L, 5L, 5L), .Label = c("a", "b", "c", "d", "e"), class = "factor"), 
    login.dt = structure(c(4L, 3L, 5L, 1L, 2L, 6L), .Label = c("11/20/2013", 
    "12/26/2013", "3/11/2013", "6/25/2013", "6/27/2013", "7/15/2013"
    ), class = "factor"), login.time = structure(c(3L, 4L, 6L, 
    1L, 2L, 5L), .Label = c("10:16:17", "11:07:27", "13:22:32", 
    "13:55:05", "9:23:33", "9:49:23"), class = "factor"), login.sessn.ts = structure(c(1372180920, 
    1363024500, 1372340940, 1384960560, 1388074020, 1373894580
    ), class = c("POSIXct", "POSIXt"), tzone = ""), month = structure(c(3L, 
    4L, 3L, 5L, 1L, 2L), .Label = c("Dec-2013", "Jul-2013", "Jun-2013", 
    "Mar-2013", "Nov-2013"), class = "factor"), quarter = c(2L, 
    1L, 2L, 4L, 4L, 3L), change.label = c(TRUE, TRUE, TRUE, TRUE, 
    TRUE, TRUE)), .Names = c("user.nm", "ip.addr.txt", "login.dt", 
"login.time", "login.sessn.ts", "month", "quarter", "change.label"
), row.names = c(NA, -6L), class = "data.frame")

The plyr statement looks like this:

user.changes=ddply(test.data, c("user.nm"), summarize, 
               change.count=sum(ip.label.txt),
               max.change.time=max(login.sessn.ts),
               min.change.time=min(login.sessn.ts))

The error I get is:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [2]

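I have had trouble working out what this error actually means. Apparently one person's solution involved converting the POSIXct class to character, which does not really work in my case.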

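Can anyone shed some light on how to make this work? I am also open to other approaches; I just like the relative simplicity of the ddply syntax. I will be working with more time-based data in the near future, so I would greatly appreciate any insight into handling this kind of aggregation with other R-based tools.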

1 Answer:

Answer 0: (score: 0)

I inspected your data with str, and it turns out your dates are actually factors. You can use lubridate to turn them into dates:

library(lubridate)
test.data2 <- transform(test.data,lst = dmy_hm(login.sessn.ts))

ddply(test.data2, c("user.nm"), summarize, 
      change.count=sum(ip.addr.txt),
      max.change.time=max(lst),
      min.change.time=min(lst))

  user.nm change.count     max.change.time     min.change.time
1       a            3 2013-11-03 13:55:00 2013-01-06 12:03:44
2       b            3 2013-01-06 08:35:32 2013-01-06 08:35:32
3       c            4 2013-01-11 10:16:00 2013-01-11 10:16:00
4       d           10 2046-11-24 13:24:29 2013-01-12 11:08:04
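
For comparison, here is a minimal base R sketch of the same per-user aggregation. It is not the code that produced the output above. It assumes the timestamp should be rebuilt from login.dt and login.time, which are the factor columns in the dput output from the question (login.sessn.ts is already stored there with class POSIXct). The month/day/year format string and the UTC time zone are assumptions, and `logins` and `per.user` are illustrative names.

```r
# Stand-in for test.data: the decoded login.dt / login.time values from the dput
# in the question, kept as factors on purpose.
logins <- data.frame(
  user.nm    = c("a", "a", "b", "c", "d", "d"),
  login.dt   = c("6/25/2013", "3/11/2013", "6/27/2013",
                 "11/20/2013", "12/26/2013", "7/15/2013"),
  login.time = c("13:22:32", "13:55:05", "9:49:23",
                 "10:16:17", "11:07:27", "9:23:33"),
  stringsAsFactors = TRUE
)

# Check which columns are factors before aggregating.
sapply(logins, class)

# paste() uses the factor labels, so the combined string can be parsed directly.
# Assumption: dates are month/day/year and times are in UTC.
logins$ts <- as.POSIXct(
  paste(logins$login.dt, logins$login.time),
  format = "%m/%d/%Y %H:%M:%S", tz = "UTC"
)

# One row per user with the earliest and latest timestamp.
# Building a data.frame per group keeps the POSIXct class intact.
per.user <- do.call(rbind, lapply(split(logins, logins$user.nm), function(d) {
  data.frame(
    user.nm         = d$user.nm[1],
    min.change.time = min(d$ts),
    max.change.time = max(d$ts)
  )
}))

print(per.user)
```

Once a proper POSIXct column such as `ts` exists, the same min/max expressions can be tried inside the ddply summarize call from the question.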