Aggregate values based on time (minutes) in R

Time: 2014-12-09 04:02:27

Tags: r dataframe aggregate aggregate-functions

My data looks like this:

    X                time             value1
1    5 2014-07-01 00:02:14          PATECH-AP
2    6 2014-07-01 00:02:14              dlink
3    7 2014-07-01 00:02:14             iptime
4   16 2014-07-01 00:07:15          PATECH-AP
5   17 2014-07-01 00:07:15              dlink
6   18 2014-07-01 00:07:15             iptime
7   20 2014-07-01 00:07:15         JIN iptime
8   28 2014-07-01 00:12:14          PATECH-AP
9   29 2014-07-01 00:12:14              dlink
10  31 2014-07-01 00:12:14             iptime
11  41 2014-07-01 00:17:14          PATECH-AP
12  42 2014-07-01 00:17:14              dlink
13  43 2014-07-01 00:17:14             iptime
14  45 2014-07-01 00:17:14          PATECH-AP
15  53 2014-07-01 00:22:14          PATECH-AP
16  54 2014-07-01 00:22:14              dlink
17  55 2014-07-01 00:22:14          PATECH-AP
18  64 2014-07-01 00:27:13          PATECH-AP
19  65 2014-07-01 00:27:13              dlink
20  66 2014-07-01 00:27:13          U+Net642B
21  67 2014-07-01 00:27:13         JIN iptime
22  76 2014-07-01 00:32:14          PATECH-AP
23  77 2014-07-01 00:32:14              dlink
24  78 2014-07-01 00:32:14         JIN iptime
25  80 2014-07-01 00:32:14             U+zone
26  87 2014-07-01 00:37:14          PATECH-AP
27  88 2014-07-01 00:37:14              dlink
28  90 2014-07-01 00:37:14 Jiny's Room 2.4GHz
29  91 2014-07-01 00:37:14          PATECH-AP
30 101 2014-07-01 00:42:14          PATECH-AP

Output of dput():

structure(list(X = c(5L, 6L, 7L, 16L, 17L, 18L, 20L, 28L, 29L, 
31L, 41L, 42L, 43L, 45L, 53L, 54L, 55L, 64L, 65L, 66L, 67L, 76L, 
77L, 78L, 80L, 87L, 88L, 90L, 91L, 101L), time = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L), .Label = c("2014-07-01 00:02:14", 
"2014-07-01 00:07:15", "2014-07-01 00:12:14", "2014-07-01 00:17:14", 
"2014-07-01 00:22:14", "2014-07-01 00:27:13", "2014-07-01 00:32:14", 
"2014-07-01 00:37:14", "2014-07-01 00:42:14"), class = "factor"), 
    value1 = structure(c(5L, 1L, 2L, 5L, 1L, 2L, 3L, 5L, 1L, 
    2L, 5L, 1L, 2L, 5L, 5L, 1L, 5L, 5L, 1L, 6L, 3L, 5L, 1L, 3L, 
    7L, 5L, 1L, 4L, 5L, 5L), .Label = c("dlink", "iptime", "JIN iptime", 
    "Jiny's Room 2.4GHz", "PATECH-AP", "U+Net642B", "U+zone"), class = "factor")), .Names = c("X", 
"time", "value1"), class = "data.frame", row.names = c(NA, -30L
))

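I want to aggregate this data by time and put the value1 entries of each group on a single row, separated by commas. The output would look like this: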

    X                time             value1
1    5 2014-07-01 00:02:14          PATECH-AP,dlink,iptime         
4   16 2014-07-01 00:07:15          PATECH-AP,dlink,iptime,JIN iptime
8   28 2014-07-01 00:12:14          PATECH-AP,dlink,iptime
etc.....

15  53 2014-07-01 00:22:14          PATECH-AP,dlink,PATECH-AP

I don't know how to do that. Can someone help me?

Thanks,

1 Answer:

Answer 0 (score: 4)

If you also need to keep the X variable (the first X of each time group) alongside time and value1:

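# Convert the factor columns (time, value1) to character
df[,2:3] <- lapply(df[,2:3], as.character)
# X1 holds the first X within each time group
df$X1 <- with(df, ave(X, time, FUN=function(x) x[1]))
# Join the value1 entries of each group with ", "
aggregate(value1~time+X1, df, FUN=toString)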
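Note that toString() joins with ", " (comma plus space). If the separator has to be a bare comma, as in the question's expected output, a minimal sketch with paste(collapse = ",") might look like this. The name df_demo is an illustrative stand-in holding only the first two time groups of the question's data.

```r
# Reduced stand-in for the question's data (first two time stamps only)
df_demo <- data.frame(
  X = c(5L, 6L, 7L, 16L, 17L, 18L, 20L),
  time = c(rep("2014-07-01 00:02:14", 3), rep("2014-07-01 00:07:15", 4)),
  value1 = c("PATECH-AP", "dlink", "iptime",
             "PATECH-AP", "dlink", "iptime", "JIN iptime"),
  stringsAsFactors = FALSE
)

# First X within each time group, repeated for every row of that group
df_demo$X1 <- with(df_demo, ave(X, time, FUN = function(x) x[1]))

# Collapse value1 per group with a bare comma instead of toString()'s ", "
res <- aggregate(value1 ~ time + X1, df_demo,
                 FUN = function(v) paste(v, collapse = ","))
print(res)
```

The only change from the answer's code is the FUN argument, so the same swap should carry over to the other approaches by replacing toString(value1) with paste(value1, collapse = ",").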

Or, if you want the value1 column to be a list:

 aggregate(value1~time+X1, df, c)
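One caveat with the list version: aggregate() only returns a list column when the groups differ in size; if every group had the same number of rows, it would simplify the result to a matrix. The sketch below passes simplify = FALSE to keep a list in either case. As before, df_demo is an illustrative stand-in for part of the question's data.

```r
# Reduced stand-in for the question's data (first two time stamps only)
df_demo <- data.frame(
  X = c(5L, 6L, 7L, 16L, 17L, 18L, 20L),
  time = c(rep("2014-07-01 00:02:14", 3), rep("2014-07-01 00:07:15", 4)),
  value1 = c("PATECH-AP", "dlink", "iptime",
             "PATECH-AP", "dlink", "iptime", "JIN iptime"),
  stringsAsFactors = FALSE
)
df_demo$X1 <- with(df_demo, ave(X, time, FUN = function(x) x[1]))

# simplify = FALSE keeps value1 as a list column regardless of group sizes
res_list <- aggregate(value1 ~ time + X1, df_demo, FUN = c, simplify = FALSE)

is.list(res_list$value1)   # each element is the character vector of one group
lengths(res_list$value1)   # number of value1 entries per time group
```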

Another option is dplyr:

library(dplyr)
df %>%
      group_by(time) %>% 
      summarise(X=X[1], value1=toString(value1)) %>%
      head()
#                 time  X                                  value1
#1 2014-07-01 00:02:14  5                PATECH-AP, dlink, iptime
#2 2014-07-01 00:07:15 16    PATECH-AP, dlink, iptime, JIN iptime
#3 2014-07-01 00:12:14 28                PATECH-AP, dlink, iptime
#4 2014-07-01 00:17:14 41     PATECH-AP, dlink, iptime, PATECH-AP
#5 2014-07-01 00:22:14 53             PATECH-AP, dlink, PATECH-AP
#6 2014-07-01 00:27:13 64 PATECH-AP, dlink, U+Net642B, JIN iptime

Or use data.table:

library(data.table)
setDT(df)[, list(X=X[1], value1=toString(value1)), by=time]
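All of the approaches above group on the exact time string, which works here because every row in a group shares the same time stamp. If rows within the same minute could differ in their seconds, one possible variation (not part of the original answer) is to truncate the time to the minute before aggregating. The rows in df_min are hypothetical and chosen only to show the effect.

```r
# Hypothetical rows: the first two fall in the same minute but differ in seconds
df_min <- data.frame(
  time = c("2014-07-01 00:02:14", "2014-07-01 00:02:40", "2014-07-01 00:07:15"),
  value1 = c("PATECH-AP", "dlink", "iptime"),
  stringsAsFactors = FALSE
)

# Parse the time stamps and keep only year-month-day hour:minute
# (tz = "UTC" is an assumption; use the time zone the data was recorded in)
df_min$minute <- format(as.POSIXct(df_min$time, tz = "UTC"), "%Y-%m-%d %H:%M")

# Aggregate on the truncated minute instead of the full time stamp
res_min <- aggregate(value1 ~ minute, df_min, FUN = toString)
print(res_min)
```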