Grouping column entries by unique value in R

Time: 2013-07-08 12:27:07

Tags: r datetime unique

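I have a data frame ordered by a datetime column, which contains multiple rows for each datetime stamp. I would like to condense each individual timestamp into one row. The data frame contains many columns of data. Some of them only change when the timestamp changes, but other columns (e.g. c1a to c2b) vary from row to row even within a single timestamp. For those columns I would like the condensed row to hold the sum over all the rows that were condensed into it (N.B. the number of rows per unique datetime stamp varies).

Example data: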

Data <- structure(list(datetime = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("01/04/2011 00:13", "31/03/2011 23:14"
), class = "factor"), dist = c(210L, 210L, 210L, 210L, 210L, 
210L, 210L, 210L, 210L, 210L, 210L, 210L, 210L, 210L, 210L, 210L, 
210L, 210L, 210L, 210L, 215L, 215L, 215L, 215L, 215L, 215L, 215L, 
215L, 215L, 215L, 215L, 215L, 215L, 215L, 215L, 215L, 215L, 215L, 
215L, 215L, 215L, 215L, 215L), n = c(8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L), c1a = c(184L, 184L, 200L, 200L, 200L, 220L, 
220L, 220L, 220L, 220L, 220L, 220L, 220L, 220L, 220L, 200L, 200L, 
200L, 200L, 200L, 200L, 200L, 200L, 200L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L, 100L, 100L, 100L, 70L, 70L, 70L, 70L, 70L, 
70L, 70L, 70L, 70L), c1b = c(18.4, 18.4, 20, 20, 20, 22, 22, 
22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 20, 20, 20, 20, 20, 
20, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 7, 7, 7, 7, 7, 7, 
7, 7, 7), c2a = c(552L, 552L, 600L, 600L, 600L, 660L, 660L, 660L, 
660L, 660L, 660L, 660L, 660L, 660L, 660L, 600L, 600L, 600L, 600L, 
600L, 600L, 600L, 600L, 600L, 300L, 300L, 300L, 300L, 300L, 300L, 
300L, 300L, 300L, 300L, 210L, 210L, 210L, 210L, 210L, 210L, 210L, 
210L, 210L), c2b = c(55.2, 55.2, 60, 60, 60, 66, 66, 66, 66, 
66, 66, 66, 66, 66, 66, 60, 60, 60, 60, 60, 60, 60, 60, 60, 30, 
30, 30, 30, 30, 30, 30, 30, 30, 30, 21, 21, 21, 21, 21, 21, 21, 
21, 21)), .Names = c("datetime", "dist", "n", "c1a", "c1b", "c2a", 
"c2b"), class = "data.frame", row.names = c(NA, -43L))

which reads like this:

datetime            dist    n   c1a c1b     c2a c2b
31/03/2011 23:14    210     8   184 18.4    552 55.2
31/03/2011 23:14    210     8   184 18.4    552 55.2
31/03/2011 23:14    210     8   200 20      600 60       etc...

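In my output data frame I would also like a new column at the end giving the number of rows that each unique datetime stamp had in the original data frame.

Example of what I would like to end up with: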

 dt1               dist  n  c1a     c1b     c2a     c2b     row_sum
 31/03/2011 23:14   210  8  4168    416.8   12504   1250.4  20
 01/04/2011 00:13   215  5  2430    243     7290    729     23

I have looked at functions like to.period, but they don't quite do what I want. I would really appreciate any suggestions. Thanks.

2 Answers:

Answer 0 (score: 3)

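# Parse the day/month/year strings so that the groups sort chronologically
Data$datetime <- as.POSIXct(Data$datetime, format = "%d/%m/%Y %H:%M", tz = "GMT")

library(plyr)
# One output row per (datetime, dist, n) group: sum the columns that
# vary row by row and count the original rows in each group
ddply(Data, .(datetime, dist, n), summarise,
      c1a = sum(c1a),
      c1b = sum(c1b),
      c2a = sum(c2a),
      c2b = sum(c2b),
      row_sum = length(dist))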

#             datetime dist n  c1a   c1b   c2a    c2b row_sum
#1 2011-03-31 23:14:00  210 8 4168 416.8 12504 1250.4      20
#2 2011-04-01 00:13:00  215 5 2430 243.0  7290  729.0      23
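
For readers who would rather not load plyr, the same grouping can be sketched in base R with aggregate(). This is not part of the answer above, only an illustrative alternative. The data frame is rebuilt compactly with rep() from the run lengths in the question's example data, and the helper column of ones (row_sum) is an assumption introduced here so that a single sum yields the row count.

```r
# Compact reconstruction of the question's 43-row example data
Data <- data.frame(
  datetime = rep(c("31/03/2011 23:14", "01/04/2011 00:13"), times = c(20, 23)),
  dist     = rep(c(210L, 215L), times = c(20, 23)),
  n        = rep(c(8L, 5L), times = c(20, 23)),
  c1a      = rep(c(184L, 200L, 220L, 200L, 100L, 70L),
                 times = c(2, 3, 10, 9, 10, 9))
)
Data$c1b <- Data$c1a / 10
Data$c2a <- Data$c1a * 3L
Data$c2b <- Data$c2a / 10

# Parse the timestamps so that groups sort chronologically
Data$datetime <- as.POSIXct(Data$datetime, format = "%d/%m/%Y %H:%M", tz = "GMT")

# Helper column of ones (an assumption of this sketch): summing it
# within a group gives the number of rows in that group
Data$row_sum <- 1L

# Sum every value column within each (datetime, dist, n) group
result <- aggregate(cbind(c1a, c1b, c2a, c2b, row_sum) ~ datetime + dist + n,
                    data = Data, FUN = sum)
result <- result[order(result$datetime), ]
print(result)
```

Under these assumptions the two rows should match the totals in the question's expected output. Summing a column of ones is just one way to count rows; a separate aggregate() call with FUN = length would work as well.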

Answer 1 (score: 2)

You can use data.table to do this:

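library(data.table)
Data <- as.data.table(Data)
# Sort by the grouping columns
setkeyv(Data, c("datetime", "dist", "n"))
# .SD holds the non-grouping columns of each group; .N is the group's row count
Data[ , c(lapply(.SD, sum), list(row_sum = .N)), by = "datetime,dist,n"]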
##            datetime dist n  c1a   c1b   c2a    c2b row_sum
## 1: 01/04/2011 00:13  215 5 2430 243.0  7290  729.0      23
## 2: 31/03/2011 23:14  210 8 4168 416.8 12504 1250.4      20
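
Both answers group on datetime, dist and n together, which yields one row per timestamp only if dist and n are constant within each timestamp. A quick base R check of that assumption might look like the sketch below. The data frame is again a compact reconstruction of the question's example, and the name combos_per_stamp is hypothetical.

```r
# Compact reconstruction of the grouping columns from the question's data
Data <- data.frame(
  datetime = rep(c("31/03/2011 23:14", "01/04/2011 00:13"), times = c(20, 23)),
  dist     = rep(c(210L, 215L), times = c(20, 23)),
  n        = rep(c(8L, 5L), times = c(20, 23))
)

# Number of distinct (dist, n) combinations seen within each timestamp
combos_per_stamp <- tapply(
  paste(Data$dist, Data$n),
  Data$datetime,
  function(x) length(unique(x))
)
print(combos_per_stamp)

# Stops with an error if any timestamp would split into several output rows
stopifnot(all(combos_per_stamp == 1L))
```

If the check fails on real data, grouping by datetime alone and choosing how to summarise dist and n (first value, mean, and so on) would be needed instead.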