Question

I have this data:

thedat <- structure(list(id = c("         w12", "         w12", "         w12", 
"          w11", "           w3", "          w3", "        w12", 
"         w45", "        w24", "       w24", "        w24", "        w08", 
"         w3", "         w3", "         w11"), time = structure(c(1559329080, 
1559363580, 1559416140, 1559329380, 1559278020, 1559413920, 1559285100, 
1559322660, 1559417460, 1559450220, 1559500980, 1559500980, 1559278020, 
1559413920, 1559276700), class = c("POSIXct", "POSIXt"), tzone = ""), 
    x = c(28.03333, 18.45, 3.85, 27.95, 42.216667, 4.466667, 
    64.25, 53.81667, 27.483333, 18.383333, 4.283333, 4.28333336, 
    66.21667, 28.46667, 66.58333)), .Names = c("id", "time", 
"x"), class = "data.frame", row.names = c(NA, -15L))

For each id, I want the cumulative sum of x. So for id = w11 I would get:

w11, 2019-05-31 05:25:00,  66.58333
w11, 2019-05-31 20:03:00,  94.53333

I tried:

library(plyr)
ddply(thedat, .(id), summarise,
      time = unique(time),
      answer = cumsum(x))

But this doesn't give me what I want. Any help is appreciated.

Answer 1

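The problem is that the id strings are padded with different numbers of spaces, so ids that look the same (e.g. w12) end up in different groups. Remove the spaces, and sort by time so the running total accumulates in chronological order: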

thedat$id <- gsub(" ", "", thedat$id)    # strip the padding so identical ids match
thedat <- thedat[order(thedat$time),]    # sort chronologically before cumsum
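To see why the grouping breaks, here is a minimal sketch on a small hypothetical vector (not the question's full data): strings that differ only in padding count as distinct groups. `trimws()` (base R 3.2.0 or later) is an alternative to `gsub(" ", "", ...)` that removes only leading and trailing whitespace, which may be preferable if an id could legitimately contain an inner space.

```r
# Hypothetical ids: the same labels padded with different numbers of spaces
ids <- c("         w12", "        w12", "           w3", "         w3")

# Each padded variant is a different string, so each forms its own group
n_raw <- length(unique(ids))      # 4

# trimws() strips leading/trailing whitespace only
clean <- trimws(ids)
n_clean <- length(unique(clean))  # 2

print(unique(clean))              # "w12" "w3"
```

For these ids, both approaches give the same result; they only differ when spaces appear inside an id.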

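For a cumulative sum, transform is a better fit than summarise: it keeps every row of each group and adds the new column alongside the existing ones: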

library(plyr)
ddply(thedat, .(id), transform, 
      answer = cumsum(x))
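If you'd rather not depend on plyr, base R's `ave()` applies a function within groups and returns a vector aligned with the input rows, which suits a running total. A minimal sketch on a small made-up data frame (the two w11 x values come from the question; the other rows, the timestamps and the UTC time zone are assumptions for illustration):

```r
# Small illustrative data frame with padded ids, as in the question
df <- data.frame(
  id   = c("  w11", "     w12", " w11", "w12"),
  time = as.POSIXct(c("2019-05-31 20:03:00", "2019-05-31 06:45:00",
                      "2019-05-31 05:25:00", "2019-06-01 04:33:00"),
                    tz = "UTC"),
  x    = c(27.95, 64.25, 66.58333, 18.45),
  stringsAsFactors = FALSE
)

df$id <- trimws(df$id)       # make identical ids group together
df <- df[order(df$time), ]   # cumsum must run in chronological order

# ave() splits x by id, applies cumsum to each piece, and puts the
# results back in the original row positions
df$answer <- ave(df$x, df$id, FUN = cumsum)
print(df)
```

The w11 rows come out as 66.58333 and 94.53333, matching the w11 rows of the plyr output.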

    id                time         x     answer
1  w08 2019-06-02 20:43:00  4.283333   4.283333
2  w11 2019-05-31 06:25:00 66.583330  66.583330
3  w11 2019-05-31 21:03:00 27.950000  94.533330
4  w12 2019-05-31 08:45:00 64.250000  64.250000
5  w12 2019-05-31 20:58:00 28.033330  92.283330
6  w12 2019-06-01 06:33:00 18.450000 110.733330
7  w12 2019-06-01 21:09:00  3.850000 114.583330
8  w24 2019-06-01 21:31:00 27.483333  27.483333
9  w24 2019-06-02 06:37:00 18.383333  45.866666
10 w24 2019-06-02 20:43:00  4.283333  50.149999
11  w3 2019-05-31 06:47:00 42.216667  42.216667
12  w3 2019-05-31 06:47:00 66.216670 108.433337
13  w3 2019-06-01 20:32:00  4.466667 112.900004
14  w3 2019-06-01 20:32:00 28.466670 141.366674
15 w45 2019-05-31 19:11:00 53.816670  53.816670
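A quick way to sanity-check output like this: the last running total in each group should equal that group's plain sum. A minimal sketch using the w11 and w12 x values from the data, already cleaned and in the time order shown above:

```r
# w11 and w12 rows from the question, already cleaned and time-sorted
df <- data.frame(id = c("w11", "w12", "w11", "w12", "w12", "w12"),
                 x  = c(66.58333, 64.25, 27.95, 28.03333, 18.45, 3.85),
                 stringsAsFactors = FALSE)
df$answer <- ave(df$x, df$id, FUN = cumsum)

# Last running total per id vs. the ordinary per-id sum
last_total <- tapply(df$answer, df$id, function(v) v[length(v)])
group_sum  <- tapply(df$x, df$id, sum)
stopifnot(isTRUE(all.equal(last_total, group_sum)))
print(last_total)
```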
