Aggregate data according to a sequence in R

Time: 2019-03-14 11:52:50

Tags: r dplyr data.table

Here is part of my data:

dat=structure(list(spent = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
3L, 3L, 3L, 3L, 3L), .Label = c("29.74", "73.5", "73.71"), class = "factor"), 
    date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 
    2L, 2L, 2L), .Label = c("04.10.2018", "08.10.2018", "26.09.2018"
    ), class = "factor"), utc_time.y = structure(c(5L, 8L, 2L, 
    1L, 4L, 4L, 9L, 10L, 6L, 3L, 7L, 5L, 8L, 2L, 1L, 4L, 4L, 
    9L, 10L, 6L, 3L, 7L, 5L, 8L, 2L, 1L, 4L, 4L), .Label = c("01.10.2018 22:26", 
    "05.10.2018 22:34", "05.10.2018 22:35", "06.10.2018 13:43", 
    "07.10.2018 15:55", "30.09.2018 11:22", "30.09.2018 11:23", 
    "30.09.2018 12:00", "30.09.2018 12:23", "30.09.2018 18:12"
    ), class = "factor"), real = 501:528, id = c(238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501, 238430441353501, 
    238430441353501, 238430441353501, 238430441353501)), .Names = c("spent", 
"date", "utc_time.y", "real", "id"), class = "data.frame", row.names = c(NA, 
-28L))

How can I perform the following sequence of aggregations on it?

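  1. Sum the spent column by date for each id (total = 1577)
  2. Sum the real column by utc_time.y for each id (total = 14406)
  3. If the aggregated real value is greater than the aggregated spent value, set a flag of 1 for that id, otherwise 0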
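The two totals in the list above can be reproduced by hand from the values in dat. A minimal sketch in base R, assuming the row counts read off the dput (11 rows at 73.5, 11 at 29.74, 6 at 73.71) and real running from 501 to 528:

```r
# Row counts and values are read off the dput of dat (an assumption of
# this sketch; nothing here touches dat itself)
spent_total <- 11 * 73.5 + 11 * 29.74 + 6 * 73.71
real_total  <- sum(501:528)

spent_total                            # 1577.9, quoted as 1577 in the list
real_total                             # 14406
as.integer(real_total > spent_total)   # 1, the flag value for this id
```

Both figures are totals over all rows of the single id, which suggests the comparison in step 3 is made per id.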

I.e., the expected output (id is character):

   spent       date       utc_time.y real           id flag
1  73.50 04.10.2018 07.10.2018 15:55  501 2.384304e+14    1
2  73.50 04.10.2018 30.09.2018 12:00  502 2.384304e+14    1
3  73.50 04.10.2018 05.10.2018 22:34  503 2.384304e+14    1
4  73.50 04.10.2018 01.10.2018 22:26  504 2.384304e+14    1
5  73.50 04.10.2018 06.10.2018 13:43  505 2.384304e+14    1
6  73.50 04.10.2018 06.10.2018 13:43  506 2.384304e+14    1
7  73.50 04.10.2018 30.09.2018 12:23  507 2.384304e+14    1
8  73.50 04.10.2018 30.09.2018 18:12  508 2.384304e+14    1
9  73.50 04.10.2018 30.09.2018 11:22  509 2.384304e+14    1
10 73.50 04.10.2018 05.10.2018 22:35  510 2.384304e+14    1
11 73.50 04.10.2018 30.09.2018 11:23  511 2.384304e+14    1
12 29.74 26.09.2018 07.10.2018 15:55  512 2.384304e+14    1
13 29.74 26.09.2018 30.09.2018 12:00  513 2.384304e+14    1
14 29.74 26.09.2018 05.10.2018 22:34  514 2.384304e+14    1
15 29.74 26.09.2018 01.10.2018 22:26  515 2.384304e+14    1
16 29.74 26.09.2018 06.10.2018 13:43  516 2.384304e+14    1
17 29.74 26.09.2018 06.10.2018 13:43  517 2.384304e+14    1
18 29.74 26.09.2018 30.09.2018 12:23  518 2.384304e+14    1
19 29.74 26.09.2018 30.09.2018 18:12  519 2.384304e+14    1
20 29.74 26.09.2018 30.09.2018 11:22  520 2.384304e+14    1
21 29.74 26.09.2018 05.10.2018 22:35  521 2.384304e+14    1
22 29.74 26.09.2018 30.09.2018 11:23  522 2.384304e+14    1
23 73.71 08.10.2018 07.10.2018 15:55  523 2.384304e+14    1
24 73.71 08.10.2018 30.09.2018 12:00  524 2.384304e+14    1
25 73.71 08.10.2018 05.10.2018 22:34  525 2.384304e+14    1
26 73.71 08.10.2018 01.10.2018 22:26  526 2.384304e+14    1
27 73.71 08.10.2018 06.10.2018 13:43  527 2.384304e+14    1
28 73.71 08.10.2018 06.10.2018 13:43  528 2.384304e+14    1

1 answer:

Answer 0 (score: 2)

You could probably do something like the following:

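library(data.table)
# spent is a factor in dat, so convert it to numeric before summing
setDT(dat)[, spent := as.numeric(as.character(spent))][,
    s1 := sum(spent), by = .(id, date)][,        # spent total per id and date
    s2 := sum(real), by = .(id, utc_time.y)][,   # real total per id and utc_time.y
    flag := +(s2 > s1)]                          # 1 if s2 > s1, otherwise 0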
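Grouping by (id, date) and (id, utc_time.y) yields one sum per group rather than the single totals quoted in the question (1577 and 14406). If those totals are meant per id, a variant could look like the sketch below. It assumes the data.table package is installed; toy, its ids and its values are hypothetical stand-ins for dat, with spent stored as a factor as in the dput.

```r
library(data.table)

# Hypothetical toy table with the same column layout as dat
toy <- data.table(
  id    = c(1, 1, 1, 2, 2),
  spent = factor(c("73.5", "73.5", "29.74", "10", "10")),
  real  = c(50L, 60L, 70L, 5L, 6L)
)

# Convert via the labels: as.numeric() on a factor alone would
# return the integer codes, not the amounts
toy[, spent := as.numeric(as.character(spent))]

# One total per id for each column, repeated on every row of that id
toy[, `:=`(s1 = sum(spent), s2 = sum(real)), by = id]

# 0/1 flag: 1 where the real total exceeds the spent total
toy[, flag := as.integer(s2 > s1)]

unique(toy[, .(id, s1, s2, flag)])
```

Here id 1 gets flag 1 (180 > 176.74) and id 2 gets flag 0 (11 < 20). Because `:=` adds columns by reference, every original row is kept, which matches the shape of the expected output in the question.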