根据时间和ID列

时间:2017-05-29 09:14:14

标签: r dplyr aggregate

id列标识唯一的观察,而comments列描述观察。时间列对于区分第一个注释和最后一个注释很重要。 我想创建一个列,其中包含有关独特观察的所有已知信息;该行应包含观察(id)的所有先前+当前注释。我怎么能实现这个目标?我尝试了dplyer和滞后,虽然按时排序却没有正确发生。

#--- library
library(dplyr)
library(data.table)

id <- c(1, 1, 1, 2, 2, 56, 56, 56, 107, 1005, 1005, 7, 7, 7, NA)
time <- c("2017-03-25 12:58:37 GMT", "2017-03-24 05:50:22 GMT", "2017-03-23 19:10:01 GMT", "2017-03-24 13:41:18 GMT", "2017-03-26 05:49:37 GMT", "2017-03-23 16:48:04 GMT", "2017-03-23 18:38:19 GMT",
       "2017-03-23 14:50:47 GMT", "2017-03-24 02:02:53 GMT", "2017-03-24 03:10:04 GMT", "2017-03-24 21:01:02 GMT", "2017-03-23 16:16:21 GMT", "2017-03-23 20:42:46 GMT", "2017-03-24 09:03:29 GMT",
       "2017-03-23 15:46:00 GMT")
comments <- c("lajsd", 'asdf', 'qwee', 'xcx', 'serf', 'sdfe', 'hyhds', 'wafd', 'aerfd', 'sefr', 'qdfsfe', 'qwewd', 'wqdse', 'qwddr', 'qdwq')

mytable <- data.table(id, time, comments)

View(mytable)

mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S")


mytable %>%
  group_by(id, time) %>%
  mutate(record = lag(comments))

理想的解决方案:

    id       time            comments   record
   <dbl>              <dttm>    <chr>  <chr>
 1     1 2017-03-25 12:58:37    lajsd   qwee, asdf, lajsd
 2     1 2017-03-24 05:50:22     asdf   qwee, asdf
 3     1 2017-03-23 19:10:01     qwee   qwee 
 4     2 2017-03-24 13:41:18      xcx   xcx
 5     2 2017-03-26 05:49:37     serf   xcx, serf
 6    56 2017-03-23 16:48:04     sdfe   wafd, sdfe
 7    56 2017-03-23 18:38:19    hyhds   wafd, sdfe, hyhds
 8    56 2017-03-23 14:50:47     wafd   wafd
 9   107 2017-03-24 02:02:53    aerfd   aerfd
10  1005 2017-03-24 03:10:04     sefr   sefr
11  1005 2017-03-24 21:01:02   qdfsfe   sefr, qdfsfe
12     7 2017-03-23 16:16:21    qwewd   qwewd
13     7 2017-03-23 20:42:46    wqdse   qwewd, wqdse
14     7 2017-03-24 09:03:29    qwddr   qwewd, wqdse, qwddr
15    NA 2017-03-23 15:46:00     qdwq   qdwq

所以我试过

setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]



id                time comments       date     hour            record
 1:    1 2017-03-25 12:58:37    lajsd 2017-03-25 12:58:37             lajsd
 2:    1 2017-03-24 05:50:22     asdf 2017-03-24 05:50:22        lajsd asdf
 3:    1 2017-03-23 19:10:01     qwee 2017-03-23 19:10:01   lajsd asdf qwee
 4:    2 2017-03-24 13:41:18      xcx 2017-03-24 13:41:18               xcx
 5:    2 2017-03-26 05:49:37     serf 2017-03-26 05:49:37          xcx serf
 6:   56 2017-03-23 16:48:04     sdfe 2017-03-23 16:48:04              sdfe
 7:   56 2017-03-23 18:38:19    hyhds 2017-03-23 18:38:19        sdfe hyhds
 8:   56 2017-03-23 14:50:47     wafd 2017-03-23 14:50:47   sdfe hyhds wafd
 9:  107 2017-03-24 02:02:53    aerfd 2017-03-24 02:02:53             aerfd
10: 1005 2017-03-24 03:10:04     sefr 2017-03-24 03:10:04              sefr
11: 1005 2017-03-24 21:01:02   qdfsfe 2017-03-24 21:01:02       sefr qdfsfe
12:    7 2017-03-23 16:16:21    qwewd 2017-03-23 16:16:21             qwewd
13:    7 2017-03-23 20:42:46    wqdse 2017-03-23 20:42:46       qwewd wqdse
14:    7 2017-03-24 09:03:29    qwddr 2017-03-24 09:03:29 qwewd wqdse qwddr
15:   NA 2017-03-23 15:46:00     qdwq 2017-03-23 15:46:00              qdwq

产生几乎令人满意的结果。唯一的问题是使用时间是不可能的。

mytable$date <- as.IDate(mytable$time)
mytable$hour <- as.ITime(mytable$time)
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id, date, hour)]

有关包含此内容的任何想法?不能做对..

这有效

mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
    mytable <- arrange(mytable, id, time)
    setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]

0 个答案:

没有答案