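The id column identifies a unique observation, and the comments column describes it. The time column matters because it distinguishes the first comment from the last. I want to create a column that holds everything known about each observation: every row should contain all previous comments plus the current comment for that observation (id). How can I achieve this? I tried dplyr with lag, but the comments were not ordered by time correctly.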
#--- library
library(dplyr)
library(data.table)
id <- c(1, 1, 1, 2, 2, 56, 56, 56, 107, 1005, 1005, 7, 7, 7, NA)
time <- c("2017-03-25 12:58:37 GMT", "2017-03-24 05:50:22 GMT", "2017-03-23 19:10:01 GMT", "2017-03-24 13:41:18 GMT", "2017-03-26 05:49:37 GMT", "2017-03-23 16:48:04 GMT", "2017-03-23 18:38:19 GMT",
"2017-03-23 14:50:47 GMT", "2017-03-24 02:02:53 GMT", "2017-03-24 03:10:04 GMT", "2017-03-24 21:01:02 GMT", "2017-03-23 16:16:21 GMT", "2017-03-23 20:42:46 GMT", "2017-03-24 09:03:29 GMT",
"2017-03-23 15:46:00 GMT")
comments <- c("lajsd", 'asdf', 'qwee', 'xcx', 'serf', 'sdfe', 'hyhds', 'wafd', 'aerfd', 'sefr', 'qdfsfe', 'qwewd', 'wqdse', 'qwddr', 'qdwq')
mytable <- data.table(id, time, comments)
View(mytable)
mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S")
mytable %>%
group_by(id, time) %>%
mutate(record = lag(comments))
Desired output:
id time comments record
<dbl> <dttm> <chr> <chr>
1 1 2017-03-25 12:58:37 lajsd qwee, asdf, lajsd
2 1 2017-03-24 05:50:22 asdf qwee, asdf
3 1 2017-03-23 19:10:01 qwee qwee
4 2 2017-03-24 13:41:18 xcx xcx
5 2 2017-03-26 05:49:37 serf xcx, serf
6 56 2017-03-23 16:48:04 sdfe wafd, sdfe
7 56 2017-03-23 18:38:19 hyhds wafd, sdfe, hyhds
8 56 2017-03-23 14:50:47 wafd wafd
9 107 2017-03-24 02:02:53 aerfd aerfd
10 1005 2017-03-24 03:10:04 sefr sefr
11 1005 2017-03-24 21:01:02 qdfsfe sefr, qdfsfe
12 7 2017-03-23 16:16:21 qwewd qwewd
13 7 2017-03-23 20:42:46 wqdse qwewd, wqdse
14 7 2017-03-24 09:03:29 qwddr qwewd, wqdse, qwddr
15 NA 2017-03-23 15:46:00 qdwq qdwq
So I tried:
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]
id time comments date hour record
1: 1 2017-03-25 12:58:37 lajsd 2017-03-25 12:58:37 lajsd
2: 1 2017-03-24 05:50:22 asdf 2017-03-24 05:50:22 lajsd asdf
3: 1 2017-03-23 19:10:01 qwee 2017-03-23 19:10:01 lajsd asdf qwee
4: 2 2017-03-24 13:41:18 xcx 2017-03-24 13:41:18 xcx
5: 2 2017-03-26 05:49:37 serf 2017-03-26 05:49:37 xcx serf
6: 56 2017-03-23 16:48:04 sdfe 2017-03-23 16:48:04 sdfe
7: 56 2017-03-23 18:38:19 hyhds 2017-03-23 18:38:19 sdfe hyhds
8: 56 2017-03-23 14:50:47 wafd 2017-03-23 14:50:47 sdfe hyhds wafd
9: 107 2017-03-24 02:02:53 aerfd 2017-03-24 02:02:53 aerfd
10: 1005 2017-03-24 03:10:04 sefr 2017-03-24 03:10:04 sefr
11: 1005 2017-03-24 21:01:02 qdfsfe 2017-03-24 21:01:02 sefr qdfsfe
12: 7 2017-03-23 16:16:21 qwewd 2017-03-23 16:16:21 qwewd
13: 7 2017-03-23 20:42:46 wqdse 2017-03-23 20:42:46 qwewd wqdse
14: 7 2017-03-24 09:03:29 qwddr 2017-03-24 09:03:29 qwewd wqdse qwddr
15: NA 2017-03-23 15:46:00 qdwq 2017-03-23 15:46:00 qdwq
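This gives an almost satisfactory result. The only problem is that I cannot get it to take the time into account: the comments are pasted in row order, not chronological order (for id 1, "lajsd" comes first although it is the latest comment). I tried adding the time to the grouping: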
mytable$date <- as.IDate(mytable$time)
mytable$hour <- as.ITime(mytable$time)
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id, date, hour)]
Any ideas on how to incorporate the time? I can't get it right.
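A quick check may help explain why grouping by id, date and hour does not give the cumulative result. The sketch below uses a hypothetical three-row excerpt of the sample data (`dt` and the `sizes_*` names are made up for illustration) and assumes, as in the sample, that every timestamp within an id is distinct. Under that assumption each (id, date, hour) group holds a single row, so the cumulative paste only ever sees one comment.

```r
library(data.table)

# Hypothetical excerpt of the sample data: the three rows for id 1
dt <- data.table(
  id = c(1, 1, 1),
  time = as.POSIXct(c("2017-03-25 12:58:37",
                      "2017-03-24 05:50:22",
                      "2017-03-23 19:10:01"), tz = "GMT"),
  comments = c("lajsd", "asdf", "qwee")
)
dt[, date := as.IDate(time)]
dt[, hour := as.ITime(time)]

# Compare group sizes: by id alone vs. by id + date + hour
sizes_by_id   <- dt[, .N, by = id]$N
sizes_by_time <- dt[, .N, by = .(id, date, hour)]$N

# With one row per group, the cumulative paste just repeats the comment
dt[, record := vapply(seq_len(.N),
                      function(k) paste(comments[seq_len(k)], collapse = " "),
                      character(1)),
   by = .(id, date, hour)]
```

Here `sizes_by_time` contains only ones, and `record` ends up identical to `comments`, which suggests the time belongs in the ordering rather than in the grouping.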
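This works: parse time as POSIXct, sort the rows by id and time, then build the cumulative string within each id: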
mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
mytable <- arrange(mytable, id, time)
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]
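If the original row order has to be kept, as in the desired output above, one possible variation is to order the comments by time inside each group and then map the result back to the original rows. This is only a sketch on a hypothetical five-row excerpt of the sample data (`dt`, `o` and `acc` are illustrative names, not part of the original code), and it uses ", " as the separator to match the desired output.

```r
library(data.table)

# Hypothetical excerpt of the sample data: ids 1 and 2 only
dt <- data.table(
  id = c(1, 1, 1, 2, 2),
  time = as.POSIXct(c("2017-03-25 12:58:37", "2017-03-24 05:50:22",
                      "2017-03-23 19:10:01", "2017-03-24 13:41:18",
                      "2017-03-26 05:49:37"), tz = "GMT"),
  comments = c("lajsd", "asdf", "qwee", "xcx", "serf")
)

dt[, record := {
  o <- order(time)          # chronological positions within this id
  sorted <- comments[o]     # comments from oldest to newest
  # acc[k] = the k oldest comments joined together
  acc <- vapply(seq_along(sorted),
                function(k) paste(sorted[seq_len(k)], collapse = ", "),
                character(1))
  acc[order(o)]             # inverse permutation: back to original row order
}, by = id]
```

The rows stay where they were, while each `record` lists the comments chronologically up to and including that row's own comment. Sorting first, as in the code above, is simpler when the row order does not matter.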