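Everything works, but it iterates slowly because it uses two for loops.
Basically I have two data frames: one holds IDs and event times; the other holds readings (a value and a timestamp) for various IDs, taken roughly every 10 seconds.
I'm trying to join one table to the other by matching on ID and on readings that fall within a specific interval before the event time, say 20 seconds.
Alternatively, the data lives in an Oracle SQL server, so if the join can be done in SQL, that works too.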
readingdf <- data.frame(
  sensorID = c('100001','100001','100001','100001','100002','100002','100002','100002'),
  readTime = as.POSIXct(c("2017-07-24 04:08:09 EDT","2017-07-24 04:08:19 EDT",
                          "2017-07-24 04:08:29 EDT","2017-07-24 04:08:39 EDT",
                          "2017-07-24 04:08:09 EDT","2017-07-24 04:08:19 EDT",
                          "2017-07-24 04:08:29 EDT","2017-07-24 04:08:39 EDT"), tz = "EST"),
  Value = c('17.5','15.6','12.9','12.1','22.2','24.5','19.7','20.1'))

df <- data.frame(
  sensorID = c('100001','100002','100001','100002','100001','100002','100001','100001'),
  eventTime = as.POSIXct(c("2017-07-24 04:08:23 EDT","2017-07-24 04:08:25 EDT",
                           "2017-07-24 07:04:40 EDT","2017-07-24 02:19:30 EDT",
                           "2017-07-24 04:37:08 EDT","2017-07-24 04:19:59 EDT",
                           "2017-07-24 03:26:49 EDT","2017-07-24 03:58:17 EDT"), tz = "EST"))
Answer 0 (score: 0)
We can create a new column, readTime_expand, that lists every second in the 20 seconds following readTime. After that, join on sensorID and on eventTime = readTime_expand. df2 is the final output.
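library(tidyverse)

# Expand each reading into one row per second over the 20 seconds after readTime
readingdf2 <- readingdf %>%
  mutate(readTime_end = readTime + 20) %>%
  mutate(readTime_expand = map2(readTime, readTime_end, function(x, y) {
    seq(x, y, by = 1)
  })) %>%
  # Name the list column explicitly; a bare unnest() is deprecated in newer tidyr
  unnest(readTime_expand)

# Keep every event and attach any reading whose expanded window contains eventTime
df2 <- df %>%
  left_join(readingdf2, by = c("sensorID", "eventTime" = "readTime_expand"))

df2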
   sensorID           eventTime            readTime Value        readTime_end
1    100001 2017-07-24 04:08:23 2017-07-24 04:08:09  17.5 2017-07-24 04:08:29
2    100001 2017-07-24 04:08:23 2017-07-24 04:08:19  15.6 2017-07-24 04:08:39
3    100002 2017-07-24 04:08:25 2017-07-24 04:08:09  22.2 2017-07-24 04:08:29
4    100002 2017-07-24 04:08:25 2017-07-24 04:08:19  24.5 2017-07-24 04:08:39
5    100001 2017-07-24 07:04:40                <NA>  <NA>                <NA>
6    100002 2017-07-24 02:19:30                <NA>  <NA>                <NA>
7    100001 2017-07-24 04:37:08                <NA>  <NA>                <NA>
8    100002 2017-07-24 04:19:59                <NA>  <NA>                <NA>
9    100001 2017-07-24 03:26:49                <NA>  <NA>                <NA>
10   100001 2017-07-24 03:58:17                <NA>  <NA>                <NA>
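As a possible alternative, the same window rule (a reading counts if it was taken between 0 and 20 seconds before the event) can be sketched in base R without expanding each reading into one row per second. This is only a sketch under assumptions: the sample data is copied from the question, and window_secs, eventID, pairs, matched and result are names introduced here for illustration. It pairs every event with every reading of the same sensor, so on large tables the intermediate pairs frame may get big.

```r
# Sample data copied from the question (trailing zone labels dropped;
# Value is kept as character, as in the original).
readingdf <- data.frame(
  sensorID = c("100001", "100001", "100001", "100001",
               "100002", "100002", "100002", "100002"),
  readTime = as.POSIXct(c("2017-07-24 04:08:09", "2017-07-24 04:08:19",
                          "2017-07-24 04:08:29", "2017-07-24 04:08:39",
                          "2017-07-24 04:08:09", "2017-07-24 04:08:19",
                          "2017-07-24 04:08:29", "2017-07-24 04:08:39"),
                        tz = "EST"),
  Value = c("17.5", "15.6", "12.9", "12.1", "22.2", "24.5", "19.7", "20.1"),
  stringsAsFactors = FALSE
)

df <- data.frame(
  sensorID = c("100001", "100002", "100001", "100002",
               "100001", "100002", "100001", "100001"),
  eventTime = as.POSIXct(c("2017-07-24 04:08:23", "2017-07-24 04:08:25",
                           "2017-07-24 07:04:40", "2017-07-24 02:19:30",
                           "2017-07-24 04:37:08", "2017-07-24 04:19:59",
                           "2017-07-24 03:26:49", "2017-07-24 03:58:17"),
                         tz = "EST"),
  stringsAsFactors = FALSE
)

window_secs <- 20                  # hypothetical name: look-back window in seconds
df$eventID <- seq_len(nrow(df))    # hypothetical helper key, one per event

# Pair every event with every reading from the same sensor.
pairs <- merge(df, readingdf, by = "sensorID")

# Seconds from reading to event; positive means the reading came first.
lag_secs <- as.numeric(difftime(pairs$eventTime, pairs$readTime, units = "secs"))

# Keep only readings taken 0 to window_secs seconds before the event.
keep <- lag_secs >= 0 & lag_secs <= window_secs
matched <- pairs[keep, c("eventID", "readTime", "Value")]

# Left join back onto the events so unmatched events stay, with NA readings.
result <- merge(df, matched, by = "eventID", all.x = TRUE)
result
```

With the sample data, result should keep all eight events: the first two each appear twice (one row per matching reading) and the other six carry NA, which lines up with the df2 output above apart from the extra eventID column.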