Join on one column and the closest preceding time

Posted: 2017-08-18 20:39:15

Tags: sql r oracle dplyr sqldf

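Everything works, but iterating is slow because it uses two for loops.

Basically, there are two data frames. One has IDs and event times. The other has readings (a value and a timestamp) for various IDs, taken roughly every 10 seconds.

I'm trying to join one table to the other by matching the ID and a reading time that falls within a given interval before the event time, say 20 seconds.

Alternatively, the data lives in an Oracle SQL database, so if the join can be done in SQL, that works too.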

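readingdf <- data.frame(
  sensorID = c('100001','100001','100001','100001','100002','100002','100002','100002'),
  # the trailing "EDT" is ignored by the parser; all times are read in tz = "EST"
  readTime = as.POSIXct(c("2017-07-24 04:08:09 EDT","2017-07-24 04:08:19 EDT",
                          "2017-07-24 04:08:29 EDT","2017-07-24 04:08:39 EDT",
                          "2017-07-24 04:08:09 EDT","2017-07-24 04:08:19 EDT",
                          "2017-07-24 04:08:29 EDT","2017-07-24 04:08:39 EDT"), tz = "EST"),
  # readings stored as numbers rather than strings
  Value = c(17.5, 15.6, 12.9, 12.1, 22.2, 24.5, 19.7, 20.1),
  # keep IDs as character so joins don't trip over mismatched factor levels
  stringsAsFactors = FALSE)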


df <- data.frame(
  sensorID = c('100001','100002','100001','100002','100001','100002','100001','100001'),
  eventTime = as.POSIXct(c("2017-07-24 04:08:23 EDT","2017-07-24 04:08:25 EDT",
                           "2017-07-24 07:04:40 EDT","2017-07-24 02:19:30 EDT",
                           "2017-07-24 04:37:08 EDT","2017-07-24 04:19:59 EDT",
                           "2017-07-24 03:26:49 EDT","2017-07-24 03:58:17 EDT"), tz = "EST"),
  stringsAsFactors = FALSE)
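For comparison, here is a minimal base-R sketch of the same join without explicit loops, assuming both tables fit in memory. It restates the sample data above so it runs on its own. `merge()` pairs every event with every reading of the same sensor, and a vectorized filter then keeps readings taken 0 to 20 seconds before the event. The helper name `window_join` and its `window_secs` parameter are hypothetical. Unlike a left join, this inner join drops events that have no matching reading.

```r
# Sketch: equi-join on sensorID, then a vectorized time-window filter.
# Base R only; window_join() and window_secs are hypothetical names.

readingdf <- data.frame(
  sensorID = rep(c("100001", "100002"), each = 4),
  readTime = as.POSIXct(rep(c("2017-07-24 04:08:09", "2017-07-24 04:08:19",
                              "2017-07-24 04:08:29", "2017-07-24 04:08:39"), 2),
                        tz = "EST"),
  Value = c(17.5, 15.6, 12.9, 12.1, 22.2, 24.5, 19.7, 20.1),
  stringsAsFactors = FALSE
)

df <- data.frame(
  sensorID = c("100001", "100002", "100001", "100002",
               "100001", "100002", "100001", "100001"),
  eventTime = as.POSIXct(c("2017-07-24 04:08:23", "2017-07-24 04:08:25",
                           "2017-07-24 07:04:40", "2017-07-24 02:19:30",
                           "2017-07-24 04:37:08", "2017-07-24 04:19:59",
                           "2017-07-24 03:26:49", "2017-07-24 03:58:17"),
                         tz = "EST"),
  stringsAsFactors = FALSE
)

window_join <- function(events, readings, window_secs = 20) {
  # Pair each event with every reading from the same sensor (inner join)
  pairs <- merge(events, readings, by = "sensorID")
  # Seconds from the reading to the event; positive means the reading came first
  gap <- as.numeric(difftime(pairs$eventTime, pairs$readTime, units = "secs"))
  # Keep readings taken 0..window_secs seconds before the event
  pairs[gap >= 0 & gap <= window_secs, , drop = FALSE]
}

matched <- window_join(df, readingdf)
print(matched)
```

The pairwise `merge()` grows with (events per sensor) × (readings per sensor), so on large tables a true range join (a `BETWEEN` condition in SQL, or a data.table non-equi join) is likely to scale better.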

1 answer:

Answer 0 (score: 0)

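We can create a new column, readTime_expand, that lists every second from readTime up to readTime + 20 seconds (one row per second after unnesting). Then left-join on sensorID and eventTime == readTime_expand, so each event picks up every reading taken 0 to 20 seconds before it. df2 is the final output. Note that this relies on all timestamps falling on whole seconds, because the expansion steps by 1 second.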

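library(tidyverse)

# For each reading, list every whole second from readTime to readTime + 20 s,
# then unnest so there is one row per (reading, second)
readingdf2 <- readingdf %>%
  mutate(readTime_end = readTime + 20) %>%
  mutate(readTime_expand = map2(readTime, readTime_end, ~ seq(.x, .y, by = 1))) %>%
  unnest(cols = readTime_expand)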

df2 <- df %>%
  left_join(readingdf2, by = c("sensorID", "eventTime" = "readTime_expand"))

df2
   sensorID           eventTime            readTime Value        readTime_end
1    100001 2017-07-24 04:08:23 2017-07-24 04:08:09  17.5 2017-07-24 04:08:29
2    100001 2017-07-24 04:08:23 2017-07-24 04:08:19  15.6 2017-07-24 04:08:39
3    100002 2017-07-24 04:08:25 2017-07-24 04:08:09  22.2 2017-07-24 04:08:29
4    100002 2017-07-24 04:08:25 2017-07-24 04:08:19  24.5 2017-07-24 04:08:39
5    100001 2017-07-24 07:04:40                <NA>  <NA>                <NA>
6    100002 2017-07-24 02:19:30                <NA>  <NA>                <NA>
7    100001 2017-07-24 04:37:08                <NA>  <NA>                <NA>
8    100002 2017-07-24 04:19:59                <NA>  <NA>                <NA>
9    100001 2017-07-24 03:26:49                <NA>  <NA>                <NA>
10   100001 2017-07-24 03:58:17                <NA>  <NA>                <NA>
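The join above returns every reading in the 20-second window. If only the closest earlier reading per event is wanted, as the title suggests, one option (a sketch, not part of the original answer) is to sort each event's matches newest-first and keep the first row. The data frame below is a hypothetical stand-in that mirrors a few rows of df2 so the snippet runs on its own, and `latest_reading()` is a hypothetical helper.

```r
# Sketch: keep only the most recent matched reading per (sensorID, eventTime).
# Base R only; the data below is a hypothetical stand-in for a few rows of df2.

df2 <- data.frame(
  sensorID  = c("100001", "100001", "100002", "100002", "100001"),
  eventTime = as.POSIXct(c("2017-07-24 04:08:23", "2017-07-24 04:08:23",
                           "2017-07-24 04:08:25", "2017-07-24 04:08:25",
                           "2017-07-24 07:04:40"), tz = "EST"),
  readTime  = as.POSIXct(c("2017-07-24 04:08:09", "2017-07-24 04:08:19",
                           "2017-07-24 04:08:09", "2017-07-24 04:08:19",
                           NA), tz = "EST"),
  Value     = c(17.5, 15.6, 22.2, 24.5, NA),
  stringsAsFactors = FALSE
)

latest_reading <- function(joined) {
  # Within each event, put the newest reading first; unmatched (NA) rows sort last
  ord <- order(joined$sensorID, joined$eventTime, -as.numeric(joined$readTime))
  sorted <- joined[ord, , drop = FALSE]
  # Keep the first row for each (sensorID, eventTime) pair
  sorted[!duplicated(sorted[, c("sensorID", "eventTime")]), , drop = FALSE]
}

latest <- latest_reading(df2)
print(latest)
```

Events with no reading in the window keep their single NA row, matching the left-join behaviour of df2.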