How do I match two time columns and print the matching values?

Time: 2015-02-15 00:13:34

Tags: r datetime match data.table aggregate

I'm hitting a wall here. Hopefully someone can help.

I have an aggregated data frame (d1) in R with a time column and a column of binary values. The time column does not have a uniform time step.

D1:

                   Time Set
 1: 2015-01-03 14:55:00   0
 2: 2015-01-06 14:20:00   1
 3: 2015-01-06 14:25:00   1
 4: 2015-01-06 14:30:00   1
 5: 2015-01-06 14:35:00   1
 6: 2015-01-06 14:40:00   1
 7: 2015-01-06 14:45:00   0
 8: 2015-01-06 16:10:00   1
 9: 2015-01-07 07:45:00   0
10: 2015-01-07 08:00:00   1
11: 2015-01-07 08:05:00   1
12: 2015-01-07 08:45:00   0

I also have a data frame (d2) whose single column has a uniform time step, so d2 has more rows than d1.

D2:

             Time_Ideal 
 1: 2015-01-09 14:05:00   
 2: 2015-01-09 14:10:00 
 3: 2015-01-09 14:15:00 
 4: 2015-01-09 14:20:00 
 5: 2015-01-09 14:25:00 
 6: 2015-01-09 14:30:00 
 7: 2015-01-09 14:35:00 
 8: 2015-01-09 14:40:00 
 9: 2015-01-09 14:45:00 
10: 2015-01-09 14:50:00   

What I want to do is print the Set value next to Time_Ideal wherever a time value in d2 matches a time value in d1.

I tried

d1 <- data.table(d1, key = 'Time')
d2 <- data.table(d2, key = 'Time_Ideal')

d2[d1, nomatch=0]
d2[d1]

inspired by this SO post,

but I can't get it to work.

3 Answers:

Answer 0: (score: 3)

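Here is the data.table way to solve this (since that is what the question actually asks about). Using the modified data provided by @bergant (because the OP's data sets don't match), simply do: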

setkey(setDT(d1), Time) # `d2` doesn't have to be a `data.table`
d1[d2] # you can set `, nomatch = 0L` if you want to remove non-matches
#                    Time Set
#  1: 2015-01-09 15:05:00  NA
#  2: 2015-01-09 15:10:00  NA
#  3: 2015-01-09 15:15:00  NA
#  4: 2015-01-09 15:20:00   1
#  5: 2015-01-09 15:25:00   1
#  6: 2015-01-09 15:30:00   1
#  7: 2015-01-09 15:35:00   1
#  8: 2015-01-09 15:40:00   1
#  9: 2015-01-09 15:45:00   0
# 10: 2015-01-09 15:50:00  NA

Another (better) way is to modify d2 by reference. You first have to convert d2 to a data.table and then key it:

setkey(setDT(d2), Time_Ideal)
d2[d1, Set := i.Set][] # `d2` was modified by reference.
#              Time_Ideal Set
#  1: 2015-01-09 15:05:00  NA
#  2: 2015-01-09 15:10:00  NA
#  3: 2015-01-09 15:15:00  NA
#  4: 2015-01-09 15:20:00   1
#  5: 2015-01-09 15:25:00   1
#  6: 2015-01-09 15:30:00   1
#  7: 2015-01-09 15:35:00   1
#  8: 2015-01-09 15:40:00   1
#  9: 2015-01-09 15:45:00   0
# 10: 2015-01-09 15:50:00  NA
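If the NA rows should instead carry the last observed Set forward, a rolling join may be worth a look. The sketch below is only an illustration under assumptions: the toy data is made up (it is not the OP's or @bergant's), the names `exact` and `rolled` are hypothetical, and it assumes a data.table version that supports the `on =` argument.

```r
library(data.table)

# Hypothetical toy data: irregular observations and a regular 5-minute grid
d1 <- data.table(
  Time = as.POSIXct(c("2015-01-09 14:20:00", "2015-01-09 14:30:00",
                      "2015-01-09 14:45:00"), tz = "UTC"),
  Set  = c(1L, 1L, 0L)
)
d2 <- data.table(
  Time_Ideal = seq(as.POSIXct("2015-01-09 14:15:00", tz = "UTC"),
                   by = "5 min", length.out = 8)
)

# Exact join: one row per d2 time, NA where d1 has no identical timestamp
exact <- d1[d2, on = .(Time = Time_Ideal)]

# Rolling join: each d2 time takes Set from the most recent d1 time at or before it;
# d2 times earlier than the first d1 time stay NA
rolled <- d1[d2, on = .(Time = Time_Ideal), roll = TRUE]
```

With `roll = TRUE` no key needs to be set beforehand, because `on =` names the join columns explicitly. Whether carrying values forward is appropriate depends on what Set means between observations.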

Answer 1: (score: 1)

Maybe with dplyr?

library(dplyr)

d2 %>%
  left_join(d1, by = c("Time_Ideal" = "Time"))

To carry the last value of Set forward into the gaps, use:

library(dplyr)
library(zoo)

d2 %>%
  left_join(d1, by = c("Time_Ideal" = "Time")) %>%
  mutate(Set = na.locf(Set, na.rm = FALSE)) # refer to the joined Set column; no d3 exists here

Test:

Input data

The question gives no hint about which date-time type is used. I use POSIXct below:

d1 <- 
  structure(list(Time = structure(c(1420293300, 1420550400, 1420550700, 
  1420551000, 1420551300, 1420551600, 1420551900, 1420557000, 1420613100, 
  1420614000, 1420614300, 1420616700), class = c("POSIXct", "POSIXt"
  ), tzone = ""), 
  Set = c(0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 
  1L, 0L)), row.names = c(NA, -12L), .Names = c("Time", "Set"), 
  class = "data.frame")

d2 <- 
  structure(list(Time_Ideal = structure(c(1420808700, 1420809000, 
  1420809300, 1420809600, 1420809900, 1420810200, 1420810500, 1420810800, 
  1420811100, 1420811400), class = c("POSIXct", "POSIXt"
  ), tzone = "")), row.names = c(NA, -10L), .Names = "Time_Ideal", 
  class = "data.frame")

Perform join #1

The dates do not intersect (all d1 times < d2 times), so we get only NAs:

d2 %>%
  left_join(d1, by = c("Time_Ideal" = "Time"))

                Time_Ideal Set
    1  2015-01-09 14:05:00  NA
    2  2015-01-09 14:10:00  NA
    3  2015-01-09 14:15:00  NA
    4  2015-01-09 14:20:00  NA
    5  2015-01-09 14:25:00  NA
    6  2015-01-09 14:30:00  NA
    7  2015-01-09 14:35:00  NA
    8  2015-01-09 14:40:00  NA
    9  2015-01-09 14:45:00  NA
    10 2015-01-09 14:50:00  NA
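A quick way to confirm that two time columns share no values before joining is to compare their ranges and count exact matches. This is a minimal base R sketch; the vectors `d1_times` and `d2_times` are hypothetical stand-ins for `d1$Time` and `d2$Time_Ideal`, with UTC assumed as the time zone.

```r
# Hypothetical stand-ins for d1$Time and d2$Time_Ideal
d1_times <- as.POSIXct(c("2015-01-06 14:20:00", "2015-01-07 08:45:00"), tz = "UTC")
d2_times <- seq(as.POSIXct("2015-01-09 14:05:00", tz = "UTC"),
                by = "5 min", length.out = 10)

range(d1_times)  # earliest and latest observation in d1
range(d2_times)  # earliest and latest time on the regular grid

# Number of grid times that also occur in d1; 0 means an exact join returns only NA
n_matches <- sum(d2_times %in% d1_times)

# TRUE when every d1 time lies before the first grid time
no_overlap <- max(d1_times) < min(d2_times)
```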

Perform join #2 (corrected input data)

Shift d1 three days into the future:

d1$Time <- d1$Time + 3600*24*3 # three days shift

Perform the join again:

d2 %>%
  left_join(d1, by = c("Time_Ideal" = "Time"))

                Time_Ideal Set
    1  2015-01-09 14:05:00  NA
    2  2015-01-09 14:10:00  NA
    3  2015-01-09 14:15:00  NA
    4  2015-01-09 14:20:00   1
    5  2015-01-09 14:25:00   1
    6  2015-01-09 14:30:00   1
    7  2015-01-09 14:35:00   1
    8  2015-01-09 14:40:00   1
    9  2015-01-09 14:45:00   0
    10 2015-01-09 14:50:00  NA      

Answer 2: (score: 0)

Probably not the best solution, but I think it works:

library(plyr)

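# Copy d2 and rename its column so both data frames share the key "Time"
d3 <- d2
colnames(d3) <- c("Time")

# Left join (plyr's default type): keeps every row of d3, Set is NA where d1 has no match
d4 <- join(d3, d1, by = "Time")

# Carry the previous Set value forward into each NA; a leading NA stays NA
for(i in 2:length(d4$Set)){
  if(is.na(d4$Set[i])){
    d4$Set[i] <- d4$Set[i - 1]
  }
}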
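The same idea can be sketched without plyr and without an explicit loop, using base R only. This is an illustration, not the answerer's code: the toy data is made up, and `locf` is a hypothetical helper name introduced here.

```r
# Hypothetical toy data: two observations and a regular 5-minute grid
d1 <- data.frame(
  Time = as.POSIXct(c("2015-01-09 14:20:00", "2015-01-09 14:35:00"), tz = "UTC"),
  Set  = c(1L, 0L)
)
d2 <- data.frame(
  Time_Ideal = seq(as.POSIXct("2015-01-09 14:15:00", tz = "UTC"),
                   by = "5 min", length.out = 6)
)

# Left join in base R: keep every row of d2, Set is NA where d1 has no match
d4 <- merge(d2, d1, by.x = "Time_Ideal", by.y = "Time", all.x = TRUE)

# Hypothetical helper: last observation carried forward, vectorized
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the latest non-NA so far, 0 if none
  idx[idx == 0] <- NA                      # nothing to carry forward yet
  x[idx]
}

d4$Set <- locf(d4$Set)
```

As with the loop, values before the first observation remain NA, since there is nothing earlier to copy.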