Conditional matching and extraction across 2 data tables

Time: 2019-03-21 18:20:07

Tags: r dplyr plyr tidyr

I have 2 data tables; their dput output is as follows:

dput(x)
structure(list(site = c("A", "B", "C"), date = c("2018-05-06 00:00:05", 
"2018-05-06 12:00:00", "2018-05-06 17:00:00")), .Names = c("site", 
"date"), row.names = c(NA, -3L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000002570788>)


dput(y)
structure(list(sites = c("A", "A", "B"), vol = c(30, 40, 20), 
    date = structure(c(1525611600, 1525625640, 1525564805), class = c("POSIXct", 
    "POSIXt"), tzone = ""), pn = c("sp90", "sp70", "sp98")), .Names = c("sites", 
"vol", "date", "pn"), class = c("data.table", "data.frame"), row.names = c(NA, 
-3L), .internal.selfref = <pointer: 0x0000000002570788>)

The resulting data table should be:

   site                date vol   pn
1:    A 2018-05-06 00:00:05  30 sp90
2:    A 2018-05-06 12:00:00  40 sp70
3:    B 2018-05-06 17:00:00  20 sp98

I need to first check whether the sites match, then check whether x$date is less than y$date, and then pull vol and pn into x.

Any ideas?

Thanks.

1 Answer:

Answer 0: (score: 0)

You could do it like this:

library(data.table)
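setDT(x)[, date := as.POSIXct(date)] # parse the character dates (session time zone)
setDT(y)[, date := as.POSIXct(date)] # already POSIXct, so this leaves y$date unchanged

x[, c("vol", "pn", "site") := # assign the join result to x by row position (site is overwritten)
    x[y, # join: one result row per row of y
      .(vol, pn, site), # columns to keep
      on = .(site = sites, # x$site must equal y$sites
             date < date # x$date must be earlier than y$date
      ),
      mult = "last"]] # if several rows of x match, keep the last one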

Output:

> x
   site                date vol   pn
1:    A 2018-05-06 00:00:05  30 sp90
2:    A 2018-05-06 12:00:00  40 sp70
3:    B 2018-05-06 17:00:00  20 sp98
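
The assignment above copies the join result into x by row position, which works here because x and y both have three rows. If the aim is instead to keep every row of x and look up, for each one, a row of y with the same site and a later date, a non-equi update join driven by x is one option. This is a minimal sketch, assuming the dates are read as UTC (the question does not state a time zone) and that the first matching row of y is the one wanted. Under these assumptions the result differs from the table in the question, because sites with no later row in y get NA.

```r
library(data.table)

# Toy data mirroring the question. tz = "UTC" is an assumption made here
# so that the comparison does not depend on the local time zone.
x <- data.table(
  site = c("A", "B", "C"),
  date = as.POSIXct(c("2018-05-06 00:00:05", "2018-05-06 12:00:00",
                      "2018-05-06 17:00:00"), tz = "UTC")
)
y <- data.table(
  sites = c("A", "A", "B"),
  vol   = c(30, 40, 20),
  date  = as.POSIXct(c(1525611600, 1525625640, 1525564805),
                     origin = "1970-01-01", tz = "UTC"),
  pn    = c("sp90", "sp70", "sp98")
)

# Update join: for each row of x, find rows of y with the same site and a
# later date (y$date > x$date), take the first match, and add vol and pn
# to x by reference. Rows of x without a match receive NA.
x[, c("vol", "pn") := y[x, .(vol, pn),
                        on = .(sites = site, date > date),
                        mult = "first"]]

print(x)
```

Because the lookup is driven by x, the site and date columns of x are left untouched and only vol and pn are added.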

Edit:

The datasets you provided in the question:

x = structure(list(site = c("A", "B", "C"), 
                   date = c("2018-05-06 00:00:05", "2018-05-06 12:00:00", "2018-05-06 17:00:00")),
                  .Names = c("site","date"), row.names = c(NA, -3L), class = c("data.table", "data.frame"))


y = structure(list(sites = c("A", "A", "B"),
                   vol = c(30, 40, 20),
                   date = structure(c(1525611600, 1525625640, 1525564805),
                                    class = c("POSIXct", "POSIXt"), tzone = ""),
                   pn = c("sp90", "sp70", "sp98")),
              .Names = c("sites", "vol", "date", "pn"),
              class = c("data.table", "data.frame"),
              row.names = c(NA, -3L))
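
A note on the dates: y$date is stored as seconds since the epoch with tzone = "", so the clock times it prints as depend on the session's time zone. Likewise, as.POSIXct() applied to the character column in x uses the session time zone unless tz is given. A minimal sketch, assuming UTC purely for illustration:

```r
# Epoch seconds copied from dput(y) in the question
secs <- c(1525611600, 1525625640, 1525564805)

# tz = "UTC" is an assumption for illustration only
y_dates <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
y_utc <- format(y_dates, "%Y-%m-%d %H:%M:%S")

# Parse the character dates from x in the same zone so that the two
# columns are compared on the same footing
x_dates <- as.POSIXct(c("2018-05-06 00:00:05", "2018-05-06 12:00:00",
                        "2018-05-06 17:00:00"), tz = "UTC")

print(y_utc)
```

Under this assumption the three values in y read 13:00:00, 16:54:00 and 00:00:05 on 2018-05-06. In another time zone they shift accordingly, which can change which rows satisfy the date condition in the join.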