Join data frames by date

Time: 2017-01-17 10:13:24

Tags: r date join dplyr

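I'm trying to join two data frames together by date. The complication is that the two data frames are laid out somewhat differently. I'll use the sample data from my previous post: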

> eventdates
# A tibble: 2 × 4
  event.no dr.rank   dr.start     dr.end
     <int>   <int>     <date>     <date>
1        1      14 1964-09-30 1964-10-06
2        2      16 1964-11-01 1964-12-24
> ts1964 <- data_frame(DATE = seq(from = as.Date("1964-01-01"), 
+                                 to = as.Date("1964-12-31"), 
+                                 by = "days"),
+                      Q = 1:366)
> 

I planned to use lapply to build a list that expands the date ranges in eventdates:

lapply(split(eventdates, seq(nrow(eventdates))), 
       function(x) { 
         filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end) })

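This expands the dates in eventdates and gives the right column names. However, I've realized that it doesn't keep the event.no grouping variable, it doesn't bind back into a single data frame, and melting doesn't seem to work either.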
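A minimal sketch of one way to keep event.no with this lapply approach, assuming dplyr is installed and that the tibbles below match the data shown above: split on event.no instead of the row index, so the list names are the event numbers, then let dplyr's bind_rows() turn those names into a column through its .id argument. The names pieces and expanded are hypothetical.

```r
library(dplyr)

# data reconstructed from the question (assumption: same values as shown above)
ts1964 <- tibble(DATE = seq(as.Date("1964-01-01"), as.Date("1964-12-31"), by = "days"),
                 Q = 1:366)
eventdates <- tibble(event.no = 1:2, dr.rank = c(14L, 16L),
                     dr.start = as.Date(c("1964-09-30", "1964-11-01")),
                     dr.end   = as.Date(c("1964-10-06", "1964-12-24")))

# split by event.no (not by row number) so each list element is named after its event
pieces <- lapply(split(eventdates, eventdates$event.no),
                 function(x) filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end))

# bind_rows() stacks the list into one tibble; .id stores each element's name
expanded <- bind_rows(pieces, .id = "event.no") %>%
  mutate(event.no = as.integer(event.no))  # .id values come back as character
```

This yields only the event days; it would still need to be joined back onto ts1964 to get the full year.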

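My question is: how can I combine these two data frames? Basically, I want the ts1964 data frame to have an event.no column (days without an event can have an event.no of zero, NA, or similar).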
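For comparison, a sketch of a pure dplyr route to that output, assuming events never overlap (if they did, the left join would duplicate the overlapping days): expand each event into one row per day, left-join those rows onto ts1964 by DATE, and use coalesce() to turn the unmatched NAs into 0. The names event_days and output are illustrative.

```r
library(dplyr)

# data reconstructed from the question (assumption: same values as shown above)
ts1964 <- tibble(DATE = seq(as.Date("1964-01-01"), as.Date("1964-12-31"), by = "days"),
                 Q = 1:366)
eventdates <- tibble(event.no = 1:2, dr.rank = c(14L, 16L),
                     dr.start = as.Date(c("1964-09-30", "1964-11-01")),
                     dr.end   = as.Date(c("1964-10-06", "1964-12-24")))

# one row per event day: event i covers every date from dr.start[i] to dr.end[i]
event_days <- bind_rows(lapply(seq_len(nrow(eventdates)), function(i) {
  tibble(event.no = eventdates$event.no[i],
         DATE = seq(eventdates$dr.start[i], eventdates$dr.end[i], by = "day"))
}))

# keep every day of 1964; days without a matching event get NA, replaced by 0
output <- ts1964 %>%
  left_join(event_days, by = "DATE") %>%
  mutate(event.no = coalesce(event.no, 0L))
```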

A slice of the expected output should look like this:

> output <-
+   ts1964 %>%
+   mutate(event.no = 0)
> output$event.no[274:280] <- 1
> output$event.no[306:359] <- 2
> output %>%
+   slice(270:290)
# A tibble: 21 × 3
         DATE     Q event.no
       <date> <int>    <dbl>
1  1964-09-26   270        0
2  1964-09-27   271        0
3  1964-09-28   272        0
4  1964-09-29   273        0
5  1964-09-30   274        1
6  1964-10-01   275        1
7  1964-10-02   276        1
8  1964-10-03   277        1
9  1964-10-04   278        1
10 1964-10-05   279        1
# ... with 11 more rows
> 
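The row ranges 274:280 and 306:359 above are hard-coded. A minimal sketch, assuming the same data as the question, that derives them from eventdates instead by flagging each event window in a loop; output is an illustrative name, and the printed slice should match the one above.

```r
library(dplyr)

# data reconstructed from the question (assumption: same values as shown above)
ts1964 <- tibble(DATE = seq(as.Date("1964-01-01"), as.Date("1964-12-31"), by = "days"),
                 Q = 1:366)
eventdates <- tibble(event.no = 1:2, dr.rank = c(14L, 16L),
                     dr.start = as.Date(c("1964-09-30", "1964-11-01")),
                     dr.end   = as.Date(c("1964-10-06", "1964-12-24")))

output <- ts1964 %>% mutate(event.no = 0)

# give the rows whose DATE falls inside each event window that event's number
for (i in seq_len(nrow(eventdates))) {
  in_event <- output$DATE >= eventdates$dr.start[i] & output$DATE <= eventdates$dr.end[i]
  output$event.no[in_event] <- eventdates$event.no[i]
}

output %>% slice(270:290)
```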

2 Answers:

Answer 0 (score: 4)

Starting from your resulting list:

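library(dplyr)

# one filtered slice of ts1964 per row of eventdates
l1 <- lapply(split(eventdates, seq(nrow(eventdates))), 
              function(x) { 
                  filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end) })

# prepend each event's event.no (its first column) to its slice, then stack the slices
do.call(rbind, Map(cbind, lapply(split(eventdates, seq(nrow(eventdates))), '[', 1), l1))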

#     event.no       DATE   Q
#1.1         1 1964-09-30 274
#1.2         1 1964-10-01 275
#1.3         1 1964-10-02 276
#1.4         1 1964-10-03 277
#1.5         1 1964-10-04 278
#1.6         1 1964-10-05 279
#1.7         1 1964-10-06 280
#2.1         2 1964-11-01 306
#2.2         2 1964-11-02 307
#2.3         ...
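This result contains only the event days. To get the full-year table asked for in the question, one option (a sketch, not part of the original answer) is to left-join it back onto ts1964 with base R's merge(all.x = TRUE) and then replace the NAs with 0. The sketch rebuilds the data as plain data frames, mirroring the "Data used" of the other answer; stacked and output are illustrative names.

```r
library(dplyr)

# plain data frames with the same values as the question (assumption)
ts1964 <- data.frame(DATE = seq(as.Date("1964-01-01"), as.Date("1964-12-31"), by = "days"),
                     Q = 1:366)
eventdates <- data.frame(event.no = 1:2, dr.rank = c(14L, 16L),
                         dr.start = as.Date(c("1964-09-30", "1964-11-01")),
                         dr.end   = as.Date(c("1964-10-06", "1964-12-24")))

# the answer's approach: one slice per event, event.no prepended, slices stacked
l1 <- lapply(split(eventdates, seq(nrow(eventdates))),
             function(x) filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end))
stacked <- do.call(rbind, Map(cbind, lapply(split(eventdates, seq(nrow(eventdates))), '[', 1), l1))

# left join onto the full year (merge sorts by DATE); non-event days get NA, then 0
output <- merge(ts1964, stacked[, c("DATE", "event.no")], by = "DATE", all.x = TRUE)
output$event.no[is.na(output$event.no)] <- 0
```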

Answer 1 (score: 4)

You can use the data.table package as follows:

library(data.table)
# convert ts1964 to a 'data.table'
setDT(ts1964)
# create a new 'data.table' with event dates in long form
ev.dates.2 <- setDT(eventdates)[, .(DATE = seq(dr.start,dr.end,'day')), by = .(event.no, dr.rank)]

# join with ts1964
ts1964[ev.dates.2, on = 'DATE', event := event.no]

If you want to replace the NAs with zeros, you can replace the last line with:

ts1964[ev.dates.2, on = 'DATE', event := event.no][is.na(event), event := 0]

Putting it all together:

setDT(ts1964)[setDT(eventdates)[, .(DATE = seq(dr.start,dr.end,'day')), by = .(event.no, dr.rank)], 
              on = 'DATE', event := event.no
              ][is.na(event), event := 0]
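A variation that skips the long-form expansion entirely is a non-equi join, which data.table supports from version 1.9.8: each row of eventdates matches the ts1964 rows whose DATE lies between dr.start and dr.end, and := writes that event's number into ts1964 by reference. This is a sketch that swaps in a different join technique from the answer above, and it assumes the events do not overlap (with overlaps, a day would keep only one of the event numbers).

```r
library(data.table)

# same values as the question, built directly as data.tables (assumption)
ts1964 <- data.table(DATE = seq(as.Date("1964-01-01"), as.Date("1964-12-31"), by = "days"),
                     Q = 1:366)
eventdates <- data.table(event.no = 1:2, dr.rank = c(14L, 16L),
                         dr.start = as.Date(c("1964-09-30", "1964-11-01")),
                         dr.end   = as.Date(c("1964-10-06", "1964-12-24")))

# non-equi update join: rows with dr.start <= DATE <= dr.end get that event's number
ts1964[eventdates, on = .(DATE >= dr.start, DATE <= dr.end), event := i.event.no]

# days outside every event window are still NA; set them to 0
ts1964[is.na(event), event := 0L]
```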

Data used:

ts1964 <- data.frame(DATE = seq(from = as.Date("1964-01-01"), to = as.Date("1964-12-31"), by = "days"), Q = 1:366)

eventdates <- structure(list(event.no = 1:2, dr.rank = c(14L, 16L), 
                             dr.start = structure(c(-1919, -1887), class = "Date"), 
                             dr.end = structure(c(-1913, -1834), class = "Date")), 
                        .Names = c("event.no", "dr.rank", "dr.start", "dr.end"), row.names = c(NA, -2L), class = "data.frame")