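I am trying to join two data frames by date. The complication is that the two data frames are shaped slightly differently. I'll use the sample data from a previous post: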
> eventdates
# A tibble: 2 × 4
event.no dr.rank dr.start dr.end
<int> <int> <date> <date>
1 1 14 1964-09-30 1964-10-06
2 2 16 1964-11-01 1964-12-24
> ts1964 <- data_frame(DATE = seq(from = as.Date("1964-01-01"),
+ to = as.Date("1964-12-31"),
+ by = "days"),
+ Q = 1:366)
>
I planned to use lapply to create a list that lets me expand the data in eventdates:
lapply(split(eventdates, seq(nrow(eventdates))),
function(x) {
filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end) })
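This expands the dates in eventdates and gets the column names right. However, I've realised that it does not keep the event.no grouping variable, that the list does not bind back into a data frame, and that melt does not seem to work either.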
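My question is: how can I combine these two data frames? Essentially, I need the ts1964 data frame to have an event.no column (where there is no event, event.no can be zero, NA, or similar).
A slice of the expected output should look like this: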
> output <-
+ ts1964 %>%
+ mutate(event.no = 0)
> output$event.no[274:280] <- 1
> output$event.no[306:359] <- 2
> output %>%
+ slice(270:290)
# A tibble: 21 × 3
DATE Q event.no
<date> <int> <dbl>
1 1964-09-26 270 0
2 1964-09-27 271 0
3 1964-09-28 272 0
4 1964-09-29 273 0
5 1964-09-30 274 1
6 1964-10-01 275 1
7 1964-10-02 276 1
8 1964-10-03 277 1
9 1964-10-04 278 1
10 1964-10-05 279 1
# ... with 11 more rows
>
Answer 0: (score: 4)
Extract event.no from each split row and bind it to the matching element of the resulting list:
l1 <- lapply(split(eventdates, seq(nrow(eventdates))),
function(x) {
filter(ts1964, DATE >= x$dr.start & DATE <= x$dr.end) })
do.call(rbind, Map(cbind, lapply(split(eventdates, seq(nrow(eventdates))), '[', 1), l1))
# event.no DATE Q
#1.1 1 1964-09-30 274
#1.2 1 1964-10-01 275
#1.3 1 1964-10-02 276
#1.4 1 1964-10-03 277
#1.5 1 1964-10-04 278
#1.6 1 1964-10-05 279
#1.7 1 1964-10-06 280
#2.1 2 1964-11-01 306
#2.2 2 1964-11-02 307
#2.3 ...
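The result above holds only the rows that fall inside an event. To get the full series the question asks for, with 0 on non-event days, one option is to skip the split/bind step and assign event.no directly. The following is a minimal base R sketch, not part of the original answer; the data is rebuilt with plain data.frame() so the block runs on its own, and in_event is a name chosen only for illustration.

```r
# Rebuild the example data with base R only (same values as in the question)
ts1964 <- data.frame(
  DATE = seq(from = as.Date("1964-01-01"), to = as.Date("1964-12-31"), by = "days"),
  Q    = 1:366
)
eventdates <- data.frame(
  event.no = 1:2,
  dr.rank  = c(14L, 16L),
  dr.start = as.Date(c("1964-09-30", "1964-11-01")),
  dr.end   = as.Date(c("1964-10-06", "1964-12-24"))
)

# Start with 0 everywhere, then overwrite the days covered by each event
output <- ts1964
output$event.no <- 0L
for (i in seq_len(nrow(eventdates))) {
  # in_event (illustrative name): TRUE for dates inside the i-th event window
  in_event <- output$DATE >= eventdates$dr.start[i] &
              output$DATE <= eventdates$dr.end[i]
  output$event.no[in_event] <- eventdates$event.no[i]
}

output[270:290, ]
```

With this data, rows 274 to 280 carry event.no 1 and rows 306 to 359 carry event.no 2, the same positions the question fills in by hand. The sketch assumes the event windows do not overlap; if they did, later events would overwrite earlier ones.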
Answer 1: (score: 4)
You can use the data.table package, as follows:
library(data.table)
# convert ts1964 to a 'data.table'
setDT(ts1964)
# create a new 'data.table' with event dates in long form
ev.dates.2 <- setDT(eventdates)[, .(DATE = seq(dr.start,dr.end,'day')), by = .(event.no, dr.rank)]
# join with ts1964
ts1964[ev.dates.2, on = 'DATE', event := event.no]
If you want to replace the NA values with zero, you can replace the last line with:
ts1964[ev.dates.2, on = 'DATE', event := event.no][is.na(event), event := 0]
All together:
setDT(ts1964)[setDT(eventdates)[, .(DATE = seq(dr.start,dr.end,'day')), by = .(event.no, dr.rank)],
on = 'DATE', event := event.no
][is.na(event), event := 0]
Data used:
ts1964 <- data.frame(DATE = seq(from = as.Date("1964-01-01"), to = as.Date("1964-12-31"), by = "days"), Q = 1:366)
eventdates <- structure(list(event.no = 1:2, dr.rank = c(14L, 16L),
dr.start = structure(c(-1919, -1887), class = "Date"),
dr.end = structure(c(-1913, -1834), class = "Date")),
.Names = c("event.no", "dr.rank", "dr.start", "dr.end"), row.names = c(NA, -2L), class = "data.frame")
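The approach in this answer expands every event into one row per day before joining. As a possible alternative, not part of the original answer, data.table (version 1.9.8 or later) also supports non-equi joins, so the date range can be matched directly without the intermediate long table. A sketch under that assumption, with the data rebuilt via data.table() so the block is self-contained:

```r
library(data.table)

# Same example data, created directly as data.tables
ts1964 <- data.table(
  DATE = seq(from = as.Date("1964-01-01"), to = as.Date("1964-12-31"), by = "days"),
  Q    = 1:366
)
eventdates <- data.table(
  event.no = 1:2,
  dr.rank  = c(14L, 16L),
  dr.start = as.Date(c("1964-09-30", "1964-11-01")),
  dr.end   = as.Date(c("1964-10-06", "1964-12-24"))
)

# Non-equi update join: for every row of ts1964 whose DATE lies within
# [dr.start, dr.end], set event to that event's number (i.event.no refers
# to the event.no column of eventdates)
ts1964[eventdates, on = .(DATE >= dr.start, DATE <= dr.end), event := i.event.no]

# Days outside any event are left as NA by the join; replace them with 0
ts1964[is.na(event), event := 0L]

ts1964[270:290]
```

This avoids materialising one row per event day, which may matter when events are long or numerous. Like the original answer, it assumes the event windows do not overlap.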