I have the following data frame:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
df <- tibble(
  id = c(1, 1, 2, 2),
  date1 = as.Date(c("2013-01-01", "2013-02-01", "2015-04-01", "2015-05-01")),
  date2 = as.Date(c("2012-12-09", "2012-12-09", "2015-03-10", "2015-03-10"))
)
# A tibble: 4 x 3
id date1 date2
<dbl> <date> <date>
1 1 2013-01-01 2012-12-09
2 1 2013-02-01 2012-12-09
3 2 2015-04-01 2015-03-10
4 2 2015-05-01 2015-03-10
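I would like to complete this data frame so that each id gets one additional date1 value, namely the month after its last one. The date2 value is the same for all rows of a given id and should be carried over to the new row. With tidyr::complete, this can be done as follows: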
df %>%
group_by(id) %>%
complete(date1 = seq.Date(from = min(date1), length.out = 3, by = "month"), date2 = date2[1])
# A tibble: 6 x 3
# Groups: id [2]
id date1 date2
<dbl> <date> <date>
1 1 2013-01-01 2012-12-09
2 1 2013-02-01 2012-12-09
3 1 2013-03-01 2012-12-09
4 2 2015-04-01 2015-03-10
5 2 2015-05-01 2015-03-10
6 2 2015-06-01 2015-03-10
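My real data has about 150K groups, and the tidyr solution takes more than an hour to finish. I assume data.table would be faster. Can the same thing be done in data.table?
A similar question was asked in "data.table equivalent of tidyr::complete()", but it does not involve a group_by clause.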
Answer 0 (score: 2)
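Based on some preliminary benchmarking, the data.table approach appears to be considerably faster: on 3,000 groups, roughly 0.14 s elapsed versus roughly 86 s for the tidyr pipeline (timings below).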
library(data.table)
setDT(df)[, .(date1 = seq(min(date1), length.out = 3, by = 'month'), date2 = date2[1]), id]
# Benchmark data: 3,000 ids with two rows each (tibble() replaces the deprecated data_frame())
df <- tibble(
  id = rep(1:3000, each = 2),
  date1 = rep(as.Date(c("2013-01-01", "2013-02-01", "2015-04-01", "2015-05-01")),
              length.out = 6000),
  date2 = rep(as.Date(c("2012-12-09", "2012-12-09", "2015-03-10", "2015-03-10")),
              length.out = 6000))
system.time({
df %>%
group_by(id) %>%
complete(date1 = seq.Date(from = min(date1),
length.out = 3, by = "month"), date2 = date2[1])
})
#user system elapsed
#64.05 21.27 86.05
system.time({
setDT(df)[, .(date1 = seq(min(date1), length.out = 3, by = 'month'), date2 = date2[1]), id]
})
#user system elapsed
# 0.14 0.00 0.14
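One thing to keep in mind with the benchmark above: setDT() converts df to a data.table by reference, so the tidyr timing has to run first, or on a separate copy. Below is a minimal sketch for checking that both approaches give the same rows on the question's small example. It uses as.data.table(), which copies, so df stays a tibble. The names tidy_res and dt_res are just placeholders for illustration.

```r
library(dplyr)
library(tidyr)
library(data.table)

# The small example from the question
df <- tibble(
  id = c(1, 1, 2, 2),
  date1 = as.Date(c("2013-01-01", "2013-02-01", "2015-04-01", "2015-05-01")),
  date2 = as.Date(c("2012-12-09", "2012-12-09", "2015-03-10", "2015-03-10"))
)

# tidyr version, ungrouped at the end so it can be compared as a plain data frame
tidy_res <- df %>%
  group_by(id) %>%
  complete(date1 = seq.Date(from = min(date1), length.out = 3, by = "month"),
           date2 = date2[1]) %>%
  ungroup()

# data.table version; as.data.table() makes a copy instead of converting df in place
dt_res <- as.data.table(df)[
  , .(date1 = seq(min(date1), length.out = 3, by = "month"), date2 = date2[1]),
  by = id
]

# Compare the two results after dropping the tibble / data.table classes
all.equal(as.data.frame(tidy_res), as.data.frame(dt_res))
```

If all.equal() reports differences, they are worth reading before trusting the faster version on the full data.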
Answer 1 (score: 0)
If you need speed, try to keep things lean:
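library(data.table)
library(lubridate)
# dt is assumed to be the question's data as a data.table, e.g. dt <- as.data.table(df)
dt[, .SD
   ][, .(date1 = max(date1)), .(id, date2)      # last date1 per (id, date2)
   ][, date1Inc := date1 + months(1)            # the following month
   ][, rbind(dt, .SD[, .(id, date1 = date1Inc, date2)])  # append the new rows to dt
   ][order(id, date1)
   ]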
id date1 date2
1: 1 2013-01-01 2012-12-09
2: 1 2013-02-01 2012-12-09
3: 1 2013-03-01 2012-12-09
4: 2 2015-04-01 2015-03-10
5: 2 2015-05-01 2015-03-10
6: 2 2015-06-01 2015-03-10
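A possible variant in the same spirit, offered only as a rough sketch: instead of calling seq() once per group, take one row per id and build all months with a single vectorised call. This assumes, as in the question, that every id should end up with three consecutive months starting at its earliest date1. The names starts, expanded and n_months are made up for illustration. It uses lubridate's %m+%, which rolls back to the last valid day of the month rather than returning NA for dates such as the 31st. I have not benchmarked it against the other answers.

```r
library(data.table)
library(lubridate)

# The small example from the question, as a data.table
dt <- data.table(
  id = c(1, 1, 2, 2),
  date1 = as.Date(c("2013-01-01", "2013-02-01", "2015-04-01", "2015-05-01")),
  date2 = as.Date(c("2012-12-09", "2012-12-09", "2015-03-10", "2015-03-10"))
)

n_months <- 3L  # assumed number of months per id

# One row per id: the earliest date1 and the shared date2
starts <- dt[, .(start = min(date1), date2 = date2[1]), by = id]

# Repeat every row n_months times, then shift by 0, 1, 2, ... months
expanded <- starts[rep(seq_len(nrow(starts)), each = n_months)]
expanded[, date1 := start %m+% months(rep(0:(n_months - 1L), times = nrow(starts)))]

# Drop the helper column and restore the original column order
expanded[, start := NULL]
setcolorder(expanded, c("id", "date1", "date2"))
expanded
```

Note that this rebuilds each group from its first month, so rows of dt that fall outside that three-month window would not be kept.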