Calculating elapsed days from a non-contiguous table

Time: 2016-03-16 16:04:46

Tags: r data.table lubridate

I have a table of fuel fill-ups, a, like this:

a = setDT(structure(list(date = structure(c(NA, 16837, 16843, 16847, 16852, 
16854, 16858, 16862, 16867, 16871, 16874), class = "Date"), km = c(NA, 
NA, 421, 351, 286, 350, 414, 332, 401, 321, 350)), .Names = c("date", 
"km"), class = c("data.table", "data.frame"), row.names = c(NA, 
-11L)), key = "date")

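It holds the fill-up dates and the kilometres driven between fill-ups. I also have a separate table, actions, with the dates of tire-pressure adjustments and oil changes, as follows: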

actions = setDT(structure(list(date = structure(c(16841, 16843, 16858, 16869), class = "Date"), 
    action = structure(c(1L, 2L, 2L, 2L), .Label = c("oil", "tires"
    ), class = "factor")), .Names = c("date", "action"), row.names = c(NA, 
-4L), class = c("data.table", "data.frame")), key = "action")

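I need to relate fuel consumption (in the real version of a I also have gallons) to the number of days since the last tire-pressure check and the number of days since the last oil change. There must be a simple way to do this, but after a few hours of trying I am stuck.

This is what I have tried: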

library(data.table)
library(lubridate)
library(reshape2)

b <- dcast(actions, date ~ action, value.var = "date")

d <- seq(min(a$date, b$date, na.rm = TRUE), max(a$date, b$date, na.rm = TRUE), by = "day")
d <- data.table(date=d)
d <- b[d,]
d$daysOil <- as.double(difftime(d$date, d$date[! is.na(d$oil)], units = "days"))
d$daysOil[which(d$daysOil < 0)] <- NA

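Things get much more complicated when I try to calculate the days elapsed since the last "tires" event (the closest event on or before the fill-up date), and that is where I am stuck.

My expected output is: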

expected
         date  km daysoil daysTires
1        <NA>  NA      NA        NA
2  2016-02-06  NA      NA        NA
3  2016-02-12 421       2         0
4  2016-02-16 351       6         4
5  2016-02-21 286      11         9
6  2016-02-23 350      13        11
7  2016-02-27 414      17         0
8  2016-03-02 332      21         4
9  2016-03-07 401      26         9
10 2016-03-11 321      30         2
11 2016-03-14 350      33         5

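I appreciate any solution, but preferably one using the data.table or dplyr packages.

########## EDIT ##########

If you can think of a better way to structure the information (the tables) that makes this task easier, that would also be much appreciated!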

1 answer:

Answer 0: (score: 2)

Here is one option:

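# Keep each event's own date: after the join, date holds the fill-up dates from a
actions[, date.copy := date]

# For each action, rolling-join a onto that action's rows, so every fill-up
# picks up the latest event on or before it; N numbers the rows for dcast
cbind(a,
      dcast(actions[, .SD[a, .(days = date - date.copy, N = .I), roll = TRUE, on = 'date']
                    , by = action],
            N ~ action, value.var = 'days'))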
#          date  km  N     oil   tires
# 1:       <NA>  NA  1 NA days NA days
# 2: 2016-02-06  NA  2 NA days NA days
# 3: 2016-02-12 421  3  2 days  0 days
# 4: 2016-02-16 351  4  6 days  4 days
# 5: 2016-02-21 286  5 11 days  9 days
# 6: 2016-02-23 350  6 13 days 11 days
# 7: 2016-02-27 414  7 17 days  0 days
# 8: 2016-03-02 332  8 21 days  4 days
# 9: 2016-03-07 401  9 26 days  9 days
#10: 2016-03-11 321 10 30 days  2 days
#11: 2016-03-14 350 11 33 days  5 days

There are a few simple things going on above; take the expression apart piece by piece to understand it.
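To help with taking it apart, here is a minimal sketch of the core step, the rolling join, for a single action only. It uses a few of the fill-up dates and the tire dates from the question; the names tires, last_event, res and days_tires are made up for this illustration and do not appear in the code above.

```r
library(data.table)

# A few fill-up dates from the question (the NA row is left out to keep this short)
a <- data.table(date = as.Date(c("2016-02-06", "2016-02-12",
                                 "2016-02-16", "2016-03-11")))

# The tire events from the question, plus a copy of the date that survives the join
# (last_event is an illustrative name, playing the role of date.copy)
tires <- data.table(date = as.Date(c("2016-02-12", "2016-02-27", "2016-03-09")))
tires[, last_event := date]

# roll = TRUE: for each row of a, take the most recent tires row on or before
# a$date; fill-ups with no earlier event get NA
res <- tires[a, on = "date", roll = TRUE]

# In the result, date holds the fill-up date and last_event the matched event date
res[, days_tires := as.integer(date - last_event)]
print(res)
```

The full answer does the same thing once per action (by = action) and then uses dcast to spread the results into one column per action.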