How to lag a value by one month when each month has a different number of observations?

Time: 2016-02-16 16:15:22

Tags: r data.table

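I have a dataset with multiple dates. I would like to lag the value of Cells by one month. I probably can't use shift() because each month has a different number of days (not to mention that some dates are missing).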

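What I did was create a new data table containing the unique ID, Year, and shifted/lagged Month, then merge it with the original data table (taking care not to end up with duplicate columns).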
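A minimal sketch of the merge approach described above, assuming the DATE, ID and Cells columns used in the answers below and that Cells is constant within each ID and month (true of that sample data). The names monthly, res and lag.Cells are illustrative, not from the original post.

```r
library(data.table)

# A subset of the sample data used in the answers: one row per DATE and ID
DT <- data.table(
  DATE  = as.IDate(c("2000-01-01", "2000-01-02", "2000-01-01", "2000-02-01",
                     "2000-02-03", "2000-02-01", "2000-03-01", "2000-03-03")),
  ID    = c(1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L),
  Cells = c(10L, 10L, 20L, 30L, 40L, 40L, 50L, 60L)
)

# Year and Month as separate integer columns
DT[, `:=`(Year = year(DATE), Month = month(DATE))]

# One row per ID, Year, Month (assumes Cells is constant within a month)
monthly <- unique(DT[, .(ID, Year, Month, Cells)])

# Move each month forward by one; December rolls over into next January.
# Both right-hand sides are evaluated on the old values before assignment.
monthly[, `:=`(Year  = Year + (Month == 12L),
               Month = Month %% 12L + 1L)]
setnames(monthly, "Cells", "lag.Cells")

# Attach the previous month's value to every row of the following month
res <- merge(DT, monthly, by = c("ID", "Year", "Month"), all.x = TRUE)
res
```

January rows get NA because there is no December 1999 in the data; a month missing for an ID also yields NA for the month after it.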

Obviously, this isn't efficient. Is there another way?


2 answers:

Answer 0 (score: 1)

# Replace your sapply usage with pacman and you'll thank me
#   pacman installs if needed, loads, and doesn't require quotation marks
pacman::p_load(data.table, lubridate) 

DT <- fread('DATE, ID, Cells
            2000-01-01, 1, 10
            2000-01-02, 1, 10
            2000-01-03, 1, 10
            2000-01-01, 2, 20
            2000-01-02, 2, 20
            2000-01-03, 2, 20
            2000-01-04, 2, 20
            2000-02-01, 1, 30
            2000-02-02, 1, 30
            2000-02-01, 2, 40
            2000-02-03, 2, 40
            2000-02-04, 2, 40
            2000-03-01, 1, 50
            2000-03-02, 1, 50
            2000-03-01, 2, 60
            2000-03-03, 2, 60
            ')
DT[, date  := ymd(DATE)]
DT[, month := format(date, "%b")]
# Shift Cells down by the number of January rows;
# the vacated top rows are filled with NA
DT[, lag.cells := shift(Cells, n = sum(month == "Jan"))]
DT

          DATE ID Cells       date month lag.cells
 1: 2000-01-01  1    10 2000-01-01   Jan        NA
 2: 2000-01-02  1    10 2000-01-02   Jan        NA
 3: 2000-01-03  1    10 2000-01-03   Jan        NA
 4: 2000-01-01  2    20 2000-01-01   Jan        NA
 5: 2000-01-02  2    20 2000-01-02   Jan        NA
 6: 2000-01-03  2    20 2000-01-03   Jan        NA
 7: 2000-01-04  2    20 2000-01-04   Jan        NA
 8: 2000-02-01  1    30 2000-02-01   Feb        10
 9: 2000-02-02  1    30 2000-02-02   Feb        10
10: 2000-02-01  2    40 2000-02-01   Feb        10
11: 2000-02-03  2    40 2000-02-03   Feb        20
12: 2000-02-04  2    40 2000-02-04   Feb        20
13: 2000-03-01  1    50 2000-03-01   Mar        20
14: 2000-03-02  1    50 2000-03-02   Mar        20
15: 2000-03-01  2    60 2000-03-01   Mar        30
16: 2000-03-03  2    60 2000-03-03   Mar        30
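The code above shifts Cells down by a fixed number of rows (the count of January rows), so it does not look within each ID: in the output, row 10 (ID 2, February) picks up 10, which is ID 1's January value. A sketch of one alternative that still uses shift(), assuming Cells is constant within each ID and month as in the sample data: collapse to one row per ID-month, lag within ID, then join back. The helper column mon and the table monthly are illustrative names, and fifelse() needs a reasonably recent data.table.

```r
library(data.table)

# A subset of the sample data above
DT <- data.table(
  DATE  = as.IDate(c("2000-01-01", "2000-01-02", "2000-01-01", "2000-02-01",
                     "2000-02-03", "2000-02-01", "2000-03-01", "2000-03-03")),
  ID    = c(1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L),
  Cells = c(10L, 10L, 20L, 30L, 40L, 40L, 50L, 60L)
)

# Calendar-month index: consecutive months differ by exactly 1
DT[, mon := year(DATE) * 12L + month(DATE)]

# One row per ID and month (assumes Cells is constant within each ID-month)
monthly <- DT[, .(Cells = Cells[1L]), by = .(ID, mon)]
setorder(monthly, ID, mon)

# shift() is safe now that each ID has at most one row per month; only accept
# the previous row if it is exactly one month earlier, so a gap gives NA
monthly[, lag.cells := fifelse(shift(mon) == mon - 1L, shift(Cells), NA_integer_),
        by = ID]

# Update join: copy each month's lagged value onto all of that month's rows
DT[monthly, lag.cells := i.lag.cells, on = .(ID, mon)]
DT
```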

Answer 1 (score: 0)

seq() for the Date class supports "month", "quarter", "year", and so on. Not very elegant, but you could do something like this.

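library(magrittr)
DT[, DATE := as.Date(DATE)]
# For each date, take the same day one month later
# (sapply drops the Date class, so convert the numbers back to Date)
DT[,  DATE_lag := sapply(DATE, function(x) 
  seq(x, by = "1 month", length.out = 2)[2]) %>%
    as.Date(origin = "1970-01-01")]
# Relabel the shifted dates as DATE, so each Cells value lands on
# the date one month after the one it was observed on
DT2 <- DT[, .(DATE_lag, ID, Cells)]
setnames(DT2, c("DATE_lag", "Cells"), c("DATE", "Lagged.Cells"))
# Left join: rows with no observation exactly one month earlier get NA
merge(DT, DT2, by = c("DATE", "ID"), all.x = TRUE)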

         DATE ID Cells       date month lag.cells   DATE_lag Lagged.Cells
 1: 2000-01-01  1    10 2000-01-01   Jan        NA 2000-02-01           NA
 2: 2000-01-01  2    20 2000-01-01   Jan        NA 2000-02-01           NA
 3: 2000-01-02  1    10 2000-01-02   Jan        NA 2000-02-02           NA
 4: 2000-01-02  2    20 2000-01-02   Jan        NA 2000-02-02           NA
 5: 2000-01-03  1    10 2000-01-03   Jan        NA 2000-02-03           NA
 6: 2000-01-03  2    20 2000-01-03   Jan        NA 2000-02-03           NA
 7: 2000-01-04  2    20 2000-01-04   Jan        NA 2000-02-04           NA
 8: 2000-02-01  1    30 2000-02-01   Feb        10 2000-03-01           10
 9: 2000-02-01  2    40 2000-02-01   Feb        10 2000-03-01           20
10: 2000-02-02  1    30 2000-02-02   Feb        10 2000-03-02           10
11: 2000-02-03  2    40 2000-02-03   Feb        20 2000-03-03           20
12: 2000-02-04  2    40 2000-02-04   Feb        20 2000-03-04           20
13: 2000-03-01  1    50 2000-03-01   Mar        20 2000-04-01           30
14: 2000-03-01  2    60 2000-03-01   Mar        30 2000-04-01           40
15: 2000-03-02  1    50 2000-03-02   Mar        20 2000-04-02           30
16: 2000-03-03  2    60 2000-03-03   Mar        30 2000-04-03           40
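seq() on a Date with by = "1 month" keeps the day of the month, which is why the merge above matches, for example, 2000-02-03 for ID 2 to the value from 2000-01-03. A small check, using a hypothetical helper next_month() that applies the same seq() call to each date, also shows the main caveat: a day that does not exist in the target month rolls over into the following month.

```r
# Hypothetical helper: the same day one calendar month later, via seq.Date
next_month <- function(d) {
  # lapply keeps each element as a Date; c() combines them into a Date vector
  do.call(c, lapply(d, function(x) seq(x, by = "1 month", length.out = 2)[2]))
}

d <- as.Date(c("2000-01-03", "2000-01-31", "2000-12-15"))
next_month(d)
# [1] "2000-02-03" "2000-03-02" "2001-01-15"
```

Because 2000 is a leap year, "31 February" normalizes to 2 March, so with the merge above a value dated 31 January would be carried onto 2 March rather than into February. If that is not wanted, lubridate's %m+% (for example d %m+% months(1)) rolls back to the last day of the target month instead.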