I have a data frame like this:
ID_CASE Month
CS00000026A 201301
CS00000026A 201302
CS00000026A 201303
CS00000026A 201304
CS00000026A 201305
CS00000026A 201306
CS00000026A 201307
CS00000026A 201308
CS00000026A 201309
CS00000026A 201310
CS00000191C 201302
CS00000191C 201303
CS00000191C 201304
CS00000191C 201305
CS00000191C 201306
CS00000191C 201307
CS00000191C 201308
CS00000191C 201309
CS00000191C 201310
I would like the final data frame to have three additional columns, like this:
ID_CASE Month Lag_1 Lag_2 Lag_3
CS00000026A 201301 NA NA NA
CS00000026A 201302 201301 NA NA
CS00000026A 201303 201302 201301 NA
CS00000026A 201304 201303 201302 201301
CS00000026A 201305 201304 201303 201302
CS00000026A 201306 201305 201304 201303
CS00000026A 201307 201306 201305 201304
CS00000026A 201308 201307 201306 201305
CS00000026A 201309 201308 201307 201306
CS00000026A 201310 201309 201308 201307
CS00000191C 201302 NA NA NA
CS00000191C 201303 201302 NA NA
CS00000191C 201304 201303 201302 NA
CS00000191C 201305 201304 201303 201302
CS00000191C 201306 201305 201304 201303
CS00000191C 201307 201306 201305 201304
CS00000191C 201308 201307 201306 201305
CS00000191C 201309 201308 201307 201306
CS00000191C 201310 201309 201308 201307
I have used the following code to get at least Lag_1:
df <- ddply(df,.(ID_CASE),transform,
Lag_1 <- c(NA,Month[-nrow(df)]))
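But this does not give me the required output for Lag_1.
I also tried looking at the solution in "Lag in R dataframe".
What if I had a date object instead of the integer column 'Month' used in the current example?
Any help with this would be greatly appreciated.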
Answer 0 (score: 2)
Try data.table:
library(data.table)
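# Prepend k NAs, then truncate to the group size (.N) so every column has exactly .N values
setDT(df)[, `:=` (Lag_1 = c(NA, Month)[seq_len(.N)],
Lag_2 = c(rep(NA, 2), Month)[seq_len(.N)],
Lag_3 = c(rep(NA, 3), Month)[seq_len(.N)]), by = ID_CASE]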
df
# ID_CASE Month Lag_1 Lag_2 Lag_3
# 1: CS00000026A 201301 NA NA NA
# 2: CS00000026A 201302 201301 NA NA
# 3: CS00000026A 201303 201302 201301 NA
# 4: CS00000026A 201304 201303 201302 201301
# 5: CS00000026A 201305 201304 201303 201302
# 6: CS00000026A 201306 201305 201304 201303
# 7: CS00000026A 201307 201306 201305 201304
# 8: CS00000026A 201308 201307 201306 201305
# 9: CS00000026A 201309 201308 201307 201306
# 10: CS00000026A 201310 201309 201308 201307
# 11: CS00000191C 201302 NA NA NA
# 12: CS00000191C 201303 201302 NA NA
# 13: CS00000191C 201304 201303 201302 NA
# 14: CS00000191C 201305 201304 201303 201302
# 15: CS00000191C 201306 201305 201304 201303
# 16: CS00000191C 201307 201306 201305 201304
# 17: CS00000191C 201308 201307 201306 201305
# 18: CS00000191C 201309 201308 201307 201306
# 19: CS00000191C 201310 201309 201308 201307
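The padding idea in this answer (put k NAs in front, keep the original length) can also be sketched in base R without any packages. The helper name lag_k and the small data frame below are hypothetical, made up for illustration only; ave() is used here as a stand-in for the per-group step.

```r
# Hypothetical helper (not from any package): lag a vector by k positions,
# padding with NAs of the same type and keeping the original length.
lag_k <- function(x, k) {
  padded <- c(x[rep(NA_integer_, k)], x)  # k leading NAs, same class as x
  padded[seq_along(x)]                    # truncate back to length(x)
}

# Small illustrative data set, not the data from the question
df <- data.frame(
  ID_CASE = c("A", "A", "A", "A", "B"),
  Month   = c(201301L, 201302L, 201303L, 201304L, 201302L)
)

# ave() applies the function within each ID_CASE group and keeps row order
for (k in 1:3) {
  df[[paste0("Lag_", k)]] <- ave(df$Month, df$ID_CASE,
                                 FUN = function(m) lag_k(m, k))
}
df
```

Because the result is truncated to the length of each group, a group with a single row simply gets NA in every lag column.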
Answer 1 (score: 1)
You can use lag.zoo, where k can be a vector of lags.
library(plyr)
library(zoo)
ddply(df, .(ID_CASE), function(x){
z <- zoo(x$Month)
lag(z, k = 0:-3)
})
# ID_CASE lag0 lag-1 lag-2 lag-3
# 1 CS00000026A 201301 NA NA NA
# 2 CS00000026A 201302 201301 NA NA
# 3 CS00000026A 201303 201302 201301 NA
# 4 CS00000026A 201304 201303 201302 201301
# 5 CS00000026A 201305 201304 201303 201302
# 6 CS00000026A 201306 201305 201304 201303
# 7 CS00000026A 201307 201306 201305 201304
# 8 CS00000026A 201308 201307 201306 201305
# 9 CS00000026A 201309 201308 201307 201306
# 10 CS00000026A 201310 201309 201308 201307
# 11 CS00000191C 201302 NA NA NA
# 12 CS00000191C 201303 201302 NA NA
# 13 CS00000191C 201304 201303 201302 NA
# 14 CS00000191C 201305 201304 201303 201302
# 15 CS00000191C 201306 201305 201304 201303
# 16 CS00000191C 201307 201306 201305 201304
# 17 CS00000191C 201308 201307 201306 201305
# 18 CS00000191C 201309 201308 201307 201306
# 19 CS00000191C 201310 201309 201308 201307
Edit following a comment.
If there is a group with only one date, the code above will generate an error. A small example:
df <- data.frame(ID_CASE = c(1, 1, 1, 2), Month = 1:4)
df
# ID_CASE Month
# 1 1 1
# 2 1 2
# 3 1 3
# 4 2 4
ddply(df, .(ID_CASE), function(x){
z <- zoo(x$Month)
lag(z, k = 0:-3)
})
# Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
# Results do not have equal lengths
This happens because a group with only one record is coerced to a univariate time series. To avoid this coercion, subset with [ and drop = FALSE:
ddply(df, .(ID_CASE), function(x){
z <- zoo(x[ , "Month", drop = FALSE])
lag(z, k = 0:-3)
})
# ID_CASE Month.lag0 Month.lag-1 Month.lag-2 Month.lag-3
# 1 1 1 NA NA NA
# 2 1 2 1 NA NA
# 3 1 3 2 1 NA
# 4 2 4 NA NA NA
Answer 2 (score: 1)
From data.table v1.9.6 onward, you can use shift():
require(data.table)
setDT(df)[, paste("lag", 1:3, sep="_") := shift(Month, 1:3), by=ID_CASE]
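On the question's follow-up about a date column: a minimal sketch, assuming the yyyymm integers are first converted to Date by treating each month as its first day. The data and the names dt and lag_cols are illustrative assumptions, and it is worth confirming with str() that the lagged columns keep the Date class in your data.table version.

```r
library(data.table)

# Illustrative data, not the data from the question
dt <- data.table(
  ID_CASE = c("A", "A", "A", "B", "B"),
  Month   = c(201301L, 201302L, 201303L, 201302L, 201303L)
)

# Assumption: represent each yyyymm value as the first day of that month
dt[, Month := as.Date(paste0(Month, "01"), format = "%Y%m%d")]

# shift() with n = 1:3 returns a list of three lagged vectors per group
lag_cols <- paste("lag", 1:3, sep = "_")
dt[, (lag_cols) := shift(Month, 1:3), by = ID_CASE]
dt
```

Note that shift() lags by row position, not by calendar distance, so a missing month in a group is not detected.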
Answer 3 (score: 0)
Using dplyr:
library(dplyr)
# %.% was removed from dplyr; use the %>% pipe instead
df %>%
group_by(ID_CASE) %>%
mutate(lag_1 = lag(Month, 1),
lag_2 = lag(Month, 2),
lag_3 = lag(Month, 3))
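All of the approaches in this thread lag by row position, so the rows need to be in month order within each ID_CASE. A minimal sketch, assuming the data might arrive unsorted; the data frame and the name res are illustrative. Writing dplyr::lag explicitly avoids picking up stats::lag or another package's lag method by accident.

```r
library(dplyr)

# Illustrative, deliberately unsorted data (not the data from the question)
df <- data.frame(
  ID_CASE = c("A", "A", "A", "B", "B"),
  Month   = c(201303L, 201301L, 201302L, 201303L, 201302L)
)

res <- df %>%
  arrange(ID_CASE, Month) %>%   # lags follow row order, so sort first
  group_by(ID_CASE) %>%
  mutate(lag_1 = dplyr::lag(Month, 1),
         lag_2 = dplyr::lag(Month, 2)) %>%
  ungroup()
res
```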