I have snapshot data in a data frame, like this:
zz <- "id created snap stage
ALPHA 2012-09-07 2014-01-02 A
ALPHA 2012-09-07 2014-10-01 End
BETA 2012-08-26 2014-01-04 B
BETA 2012-08-26 2014-06-19 C
BETA 2012-08-26 2014-11-21 End
GAMMA 2014-01-04 2014-01-04 A
GAMMA 2014-01-04 2014-03-07 B
GAMMA 2014-01-04 2014-03-28 C
GAMMA 2014-01-04 2014-03-29 End
DELTA 2014-07-14 2014-07-15 A
DELTA 2014-07-14 2014-09-26 C
DELTA 2014-07-14 2015-02-06 End"
df <- read.table(text = zz, header = TRUE)
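Whenever the created date is before 2014-01-01, I need to replace the snap date with the created date. But I only want to replace the snap date for the first observed instance of each id. The ids progress in order through A-B-C-End, but an id doesn't have to start at A.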
For example, this is the output I'm looking for:
id created snap stage
ALPHA 2012-09-07 2012-09-07 A
ALPHA 2012-09-07 2014-10-01 End
BETA 2012-08-26 2012-08-26 B
BETA 2012-08-26 2014-06-19 C
BETA 2012-08-26 2014-11-21 End
GAMMA 2014-01-04 2014-01-04 A
GAMMA 2014-01-04 2014-03-07 B
GAMMA 2014-01-04 2014-03-28 C
GAMMA 2014-01-04 2014-03-29 End
DELTA 2014-07-14 2014-07-15 A
DELTA 2014-07-14 2014-09-26 C
DELTA 2014-07-14 2015-02-06 End
Note that GAMMA and DELTA are unchanged, but ALPHA had the snap date of stage A replaced, and BETA had the snap date of stage B replaced.
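For comparison, the same rule can be written in base R without any packages. This is a minimal sketch, assuming the rows are already ordered by snap within each id (as in the sample), so that the first row of an id is its first snapshot. The names first and fix are introduced here only for illustration.

```r
zz <- "id created snap stage
ALPHA 2012-09-07 2014-01-02 A
ALPHA 2012-09-07 2014-10-01 End
BETA 2012-08-26 2014-01-04 B
BETA 2012-08-26 2014-06-19 C
BETA 2012-08-26 2014-11-21 End
GAMMA 2014-01-04 2014-01-04 A
GAMMA 2014-01-04 2014-03-07 B
GAMMA 2014-01-04 2014-03-28 C
GAMMA 2014-01-04 2014-03-29 End
DELTA 2014-07-14 2014-07-15 A
DELTA 2014-07-14 2014-09-26 C
DELTA 2014-07-14 2015-02-06 End"
df <- read.table(text = zz, header = TRUE)

# parse both columns as Date so they can be compared with the cutoff
df$created <- as.Date(df$created)
df$snap <- as.Date(df$snap)

# TRUE on the first row of each id, in the current row order (assumed sorted by snap)
first <- !duplicated(df$id)

# only first rows whose created date is before the cutoff get replaced
fix <- first & df$created < as.Date("2014-01-01")
df$snap[fix] <- df$created[fix]
```

duplicated() marks every repeat of an id after its first appearance, so negating it flags exactly one row per id regardless of which stage that id starts at.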
Answer 0 (score: 1)
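Here's a dplyr approach. I start by converting both "created" and "snap" to proper Date columns with mutate(across(...)). Then we group the data by "id", and finally use mutate with replace to make the necessary change to the "snap" column: we look for the row where created is before the cutoff date and row_number() is 1, i.e. the first row in that id group, and put that group's created date into snap there.
library(dplyr)
df %>%
  # convert both date columns from text to Date
  mutate(across(c(created, snap), as.Date)) %>%
  group_by(id) %>%
  # on each id's first row, replace snap with created if created is before the cutoff
  mutate(snap = replace(snap, which(created < as.Date("2014-01-01") & row_number() == 1), created[1]))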
#Source: local data frame [12 x 4]
#Groups: id
#
# id created snap stage
#1 ALPHA 2012-09-07 2012-09-07 A
#2 ALPHA 2012-09-07 2014-10-01 End
#3 BETA 2012-08-26 2012-08-26 B
#4 BETA 2012-08-26 2014-06-19 C
#5 BETA 2012-08-26 2014-11-21 End
#6 GAMMA 2014-01-04 2014-01-04 A
#7 GAMMA 2014-01-04 2014-03-07 B
#8 GAMMA 2014-01-04 2014-03-28 C
#9 GAMMA 2014-01-04 2014-03-29 End
#10 DELTA 2014-07-14 2014-07-15 A
#11 DELTA 2014-07-14 2014-09-26 C
#12 DELTA 2014-07-14 2015-02-06 End
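A variant of the same pipeline that drops replace()/which() and tests every row directly with if_else(). This is a sketch assuming dplyr 1.0 or later (for across()); cutoff and out are names introduced here for illustration.

```r
library(dplyr)

zz <- "id created snap stage
ALPHA 2012-09-07 2014-01-02 A
ALPHA 2012-09-07 2014-10-01 End
BETA 2012-08-26 2014-01-04 B
BETA 2012-08-26 2014-06-19 C
BETA 2012-08-26 2014-11-21 End
GAMMA 2014-01-04 2014-01-04 A
GAMMA 2014-01-04 2014-03-07 B
GAMMA 2014-01-04 2014-03-28 C
GAMMA 2014-01-04 2014-03-29 End
DELTA 2014-07-14 2014-07-15 A
DELTA 2014-07-14 2014-09-26 C
DELTA 2014-07-14 2015-02-06 End"
df <- read.table(text = zz, header = TRUE)

cutoff <- as.Date("2014-01-01")

out <- df %>%
  # parse both date columns
  mutate(across(c(created, snap), as.Date)) %>%
  group_by(id) %>%
  # row_number() restarts at 1 in every id group, so only first rows can match
  mutate(snap = if_else(row_number() == 1 & created < cutoff, created, snap)) %>%
  ungroup()
```

if_else() is used instead of base ifelse() because ifelse() drops the Date class and returns plain numbers, while if_else() requires both branches to have the same type and keeps the result as Date.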
Answer 1 (score: 0)
Try this:
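library(data.table)
# within each id: if the first created date is before the cutoff, swap it into
# the first snap; otherwise return snap unchanged
setDT(df)[, snap := if (created[1L] < as.Date("2014-01-01"))
  c(created[1L], snap[-1L]) else snap, by = id]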
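This assumes snap and created are already Date columns. If they aren't (read.table reads them in as text), convert them first, before running the code above:
cols = c("snap", "created")
setDT(df)[, (cols) := lapply(.SD, as.Date), .SDcols = cols]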
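Another data.table option avoids the grouped if entirely: rowid(id) numbers the rows within each id in their current order, so rowid(id) == 1L marks each id's first snapshot, and the fix becomes a single subset-and-assign by reference. This is a sketch assuming a data.table version recent enough to provide rowid(), and assuming rows are ordered by snap within each id; dt is an illustrative name.

```r
library(data.table)

zz <- "id created snap stage
ALPHA 2012-09-07 2014-01-02 A
ALPHA 2012-09-07 2014-10-01 End
BETA 2012-08-26 2014-01-04 B
BETA 2012-08-26 2014-06-19 C
BETA 2012-08-26 2014-11-21 End
GAMMA 2014-01-04 2014-01-04 A
GAMMA 2014-01-04 2014-03-07 B
GAMMA 2014-01-04 2014-03-28 C
GAMMA 2014-01-04 2014-03-29 End
DELTA 2014-07-14 2014-07-15 A
DELTA 2014-07-14 2014-09-26 C
DELTA 2014-07-14 2015-02-06 End"
dt <- as.data.table(read.table(text = zz, header = TRUE))

# convert both date columns to Date by reference
cols <- c("created", "snap")
dt[, (cols) := lapply(.SD, as.Date), .SDcols = cols]

# rowid(id) is 1 on each id's first row; update snap only on those rows
# whose created date is before the cutoff
dt[rowid(id) == 1L & created < as.Date("2014-01-01"), snap := created]
```

Because the update is restricted by i, rows that don't match are never touched, so there is no need to return the unchanged snap vector for GAMMA and DELTA.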