Replace the first date instance if a condition is met

Time: 2015-03-20 20:21:52

Tags: r dplyr

I have snapshot data in a data frame, like so:

zz <- "id  created  snap    stage
ALPHA   2012-09-07  2014-01-02  A
ALPHA   2012-09-07  2014-10-01  End
BETA    2012-08-26  2014-01-04  B
BETA    2012-08-26  2014-06-19  C
BETA    2012-08-26  2014-11-21  End
GAMMA   2014-01-04  2014-01-04  A
GAMMA   2014-01-04  2014-03-07  B
GAMMA   2014-01-04  2014-03-28  C
GAMMA   2014-01-04  2014-03-29  End
DELTA   2014-07-14  2014-07-15  A
DELTA   2014-07-14  2014-09-26  C
DELTA   2014-07-14  2015-02-06  End"
df <- read.table(text = zz, header = TRUE)

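Whenever the created date is before 2014-01-01, I need to replace the snap date with the created date. But I only want to replace the snap date for the first observed instance of each id. While ids go sequentially through A-B-C-End, an id does not have to start at A.

For example, this is the output I am looking for: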

id  created snap    stage
ALPHA   2012-09-07  2012-09-07  A
ALPHA   2012-09-07  2014-10-01  End
BETA    2012-08-26  2012-08-26  B
BETA    2012-08-26  2014-06-19  C
BETA    2012-08-26  2014-11-21  End
GAMMA   2014-01-04  2014-01-04  A
GAMMA   2014-01-04  2014-03-07  B
GAMMA   2014-01-04  2014-03-28  C
GAMMA   2014-01-04  2014-03-29  End
DELTA   2014-07-14  2014-07-15  A
DELTA   2014-07-14  2014-09-26  C
DELTA   2014-07-14  2015-02-06  End

Note that GAMMA and DELTA remain unchanged, but ALPHA had its snap date replaced at stage A, and BETA had its snap date replaced at stage B.

2 Answers:

Answer 0: (score: 1)

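Here is a dplyr approach. I start with "mutate_each" to make sure both "created" and "snap" are converted to proper dates. We then group the data by "id", and finally use "mutate" and "replace" to make the necessary changes to the "snap" column (we check where created is before the cutoff date and where row_number() is 1, i.e. the first row in that id group):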

library(dplyr)
df %>% 
  mutate_each(funs(as.Date(.)), created, snap) %>%
  group_by(id) %>%
  # created[1] keeps the replacement value at length 1, matching the single replaced row
  mutate(snap = replace(snap, which(created < as.Date("2014-01-01") & row_number() == 1), created[1]))

#Source: local data frame [12 x 4]
#Groups: id
#
#      id    created       snap stage
#1  ALPHA 2012-09-07 2012-09-07     A
#2  ALPHA 2012-09-07 2014-10-01   End
#3   BETA 2012-08-26 2012-08-26     B
#4   BETA 2012-08-26 2014-06-19     C
#5   BETA 2012-08-26 2014-11-21   End
#6  GAMMA 2014-01-04 2014-01-04     A
#7  GAMMA 2014-01-04 2014-03-07     B
#8  GAMMA 2014-01-04 2014-03-28     C
#9  GAMMA 2014-01-04 2014-03-29   End
#10 DELTA 2014-07-14 2014-07-15     A
#11 DELTA 2014-07-14 2014-09-26     C
#12 DELTA 2014-07-14 2015-02-06   End
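
Since mutate_each() and funs() have been deprecated in later dplyr releases, here is a rough sketch of how the same steps might look with across() and if_else(). It assumes dplyr 1.0.0 or newer; the cutoff and res names and the shortened two-id sample are only illustrative, not part of the original answer.

```r
library(dplyr)

# Shortened copy of the question's data (two ids only, for illustration).
df <- read.table(text = "id created snap stage
ALPHA 2012-09-07 2014-01-02 A
ALPHA 2012-09-07 2014-10-01 End
GAMMA 2014-01-04 2014-01-04 A
GAMMA 2014-01-04 2014-03-07 B", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper name for the cutoff date from the question.
cutoff <- as.Date("2014-01-01")

res <- df %>%
  # Convert both date columns in one step.
  mutate(across(c(created, snap), as.Date)) %>%
  group_by(id) %>%
  # Only the first row of each id is eligible, and only if created < cutoff.
  mutate(snap = if_else(row_number() == 1 & created < cutoff, created, snap)) %>%
  ungroup()

print(res)
```

if_else() is used here instead of replace() because it checks that both branches have the same type, so snap stays a Date column.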

Answer 1: (score: 0)

Try this:

library(data.table)
setDT(df)[, snap := if (created[1L] < as.Date('2014-01-01')) 
                    c(created[1L], snap[-1L]) else snap, by = id]

I am assuming snap and created are date columns. If they are not, you can convert them first like this:

cols = c("snap", "created")
setDT(df)[, (cols) := lapply(.SD, as.Date), .SDcols=cols]
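
As a cross-check that needs neither package, the rule from the question can also be sketched in base R. This is only a minimal sketch: it assumes the rows are already in snapshot order within each id (as in the sample data), and the snapshots, cutoff, and hit names are made up for illustration.

```r
# Shortened copy of the question's data (two ids only, for illustration).
snapshots <- read.table(text = "id created snap stage
BETA 2012-08-26 2014-01-04 B
BETA 2012-08-26 2014-06-19 C
DELTA 2014-07-14 2014-07-15 A
DELTA 2014-07-14 2014-09-26 C", header = TRUE, stringsAsFactors = FALSE)

# Make sure both columns are real Date objects before comparing them.
snapshots$created <- as.Date(snapshots$created)
snapshots$snap    <- as.Date(snapshots$snap)

cutoff <- as.Date("2014-01-01")

# !duplicated() is TRUE only for the first row of each id
# (assumes rows are sorted by snap within each id).
hit <- !duplicated(snapshots$id) & snapshots$created < cutoff

# Overwrite snap with created on the selected rows only.
snapshots$snap[hit] <- snapshots$created[hit]

print(snapshots)
```

If the rows are not guaranteed to be ordered, sorting by id and snap first would be needed for the "first instance" to mean the earliest snapshot.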