Concatenating date variables

Time: 2015-07-15 20:01:49

Tags: r if-statement dplyr

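I'm trying to use two ifelse statements to create a new date variable that fills the gaps in an existing date variable according to a series of assumptions. Here is an example of what I mean: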

  id EffectiveDate EffectiveYear ED_NA EY_NA NewEffectiveDate
1  a    1972-10-05          1972 FALSE FALSE       1972-10-05
2  a          <NA>          1985  TRUE FALSE       1985-01-01
3  a    1988-11-12          1988 FALSE FALSE       1988-11-12
4  b    2011-09-05          2011 FALSE FALSE       2011-09-05
5  b          <NA>            NA  TRUE  TRUE       2011-09-05
6  b          <NA>          2012  TRUE FALSE       2012-01-01
7  c    2012-11-11          2012 FALSE FALSE       2012-11-11
8  c    2013-05-15          2013 FALSE FALSE       2013-05-15

Quick code for the columns id through EY_NA:

id <- c("a","a","a","b","b","b","c","c")
EffectiveDate <- c("1972-10-05",NA,"1988-11-12","2011-09-05",NA,NA,"2012-11-11","2013-05-15")
EffectiveYear <- c(1972,1985,1988,2011,NA,2012,2012,2013)
tdat <- data.frame(id, EffectiveDate, EffectiveYear)
tdat$ED_NA <- is.na(tdat$EffectiveDate)
tdat$EY_NA <- is.na(tdat$EffectiveYear)

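What I'm trying to create in this example is the "NewEffectiveDate" variable. In plain English: where EffectiveDate is missing but EffectiveYear is not, assume NewEffectiveDate is January 1 of EffectiveYear. Where both EffectiveDate and EffectiveYear are missing, assume the EffectiveDate of the previous observation. And finally, of course, where EffectiveDate is not missing, use EffectiveDate.

Here is the latest code I've used to try to solve the problem: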

tdat %>% mutate(NewEffectiveDate = ifelse(ED_NA == 1 & EY_NA == 0,
  as.Date(paste(EffectiveYear, 1, 1, sep="-")),
  ifelse(ED_NA == 1 & EY_NA == 1), 
  as.Date(lag(EffectiveDate)),
  EffectiveDate
))

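When I run this particular code, I get an error message: Error: unused arguments (as.Date(c(NA, 1, NA, 2, 3, NA, NA, 4)), c(1, NA, 2, 3, NA, NA, 4, 5))

I've searched for similar questions, such as "ifelse concatenate date" and some variations of it, but haven't found anything that seems to apply to this particular problem.

I'm new to R (and to the CLI), so I apologize in advance if I've overlooked a very obvious solution. The transition from Excel to R has been interesting, but it is often painful when doing tasks that seem relatively simple (although the dplyr package has been very helpful).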

3 answers:

Answer 0: (score: 1)

id <- c("a","a","a","b","b","b","c","c")
EffectiveDate <- c("1972-10-05",NA,"1988-11-12","2011-09-05",NA,NA,"2012-11-11","2013-05-15")
EffectiveYear <- c(1972,1985,1988,2011,NA,2012,2012,2013)
tdat <- data.frame(id, EffectiveDate, EffectiveYear,
                   stringsAsFactors=FALSE)

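library(dplyr)  # provides %>% and mutate()
library(zoo)    # provides na.locf()
tdat %>% 
  mutate(NewEffectiveDate = ifelse(!is.na(EffectiveDate),
                                   EffectiveDate,
                                   ifelse(is.na(EffectiveDate) & !is.na(EffectiveYear),
                                          paste0(EffectiveYear, "-01-01"),
                                          NA)),
         # carry the last non-missing value forward; na.rm = FALSE keeps any leading NA
         NewEffectiveDate = na.locf(NewEffectiveDate, na.rm = FALSE))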

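This should give you what you need. I'd suggest using na.locf from the zoo package (last observation carried forward) rather than trying to handle the previous date yourself.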
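To make the na.locf step concrete, here is a minimal sketch on a hypothetical toy vector (the values are made up for illustration and are not the question's data). It assumes the zoo package is installed.

```r
library(zoo)

# Hypothetical toy vector with gaps, for illustration only
x <- c("2011-09-05", NA, NA, "2012-01-01", NA)

# Each NA is replaced by the most recent non-NA value before it.
# na.rm = FALSE keeps the result the same length as the input
# even if the vector starts with NA.
filled <- na.locf(x, na.rm = FALSE)
filled
```

Note that by default na.locf() drops leading NAs, which shortens the vector and would not fit back into a data frame column, so na.rm = FALSE is usually the safer choice inside mutate().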

Answer 1: (score: 1)

You can do

tdat$EffectiveDate <- as.Date(tdat$EffectiveDate)

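tdat %>% mutate(NewEffectiveDate = as.Date(
    ifelse(!is.na(EffectiveDate), EffectiveDate,
           ifelse(!is.na(EffectiveYear), as.Date(paste(EffectiveYear, 1, 1, sep="-")),
                  lag(EffectiveDate))),
    origin = "1970-01-01"  # ifelse() returns plain day counts; older R versions need an explicit origin
)) -> res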

res
#   id EffectiveDate EffectiveYear NewEffectiveDate
# 1  a    1972-10-05          1972       1972-10-05
# 2  a          <NA>          1985       1985-01-01
# 3  a    1988-11-12          1988       1988-11-12
# 4  b    2011-09-05          2011       2011-09-05
# 5  b          <NA>            NA       2011-09-05
# 6  b          <NA>          2012       2012-01-01
# 7  c    2012-11-11          2012       2012-11-11
# 8  c    2013-05-15          2013       2013-05-15
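The outer as.Date() call is needed because ifelse() drops the Date class and returns plain day counts. A minimal sketch of that behaviour, using hypothetical values d and fallback chosen only for illustration:

```r
# Hypothetical inputs, for illustration only
d <- as.Date(c("2012-11-11", NA))
fallback <- as.Date("2012-01-01")

# ifelse() builds its result from the logical test vector,
# so the Date class of the yes/no arguments is lost
raw <- ifelse(is.na(d), fallback, d)
class(raw)  # "numeric": days since 1970-01-01

# Convert the day counts back to Date
fixed <- as.Date(raw, origin = "1970-01-01")
fixed
```

Supplying origin explicitly keeps the conversion working on R versions where as.Date() on a number requires it.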

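Answer 2: (score: 0)

The problem with your ifelse block seems to be that you closed the parenthesis of the second ifelse too early, before giving it its yes and no arguments, so those ended up as extra arguments to the first ifelse.

This should work: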

tdat %>% mutate(NewEffectiveDate = ifelse(ED_NA == 1 & EY_NA == 0,
  as.Date(paste(EffectiveYear, 1, 1, sep="-")),
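  ifelse(ED_NA == 1 & EY_NA == 1,
         as.Date(lag(EffectiveDate)),  # yes: the previous row's date
         EffectiveDate)))              # no: keep the existing date; closes both ifelse() calls and mutate()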
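As a cross-check on the nested ifelse logic, the same three rules can also be written as plain indexed assignment in base R. This is only a sketch under two assumptions that are mine rather than the asker's: the rows are already in the intended order, and "previous observation" means the previous row's filled-in value regardless of id. The name new_date is a hypothetical helper.

```r
id <- c("a","a","a","b","b","b","c","c")
EffectiveDate <- c("1972-10-05",NA,"1988-11-12","2011-09-05",NA,NA,"2012-11-11","2013-05-15")
EffectiveYear <- c(1972,1985,1988,2011,NA,2012,2012,2013)
tdat <- data.frame(id, EffectiveDate, EffectiveYear, stringsAsFactors = FALSE)

# Rule 3: start from the known dates (missing ones stay NA)
new_date <- as.Date(tdat$EffectiveDate)

# Rule 1: date missing but year known -> 1 January of that year
use_year <- is.na(new_date) & !is.na(tdat$EffectiveYear)
new_date[use_year] <- as.Date(paste0(tdat$EffectiveYear[use_year], "-01-01"))

# Rule 2: both missing -> copy the previous row's (already filled) value
for (i in seq_along(new_date)[-1]) {
  if (is.na(new_date[i])) new_date[i] <- new_date[i - 1]
}

tdat$NewEffectiveDate <- new_date
tdat
```

Because the values never pass through ifelse(), the column stays a Date throughout and no conversion back from numbers is needed.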