Consider this data set:
mydf <- data.frame(churn_indicator = c(0, 0, 1, 0, 1),
                   resign_date = c(NA, NA, "2011-01-01", NA, "2012-02-01"),
                   join_date = c("2001-01-01", "2001-03-01", "2002-04-02",
                                 "2003-09-01", "2005-05-10"))
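The task is to compute a vector 'length' that equals resign_date - join_date for rows where churn_indicator = 1, and Sys.Date() - join_date for rows where churn_indicator = 0.
I have figured out how to do this with a for loop, but I would like to use something more efficient (perhaps the apply family). Also, is it possible to do this with dplyr's mutate function?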
Answer 0: (score: 0)
A possible solution:
# convert column from factor/characters to Date (if not already done)
mydf$resign_date <- as.Date(mydf$resign_date)
mydf$join_date <- as.Date(mydf$join_date)
# compute the date differences
days_churn1 <- as.numeric(difftime(mydf$resign_date,mydf$join_date,units='days'))
days_churn0 <- as.numeric(difftime(Sys.Date(),mydf$join_date,units='days'))
# set to zero the values where churn indicator is not what we want
days_churn1[mydf$churn_indicator==0]<-0
days_churn0[mydf$churn_indicator==1]<-0
# sum the two vectors
mydf$length <- days_churn1+days_churn0
> mydf
  churn_indicator resign_date  join_date length
1               0        <NA> 2001-01-01   5997
2               0        <NA> 2001-03-01   5938
3               1  2011-01-01 2002-04-02   3196
4               0        <NA> 2003-09-01   5024
5               1  2012-02-01 2005-05-10   2458
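The zeroing step above is also what removes the NA values: rows with no resign_date produce NA in days_churn1, and NA plus anything is still NA. A minimal sketch on a two-row toy example (the names resign, join, churn and d1 are made up for illustration and are not part of the answer):

```r
# Toy vectors, assumed only for this illustration
resign <- as.Date(c(NA, "2011-01-01"))
join   <- as.Date(c("2001-01-01", "2002-04-02"))
churn  <- c(0, 1)

# The difference is NA wherever resign is NA
d1 <- as.numeric(difftime(resign, join, units = "days"))
d1                    # NA 3196

# Overwrite the rows that belong to the other case, which clears the NA
d1[churn == 0] <- 0
d1                    # 0 3196
```

So the order matters: the vectors have to be masked before they are summed, otherwise the NA would propagate into length.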
Alternatively, you can combine the steps with ifelse:
# convert column from factor/characters to Date (if not already done)
mydf$resign_date <- as.Date(mydf$resign_date)
mydf$join_date <- as.Date(mydf$join_date)
mydf$length <-
  as.numeric(
    ifelse(mydf$churn_indicator == 1,
           difftime(mydf$resign_date, mydf$join_date, units = 'days'),
           difftime(Sys.Date(), mydf$join_date, units = 'days')
    ))
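Regarding the dplyr part of the question: the same logic should fit in a single mutate call. The following is only a sketch, assuming dplyr is installed; the helper column end_date is a name introduced here for illustration and is not part of the original data.

```r
library(dplyr)

mydf <- data.frame(
  churn_indicator = c(0, 0, 1, 0, 1),
  resign_date = c(NA, NA, "2011-01-01", NA, "2012-02-01"),
  join_date = c("2001-01-01", "2001-03-01", "2002-04-02",
                "2003-09-01", "2005-05-10")
)

mydf <- mydf %>%
  mutate(
    # convert both columns to Date first
    resign_date = as.Date(resign_date),
    join_date   = as.Date(join_date),
    # end_date (hypothetical helper): resign date for churned rows, today otherwise
    end_date    = if_else(churn_indicator == 1, resign_date, Sys.Date()),
    # number of days between join_date and end_date
    length      = as.numeric(difftime(end_date, join_date, units = "days"))
  )
```

Unlike base ifelse, if_else keeps the Date class, so the end date can be picked first and the subtraction done once. The values for rows with churn_indicator = 0 depend on the day the code is run.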