Counting accounts in each state over a date range in R

Posted: 2016-08-24 14:02:39

Tags: r

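I have a data frame with several accounts, each with a state and the start and end dates of that state. For each day in a date range, I want to report the number of accounts in each state. The data look like df below, and the desired result looks like report. (The real data contain more state values. Missing end dates are represented by a dummy date in the future, such as 2999-01-01.)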

df <- data.frame(account = c(1,1,2,3),
             state = c("Open","Closed","Open","Open"),
             startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"), 
             enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
             )

report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6), 
                 number.open = c(2,2,2,1,1,1)
                 )

I have looked at options using rowwise() and mutate from dplyr, but I couldn't get the code to work. (See "Checking if Date is Between two Dates in R".)
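The row-by-row idea mentioned above can also be expressed in base R without dplyr. The sketch below expands each state interval into one row per day and then cross-tabulates day against state. It assumes that end dates are exclusive, which is what the expected report implies: account 1 is no longer counted as Open on 2016-04-04. It also clips every interval to the reporting window, so a dummy end date such as 2999-01-01 does not generate centuries of rows. To illustrate the question's note, the Closed row's end date is given as NA and replaced with the dummy date. The names window_start, window_end, days and daily are hypothetical.

```r
df <- data.frame(account = c(1, 1, 2, 3),
                 state = c("Open", "Closed", "Open", "Open"),
                 startdate = as.Date(c("2016-01-01", "2016-04-04", "2016-03-02", "2016-08-01")),
                 enddate = as.Date(c("2016-04-04", NA, "2016-05-02", "2016-08-05")),
                 stringsAsFactors = FALSE)

# Open-ended states: replace a missing end date with a far-future dummy date
df$enddate[is.na(df$enddate)] <- as.Date("2999-01-01")

# Reporting window (hypothetical names)
window_start <- as.Date("2016-04-01")
window_end   <- as.Date("2016-04-06")
days <- seq(window_start, window_end, by = "1 day")

# Expand each row into the days it covers inside the window
# (start date inclusive, end date exclusive); rows outside the window give NULL
daily <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  from <- max(df$startdate[i], window_start)
  to   <- min(df$enddate[i] - 1, window_end)
  if (from > to) return(NULL)
  data.frame(date = seq(from, to, by = "1 day"), state = df$state[i],
             stringsAsFactors = FALSE)
}))

# Count rows per day and state; using all window days as levels keeps
# days on which a state has no accounts as 0
counts <- table(factor(as.character(daily$date), levels = as.character(days)),
                daily$state)
counts
```

Counting rows equals counting accounts only if each account is in exactly one state on any given day. That holds for the sample data, but it is an assumption about the real data.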

1 Answer

Answer (score: 1)

We can use sapply to count, for each report date, the "Open" rows whose interval contains that date:

report$NumberOpen <- 
    sapply(report$date, function(x)
    sum(as.Date(df1$startdate) <= as.Date(x) &  # state began on or before this date
        as.Date(df1$enddate) > as.Date(x) &     # and ends after it (end date exclusive)
        df1$state == 'Open'))

#  report
#         date NumberOpen
# 1 2016-04-01          2
# 2 2016-04-02          2
# 3 2016-04-03          2
# 4 2016-04-04          1
# 5 2016-04-05          1
# 6 2016-04-06          1

Data

df1 <- data.frame(account = c(1,1,2,3),
                 state = c("Open","Closed","Open","Open"),
                 startdate = c("2016-01-01","2016-04-04","2016-03-02","2016-08-01"), 
                 enddate = c("2016-04-04","2999-01-01","2016-05-02","2016-08-05")
)

report <- data.frame(date = seq(from = as.Date("2016-04-01"),by="1 day", length.out = 6)
)
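The question notes that the real data contain more state values. The following vectorised sketch counts every state at once instead of only "Open". It builds a logical date-by-row matrix with outer() and then sums its columns separately for each state. It assumes the start date is inclusive and the end date exclusive. It also assumes each account is in exactly one state on any day, so counting rows equals counting accounts. The names dates, active and counts are illustrative.

```r
df1 <- data.frame(account = c(1, 1, 2, 3),
                  state = c("Open", "Closed", "Open", "Open"),
                  startdate = as.Date(c("2016-01-01", "2016-04-04", "2016-03-02", "2016-08-01")),
                  enddate = as.Date(c("2016-04-04", "2999-01-01", "2016-05-02", "2016-08-05")),
                  stringsAsFactors = FALSE)
dates <- seq(from = as.Date("2016-04-01"), by = "1 day", length.out = 6)

# active[i, j] is TRUE when row j of df1 is in effect on dates[i]
# (start date inclusive, end date exclusive)
active <- outer(dates, df1$startdate, ">=") & outer(dates, df1$enddate, "<")

# Group the row indices by state and sum the matching columns:
# the result has one column of daily counts per state
counts <- sapply(split(seq_len(nrow(df1)), df1$state),
                 function(rows) rowSums(active[, rows, drop = FALSE]))

report <- data.frame(date = dates, counts)
report
```

The matrix has one cell per (date, row) pair, so memory grows with the product of the two lengths. For a long date range over many accounts, an interval-join approach may scale better.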