Count the number of days in each month within a date range

Time: 2018-06-19 16:35:17

Tags: r date lubridate

I have a data frame containing start and end dates:

id <- c(1, 1, 2)
start <- c("2014-01-05", "2014-02-04", "2014-02-06")
end <- c("2014-02-03", "2014-04-29", "2014-03-07")
df <- data.frame(id, start, end)

 id        start          end
  1    2014-01-05   2014-02-03
  1    2014-02-04   2014-04-29
  2    2014-02-06   2014-03-07

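I am trying to work out how to count the number of days that fall in each month between the start date and the end date, with both endpoints included. Something like the following: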

id    month_yyyy_mm count
 1          2014-01    27
 1          2014-02     3
 1          2014-02    25
 1          2014-03    31
 1          2014-04    29
 2          2014-02    23
 2          2014-03     7

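I can convert the strings to dates and then use difftime to compute the total difference between start and end, but I don't know how to get the count per month. Is there something in the lubridate package that could help?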
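As a quick sanity check on the counting convention, here is a minimal sketch for the first row of the example data. The variable names are only illustrative. It shows that the desired per-month counts include both endpoints, so they add up to the difftime result plus one.

```r
# Dates from the first row of the example data
start <- as.Date("2014-01-05")
end   <- as.Date("2014-02-03")

# difftime() gives the gap between the two dates, not the number of days covered
gap_days <- as.integer(difftime(end, start, units = "days"))

# Counting both endpoints adds one day
total_days <- gap_days + 1L

# Per-month counts from the desired output for this row (2014-01 and 2014-02)
expected_counts <- c(27L, 3L)

# The per-month counts should sum to the inclusive total
stopifnot(sum(expected_counts) == total_days)
```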

2 Answers:

Answer 0: (score: 2)

Consider the functions f1, f2, and f3 below:

f1 <- function(d_first,d_last){
        d_first <- as.Date(d_first)
        d_last <- as.Date(d_last)

        D <- seq(d_first, d_last, 1) # generate all days in [d_first,d_last]
        M <- unique(format(D, "%m")) # all months in [d_first,d_last]
        # note: months are keyed by number only, so this assumes the range
        # does not contain the same calendar month from two different years

        f2 <- function(x) length(which(format(D, "%m") == x)) # returns number of days in month x
        res <- vapply(M,f2,numeric(1))
        return(cbind(unique(format(D, "%Y-%m")),res))
      }
f3 <- function(k) f1(df$start[k],df$end[k])

output <- lapply(1:nrow(df), f3) # lapply always returns a list, one matrix per row of df

which gives

> output 
[[1]]
             res 
01 "2014-01" "27"
02 "2014-02" "3" 

[[2]]
             res 
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"

[[3]]
             res 
02 "2014-02" "23"
03 "2014-03" "7" 

From here on, what remains is only a matter of formatting. Indeed, a simple do.call(rbind, output) does the job:

> do.call(rbind, output)
             res 
01 "2014-01" "27"
02 "2014-02" "3" 
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"
02 "2014-02" "23"
03 "2014-03" "7"

To include the ID, you can define f4 <- function(k) cbind(df$id[k], f3(k))

> do.call(rbind, lapply(1:nrow(df), f4))
                 res 
01 "1" "2014-01" "27"
02 "1" "2014-02" "3" 
02 "1" "2014-02" "25"
03 "1" "2014-03" "31"
04 "1" "2014-04" "29"
02 "2" "2014-02" "23"
03 "2" "2014-03" "7" 

But there may be smarter solutions.
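One possible variant, offered only as a sketch: the helper name count_days_by_month is hypothetical and not part of the answer above. It groups the days by year and month together, so ranges longer than a year are handled, and it returns a data frame with an integer count column instead of a character matrix.

```r
# Hypothetical helper: count the days per calendar month in one inclusive range
count_days_by_month <- function(d_first, d_last) {
  # every day in [d_first, d_last]
  days <- seq(as.Date(d_first), as.Date(d_last), by = "day")
  # label each day with its year and month, then tabulate the labels
  counts <- table(format(days, "%Y-%m"))
  data.frame(month_yyyy_mm = names(counts),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Example data from the question
df <- data.frame(
  id = c(1, 1, 2),
  start = c("2014-01-05", "2014-02-04", "2014-02-06"),
  end = c("2014-02-03", "2014-04-29", "2014-03-07"),
  stringsAsFactors = FALSE
)

# One small data frame per row of df, each tagged with its id, then stacked
pieces <- lapply(seq_len(nrow(df)), function(k) {
  cbind(id = df$id[k], count_days_by_month(df$start[k], df$end[k]))
})
result <- do.call(rbind, pieces)
print(result)
```

Because the labels are in "%Y-%m" form, the alphabetical order used by table() is also chronological order.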

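Answer 1: (score: 1)

Here is another approach, which uses the foverlaps() function from the data.table package.

foverlaps() finds the overlaps between the given periods and a helper table holding the first and last day of each month.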
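The day count for a single overlap can be sketched in base R, without data.table, to show the arithmetic that the pmax()/pmin() step in the code below performs. The variable names here are only illustrative; the period is the first row of the example data and the month is February 2014.

```r
# One period from the example data and one calendar month (February 2014)
period_start <- as.Date("2014-01-05")
period_end   <- as.Date("2014-02-03")
month_start  <- as.Date("2014-02-01")
month_end    <- as.Date("2014-02-28")

# The overlap runs from the later of the two starts to the earlier of the two ends
overlap_start <- max(period_start, month_start)
overlap_end   <- min(period_end, month_end)

# Inclusive day count; a zero or negative value would mean no overlap
overlap_days <- as.integer(overlap_end - overlap_start) + 1L
print(overlap_days)
```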

library(data.table)
library(lubridate)

# coerce dates from character to IDate
cols <- c("start", "end")
DT <- as.data.table(df)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]

# create sequence of months which cover all periods
mon_seq <- DT[, as.IDate(seq(floor_date(min(start), unit = "months"), 
                             ceiling_date(max(end), unit = "months"),
                             by = "month"))]
# create helper data.table with first and last day of months
mDT <- data.table(start = head(mon_seq, -1L), end = tail(mon_seq, -1L) - 1L)
setkeyv(DT, cols)
# find overlapping pieces for each month
foverlaps(mDT, DT, nomatch = 0L)[
  # compute count of days in each month
  , {tmp <- pmax(start, i.start)
  .(id = id, month = format(tmp, "%Y-%m"), 
    count = as.integer(difftime(pmin(end, i.end), tmp, units = "days")) + 1L)
  }][
    # reorder conveniently
    order(id, month)]
   id   month count
1:  1 2014-01    27
2:  1 2014-02     3
3:  1 2014-02    25
4:  1 2014-03    31
5:  1 2014-04    29
6:  2 2014-02    23
7:  2 2014-03     7