I have a data frame containing start and end dates:
id <- c(1, 1, 2)
start <- c("2014-01-05", "2014-02-04", "2014-02-06")
end <- c("2014-02-03", "2014-04-29", "2014-03-07")
df <- data.frame(id, start, end)
id start end
1 2014-01-05 2014-02-03
1 2014-02-04 2014-04-29
2 2014-02-06 2014-03-07
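I am trying to work out how to count the number of days that fall in each month between the start date and the end date, something like the following: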
id month_yyyy_mm count
1 2014-01 27
1 2014-02 3
1 2014-02 25
1 2014-03 31
1 2014-04 29
2 2014-02 23
2 2014-03 7
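I was able to convert the strings to dates and then use difftime to compute the total difference between start and end, but I don't know how to do this per month. Is there something in the lubridate package that could help?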
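For reference, the total-difference step described above might look like the following minimal sketch. The variable names start_date, end_date and total_days are assumptions made for illustration. difftime() returns the gap between two dates, so one day is added to count both endpoints, which is the convention the desired output uses (for example 27 + 3 = 30 for the first row).

```r
df <- data.frame(id = c(1, 1, 2),
                 start = c("2014-01-05", "2014-02-04", "2014-02-06"),
                 end = c("2014-02-03", "2014-04-29", "2014-03-07"))

# convert the character columns to Date
start_date <- as.Date(df$start)
end_date <- as.Date(df$end)

# difftime() gives the gap in days; add 1 to count both endpoints
total_days <- as.integer(difftime(end_date, start_date, units = "days")) + 1L
total_days
# 30 85 30
```

These totals say nothing about how the days split across months, which is the part the question asks about.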
Answer 0: (score: 2)
Consider the functions f1, f2 and f3 below:
f1 <- function(d_first, d_last) {
  d_first <- as.Date(d_first)
  d_last <- as.Date(d_last)
  D <- seq(d_first, d_last, 1)   # all days in [d_first, d_last]
  # Months are keyed by month number only, so this assumes the period
  # never touches the same month number in two different years.
  M <- unique(format(D, "%m"))   # all months in [d_first, d_last]
  f2 <- function(x) length(which(format(D, "%m") == x)) # number of days in month x
  res <- vapply(M, f2, numeric(1))
  return(cbind(unique(format(D, "%Y-%m")), res))
}
# apply f1 to row k of df
f3 <- function(k) f1(df$start[k], df$end[k])
# lapply always returns a list, one matrix per row of df
output <- lapply(1:nrow(df), f3)
which produces
> output
[[1]]
res
01 "2014-01" "27"
02 "2014-02" "3"
[[2]]
res
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"
[[3]]
res
02 "2014-02" "23"
03 "2014-03" "7"
From here on, what remains is only a matter of formatting. Indeed, a simple do.call(rbind, output) does the job:
> do.call(rbind, output)
res
01 "2014-01" "27"
02 "2014-02" "3"
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"
02 "2014-02" "23"
03 "2014-03" "7"
To carry the id along as well, you can define f4 <- function(k) cbind(df$id[k], f3(k)):
> do.call(rbind, lapply(1:nrow(df), f4))
res
01 "1" "2014-01" "27"
02 "1" "2014-02" "3"
02 "1" "2014-02" "25"
03 "1" "2014-03" "31"
04 "1" "2014-04" "29"
02 "2" "2014-02" "23"
03 "2" "2014-03" "7"
There may well be a smarter solution, though.
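One caveat with f1 as written: it groups days by month number alone (format(D, "%m")), so a period that touches the same month number in two different years would merge, say, February 2014 with February 2015. The following is a minimal sketch of a variant that groups on year and month together and returns a data frame with an integer count column. The helper name count_days_by_month is hypothetical and not part of the answer above.

```r
# Hypothetical helper, not from the original answer: counts the days of
# [d_first, d_last] per calendar month, keyed on year AND month.
count_days_by_month <- function(d_first, d_last) {
  days <- seq(as.Date(d_first), as.Date(d_last), by = "day")
  counts <- table(format(days, "%Y-%m"))  # "YYYY-MM" names sort chronologically
  data.frame(month = names(counts),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

df <- data.frame(id = c(1, 1, 2),
                 start = c("2014-01-05", "2014-02-04", "2014-02-06"),
                 end = c("2014-02-03", "2014-04-29", "2014-03-07"))

# apply the helper to every row and prepend the id of that row
pieces <- lapply(seq_len(nrow(df)), function(k) {
  cbind(id = df$id[k], count_days_by_month(df$start[k], df$end[k]))
})
result <- do.call(rbind, pieces)
result$count
# 27 3 25 31 29 23 7
```

Because the result is a data frame rather than a character matrix, count can be summed or filtered directly without converting it back from text.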
Answer 1: (score: 1)
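Here is another approach, using the foverlaps() function from the data.table package.
foverlaps() finds the overlaps between the given periods and helper intervals that run from the first to the last day of each month.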
library(data.table)
library(lubridate)
# coerce dates from character to IDate
cols <- c("start", "end")
DT <- as.data.table(df)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]
# create sequence of months which cover all periods
mon_seq <- DT[, as.IDate(seq(floor_date(min(start), unit = "months"),
                             ceiling_date(max(end), unit = "months"),
                             by = "month"))]
# create helper data.table with first and last day of months
mDT <- data.table(start = head(mon_seq, -1L), end = tail(mon_seq, -1L) - 1L)
setkeyv(DT, cols)
# find overlapping pieces for each month
foverlaps(mDT, DT, nomatch = 0L)[
  # compute count of days in each month
  , {tmp <- pmax(start, i.start)
  .(id = id, month = format(tmp, "%Y-%m"),
    count = as.integer(difftime(pmin(end, i.end), tmp, units = "days")) + 1L)
  }][
    # reorder conveniently
    order(id, month)]
   id   month count
1:  1 2014-01    27
2:  1 2014-02     3
3:  1 2014-02    25
4:  1 2014-03    31
5:  1 2014-04    29
6:  2 2014-02    23
7:  2 2014-03     7
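The overlap arithmetic used in this answer (the later of the two start dates, the earlier of the two end dates, plus one day) can also be tried out in base R for a single period. This is only a sketch under stated assumptions: it does not use foverlaps(), and the helper name overlap_days_by_month is hypothetical.

```r
# Hypothetical helper, not part of the answer above: days of [start, end]
# that fall in each calendar month, from interval overlaps alone.
overlap_days_by_month <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  # first day of every month touched by the period
  month_start <- seq(as.Date(format(start, "%Y-%m-01")),
                     as.Date(format(end, "%Y-%m-01")),
                     by = "month")
  # last day of a month = first day of the following month minus one
  next_start <- seq(month_start[1], by = "month",
                    length.out = length(month_start) + 1L)[-1L]
  month_end <- next_start - 1
  # clip each month to the period, then count days inclusively
  from <- pmax(month_start, start)
  to <- pmin(month_end, end)
  data.frame(month = format(month_start, "%Y-%m"),
             count = as.integer(to - from) + 1L,
             stringsAsFactors = FALSE)
}

res <- overlap_days_by_month("2014-02-04", "2014-04-29")
res$count
# 25 31 29
```

Unlike enumerating every day, this does work proportional to the number of months in the period, which is the same reason the foverlaps() approach scales well to long periods.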