我有一个大型数据集,其开始日期和结束日期有时在一个月内,但更多时间跨度超过一个月或一年。最后,我想计算每个ID每个月的入住天数。
以下是示例数据:
ID = c(50:55)
ENTRY = as.Date(c("11/6/2011", "04/08/2012", "10/9/2012",
"23/10/2012", "15/11/2012", "23/11/2012"), "%d/%m/%Y")
EXIT = as.Date(c("11/7/2011", "06/09/2012", "24/9/2012",
"31/12/2012", "18/11/2012", "04/01/2013"), "%d/%m/%Y")
Occupancy <- data.frame(ID, ENTRY, EXIT)
ID ENTRY EXIT
50 2011-06-11 2011-07-11
51 2012-08-04 2012-09-06
52 2012-09-10 2012-09-24
53 2012-10-23 2012-12-31
54 2012-11-15 2012-11-18
55 2012-11-23 2013-01-04
这就是我想要创建的内容:
ID ENTRY EXIT
50 6/11/2011 6/30/2011
50 7/1/2011 7/11/2011
51 8/4/2012 8/31/2012
51 9/1/2012 9/6/2012
:
55 11/23/2012 11/30/2012
55 12/1/2012 12/31/2012
55 1/1/2013 1/4/2013
任何建议都将不胜感激!
答案 0 :(得分:2)
希望这有帮助!
它将为您提供最终结果 - 即每个ID每个月的入住天数。
ID = c(50:55)
ENTRY = as.Date(c("11/6/2011", "04/08/2012", "10/9/2012",
"23/10/2012", "15/11/2012", "23/11/2012"), "%d/%m/%Y")
EXIT = as.Date(c("11/7/2011", "06/09/2012", "24/9/2012",
"31/12/2012", "18/11/2012", "04/01/2013"), "%d/%m/%Y")
Occupancy <- data.frame(ID, ENTRY, EXIT)
library(zoo)
library(dplyr)
monthList <- mapply(function(x,y) as.yearmon(seq(x,y, "day")), ENTRY, EXIT)
OccupancyDf <- monthList %>% lapply(table) %>% lapply(as.list) %>% lapply(data.frame) %>% rbind_all()
OccupancyDf$ID <- Occupancy$ID
OccupancyDf[is.na(OccupancyDf)] <- 0
OccupancyDf
输出是:
Jun.2011 Jul.2011 Aug.2012 Sep.2012 Oct.2012 Nov.2012 Dec.2012 Jan.2013 ID
20 11 0 0 0 0 0 0 50
0 0 28 6 0 0 0 0 51
0 0 0 15 0 0 0 0 52
0 0 0 0 9 30 31 0 53
0 0 0 0 0 4 0 0 54
0 0 0 0 0 8 31 4 55
如果它解决了您的问题,请不要告诉我们。)
答案 1 :(得分:0)
获取您展示的输出的方法
以下函数将接受数据框的单行(ENTRY
和EXIT
)并返回每个月分解的数据框。
custom.dates <- function(a,ts) {
if (ts > 0) {
newdates <- lapply(1:ts, function(x) a$ENTRY + period(x,"month"))
new.entry <- lapply(1:ts, function(x) { ymd(paste(year(newdates[[x]]), month(newdates[[x]]), "01", sep="-")) } )
newdates <- lapply((ts-1):0, function(x) a$ENTRY + period(x,"month"))
new.exit <- lapply(ts:1, function(x) { ymd(paste(year(newdates[[x]]), month(newdates[[x]]), days_in_month(month(newdates[[x]])), sep="-")) } )
df <- data.frame(ENTRY=sort(c(a$ENTRY,new.entry)), EXIT=sort(c(a$EXIT,new.exit)))
return(df)
} else {
return(a)
}
}
使用tidyverse
library(tidyverse)
result <- Occupancy %>%
mutate(monthspan = (year(EXIT)*12 + month(EXIT)) - (year(ENTRY)*12 + month(ENTRY)) ) %>%
nest(monthspan, ENTRY, EXIT) %>%
mutate(data = map(data, ~custom.dates(select(.x, -monthspan), .x$monthspan))) %>%
unnest(data)
输出
ID ENTRY EXIT
1 50 2011-06-11 2011-06-30
2 50 2011-07-01 2011-07-11
3 51 2012-08-04 2012-08-31
4 51 2012-09-01 2012-09-06
5 52 2012-09-10 2012-09-24
6 53 2012-10-23 2012-10-31
7 53 2012-11-01 2012-11-30
8 53 2012-12-01 2012-12-31
9 54 2012-11-15 2012-11-18
10 55 2012-11-23 2012-11-30
11 55 2012-12-01 2012-12-31
12 55 2013-01-01 2013-01-04