I have the following example data frame:
Date_from <- c("2013-01-01","2013-05-10","2013-08-13","2013-11-19")
Date_to <- c("2013-05-07","2013-08-12","2013-11-18","2013-12-25")
y <- data.frame(Date_from,Date_to)
y$concentration <- c("1.5","2.5","1.5","3.5")
y$Date_from <- as.Date(y$Date_from)
y$Date_to <- as.Date(y$Date_to)
y$concentration <- as.numeric(y$concentration)
I use the following code to detect gaps between the date ranges, add the missing ranges to the data frame, and assign NA to their concentration:
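# candidate gaps: from the day after each period ends to the day before the next one starts
adding <- data.frame(Date_from = y$Date_to[-nrow(y)] + 1, Date_to = y$Date_from[-1] - 1, concentration = NA)
# keep only real gaps; <= so that a gap of a single day is not dropped
adding <- adding[adding$Date_from <= adding$Date_to, ]
res <- rbind(y, adding)
res[order(res$Date_from), ]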
The result is:
Date_from Date_to concentration
2013-01-01 2013-05-07 1.5
2013-05-08 2013-05-09 NA
2013-05-10 2013-08-12 2.5
2013-08-13 2013-11-18 1.5
2013-11-19 2013-12-25 3.5
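The problem now is that the data frame ends at 2013-12-25 rather than 2013-12-31. How can I also add the missing period up to the end of the year?
The result should look like this: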
Date_from Date_to concentration
2013-01-01 2013-05-07 1.5
2013-05-08 2013-05-09 NA
2013-05-10 2013-08-12 2.5
2013-08-13 2013-11-18 1.5
2013-11-19 2013-12-25 3.5
2013-12-26 2013-12-31 NA
Answer 0 (score: 2)
Don't you just want this?
df <- read.table(text = "
Date_from Date_to concentration
2013-01-01 2013-05-07 1.5
2013-05-08 2013-05-09 NA
2013-05-10 2013-08-12 2.5
2013-08-13 2013-11-18 1.5
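2013-11-19 2013-12-25 3.5", header = TRUE, stringsAsFactors = FALSE)
# latest end date in the table
last_to <- max(as.Date(df$Date_to))
# append the remainder of that year as a data frame row, so that
# concentration stays numeric instead of being coerced to character
rbind(df, data.frame(Date_from = as.character(last_to + 1),
                     Date_to = paste0(format(last_to, "%Y"), "-12-31"),
                     concentration = NA, stringsAsFactors = FALSE))
Date_from Date_to concentration
1 2013-01-01 2013-05-07 1.5
2 2013-05-08 2013-05-09 NA
3 2013-05-10 2013-08-12 2.5
4 2013-08-13 2013-11-18 1.5
5 2013-11-19 2013-12-25 3.5
6 2013-12-26 2013-12-31 NA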
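The snippet above keeps the dates as text because `df` was read with `read.table`. If the columns are already of class `Date`, as in the question's `y`, the same padding can be done without any string round trip. The following is only a minimal sketch: the helper name `pad_to_year_end` is made up for illustration, and it assumes a non-empty data frame with the three columns used in the question.

```r
# Hypothetical helper (not from the original answer): append one NA row that
# runs from the day after the latest Date_to to 31 December of that year.
pad_to_year_end <- function(d) {
  last_to  <- max(d$Date_to)
  year_end <- as.Date(paste0(format(last_to, "%Y"), "-12-31"))
  # only pad when the data does not already reach the end of the year
  if (last_to < year_end) {
    d <- rbind(d, data.frame(Date_from = last_to + 1,
                             Date_to = year_end,
                             concentration = NA))
  }
  d
}

# the question's data, built directly with Date and numeric columns
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-10", "2013-08-13", "2013-11-19")),
  Date_to = as.Date(c("2013-05-07", "2013-08-12", "2013-11-18", "2013-12-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)
padded <- pad_to_year_end(y)
```

Here `padded` has one more row than `y`, and its date columns remain of class `Date`. The sketch only handles the end of the year; gaps between periods still have to be filled separately.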
Answer 1 (score: 1)
You can use this explicit function:
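date_order <- function(dt) {
  # start from the original rows so that every detected gap is kept
  # (expects dt sorted by Date_from, with Date_from and Date_to as columns 1 and 2)
  dt2 <- dt
  for (i in seq_len(nrow(dt) - 1)) {
    # a gap exists when the next period starts more than one day after this one ends
    if (dt[[1]][i + 1] - dt[[2]][i] > 1) {
      pre <- dt[[2]][i] + 1
      post <- dt[[1]][i + 1] - 1
      add <- data.frame("Date_from" = pre, "Date_to" = post, "concentration" = NA)
      dt2 <- rbind.data.frame(dt2, add)
    }
  }
  dt2 <- dt2[order(dt2$Date_from), ]
  # 31 December of the year in which the last period ends
  yr <- substr(x = dt[[2]][nrow(dt)], start = 1, stop = 4)
  last_day <- as.Date(paste(yr, "12-31", sep = "-"), format = "%Y-%m-%d")
  # pad the remainder of the year if the last period ends earlier
  if (dt[[2]][nrow(dt)] != last_day) {
    add2 <- data.frame("Date_from" = dt[[2]][nrow(dt)] + 1, "Date_to" = last_day, "concentration" = NA)
    dt2 <- rbind.data.frame(dt2, add2)
  }
  return(dt2)
}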
Using this function with your data returns the following:
> date_order(y)
Date_from Date_to concentration
1 2013-01-01 2013-05-07 1.5
5 2013-05-08 2013-05-09 NA
2 2013-05-10 2013-08-12 2.5
3 2013-08-13 2013-11-18 1.5
4 2013-11-19 2013-12-25 3.5
11 2013-12-26 2013-12-31 NA
Hope this is what you wanted.
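One way to confirm that a result like the one above has no remaining gaps is to check that every period starts the day after the previous one ends and that the last period ends on 31 December. This is a rough sketch, not part of the original answer: the function name `covers_to_year_end` is hypothetical, and it assumes `Date_from` and `Date_to` are of class `Date`.

```r
# Hypothetical check: TRUE if the periods are contiguous and the last one
# ends on 31 December.
covers_to_year_end <- function(d) {
  d <- d[order(d$Date_from), ]
  n <- nrow(d)
  # every period must start exactly one day after the previous one ends
  contiguous <- all(d$Date_from[-1] - d$Date_to[-n] == 1)
  ends_dec31 <- format(d$Date_to[n], "%m-%d") == "12-31"
  contiguous && ends_dec31
}

# the question's input, which has a gap in May and stops on 2013-12-25
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-10", "2013-08-13", "2013-11-19")),
  Date_to = as.Date(c("2013-05-07", "2013-08-12", "2013-11-18", "2013-12-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# the desired result from the question
full <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-08", "2013-05-10",
                        "2013-08-13", "2013-11-19", "2013-12-26")),
  Date_to = as.Date(c("2013-05-07", "2013-05-09", "2013-08-12",
                      "2013-11-18", "2013-12-25", "2013-12-31")),
  concentration = c(1.5, NA, 2.5, 1.5, 3.5, NA)
)

covers_to_year_end(y)
covers_to_year_end(full)
```

The first call reports a failure for the raw input and the second succeeds for the completed table. The check does not look at the start of the year, so a table that begins later than 1 January would still pass.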
Answer 2 (score: 0)
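My suggestion is to join y with a data frame that contains all possible periods of the year (either given explicitly or "remaining"). The solution below uses data.table syntax as well as the floor_date() and ceiling_date() functions from the lubridate package. This ensures that the solution still works when the given periods span several years.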
library(data.table)
library(magrittr)
# make sure both date columns are stored as IDate (data.table's integer-based date class)
cols <- c("Date_from", "Date_to")
setDT(y, key = cols)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]
# create sequence of starting points of all periods
breaks <- y[, c(Date_from, Date_to + 1L)] %>%
  # append start and end of year
  c(lubridate::floor_date(min(.), "year"),
    lubridate::ceiling_date(max(.), "year")) %>%
  sort() %>%
  unique() %T>%
  print()
[1] "2013-01-01" "2013-05-08" "2013-05-10" "2013-08-13" "2013-11-19" "2013-12-26" "2014-01-01"
# create periods
x <- data.table(from = head(breaks, -1L), to = tail(breaks, -1L) - 1L,
key = c("from", "to"))
x
         from         to
1: 2013-01-01 2013-05-07
2: 2013-05-08 2013-05-09
3: 2013-05-10 2013-08-12
4: 2013-08-13 2013-11-18
5: 2013-11-19 2013-12-25
6: 2013-12-26 2013-12-31
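# right join to create the expected result: keep every period in x and look up
# its concentration in y; on= names the join columns explicitly, so the join
# does not depend on y still being keyed
y[x, on = c(Date_from = "from", Date_to = "to")]
    Date_from    Date_to concentration
1: 2013-01-01 2013-05-07           1.5
2: 2013-05-08 2013-05-09            NA
3: 2013-05-10 2013-08-12           2.5
4: 2013-08-13 2013-11-18           1.5
5: 2013-11-19 2013-12-25           3.5
6: 2013-12-26 2013-12-31            NA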
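For readers who prefer to avoid extra packages, the same breakpoint idea can be sketched in base R. This is an illustrative variant under stated assumptions, not the answerer's code: it assumes `Date` columns and non-overlapping periods, and it computes the year boundaries with `format()` instead of lubridate.

```r
# the question's data with Date and numeric columns
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-10", "2013-08-13", "2013-11-19")),
  Date_to = as.Date(c("2013-05-07", "2013-08-12", "2013-11-18", "2013-12-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# start of every known period, plus the day after each period ends
starts <- c(y$Date_from, y$Date_to + 1)

# 1 January of the earliest year and 1 January after the latest year
year_start <- as.Date(paste0(format(min(y$Date_from), "%Y"), "-01-01"))
last_year <- as.integer(format(max(y$Date_to), "%Y"))
next_year <- as.Date(paste0(last_year + 1L, "-01-01"))

# all breakpoints, sorted and without duplicates
breaks <- sort(unique(c(year_start, starts, next_year)))

# consecutive breakpoints define the complete set of periods
periods <- data.frame(Date_from = head(breaks, -1),
                      Date_to = tail(breaks, -1) - 1)

# keep every period; periods that are missing from y get NA as concentration
res <- merge(y, periods, by = c("Date_from", "Date_to"), all.y = TRUE)
res
```

With the question's data, `res` contains the four original periods plus the two filled ones (2013-05-08 to 2013-05-09 and 2013-12-26 to 2013-12-31), both with `NA` as concentration. Because the year boundaries come from the earliest and latest dates, the sketch should also cover input that spans more than one year.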