Question

I have the following example data frame:

Date_from <- c("2013-01-01","2013-05-10","2013-08-13","2013-11-19")
Date_to <- c("2013-05-07","2013-08-12","2013-11-18","2013-12-25")
y <- data.frame(Date_from,Date_to)
y$concentration <- c("1.5","2.5","1.5","3.5")
y$Date_from <- as.Date(y$Date_from)
y$Date_to <- as.Date(y$Date_to)
y$concentration <- as.numeric(y$concentration)

I use the following code to detect gaps between the date ranges, add the missing ranges to the data frame, and assign NA to their concentration:

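adding <- data.frame(Date_from = y$Date_to[-nrow(y)] + 1, Date_to = y$Date_from[-1] - 1, concentration = NA)
# keep only real gaps; <= so that a one-day gap (Date_from == Date_to) is not dropped
adding <- adding[adding$Date_from <= adding$Date_to, ]
res <- rbind(y, adding)
res[order(res$Date_from), ]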

The result is:

Date_from    Date_to concentration
2013-01-01 2013-05-07           1.5
2013-05-08 2013-05-09            NA
2013-05-10 2013-08-12           2.5
2013-08-13 2013-11-18           1.5
2013-11-19 2013-12-25           3.5

The problem now is that the data frame ends on 2013-12-25 instead of 2013-12-31. How can I do the following:

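Detect the end date of the last date range in the data frame, e.g. 2013-12-25
Add one more row with a new date range that runs up to the last day of the year, with NA for the concentration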

The result should look like this:

Date_from    Date_to concentration
2013-01-01 2013-05-07           1.5
2013-05-08 2013-05-09            NA
2013-05-10 2013-08-12           2.5
2013-08-13 2013-11-18           1.5
2013-11-19 2013-12-25           3.5
2013-12-26 2013-12-31            NA

Answer 1

Isn't this all you need?

df <- read.table(text = "
Date_from    Date_to concentration
2013-01-01 2013-05-07           1.5
2013-05-08 2013-05-09            NA
2013-05-10 2013-08-12           2.5
2013-08-13 2013-11-18           1.5
2013-11-19 2013-12-25           3.5", header = TRUE, stringsAsFactors = FALSE)


rbind(df, c(as.character(max(as.Date(df$Date_to))+1), paste0(substr(max(as.Date(df$Date_to)), 1, 4),"-12-31")  , NA))


   Date_from    Date_to concentration
1 2013-01-01 2013-05-07           1.5
2 2013-05-08 2013-05-09          <NA>
3 2013-05-10 2013-08-12           2.5
4 2013-08-13 2013-11-18           1.5
5 2013-11-19 2013-12-25           3.5
6 2013-12-26 2013-12-31          <NA>
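
One side effect of the call above: because the new row is passed as a character vector, rbind() coerces concentration to character, which is why the missing values print as <NA>. A minimal sketch that keeps the column types, assuming the dates are stored as Date and the last range ends within the year to be completed (the names last_end and year_end are illustrative, not from the answer):

```r
df <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-08", "2013-05-10", "2013-08-13", "2013-11-19")),
  Date_to   = as.Date(c("2013-05-07", "2013-05-09", "2013-08-12", "2013-11-18", "2013-12-25")),
  concentration = c(1.5, NA, 2.5, 1.5, 3.5)
)

# end of the last range and the last day of that same year
last_end <- max(df$Date_to)
year_end <- as.Date(format(last_end, "%Y-12-31"))

# append the remaining period only if the data stop before 31 December
if (last_end < year_end) {
  df <- rbind(df, data.frame(Date_from = last_end + 1,
                             Date_to = year_end,
                             concentration = NA_real_))
}
str(df)
```

Binding a one-row data frame instead of a vector lets each column keep its own class, so concentration stays numeric.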

Answer 2

You can use this explicit function:

date_order <- function(dt) {
  # start from the input so that every detected gap accumulates in dt2
  dt2 <- dt
  for (i in seq_len(nrow(dt) - 1)) {
    # a gap exists when the next range starts more than one day after this one ends
    if (dt[[1]][i + 1] - dt[[2]][i] > 1) {
      pre <- dt[[2]][i] + 1
      post <- dt[[1]][i + 1] - 1
      add <- data.frame("Date_from" = pre, "Date_to" = post, "concentration" = NA)
      dt2 <- rbind.data.frame(dt2, add)
    }
  }
  dt2 <- dt2[order(dt2$Date_from), ]
  # last day of the year in which the final range ends
  yr <- substr(x = dt[[2]][nrow(dt)], start = 1, stop = 4)
  last_day <- as.Date(paste(yr, "12-31", sep = "-"), format = "%Y-%m-%d")
  if (dt[[2]][nrow(dt)] != last_day) {
    add2 <- data.frame("Date_from" = dt[[2]][nrow(dt)] + 1, "Date_to" = last_day, "concentration" = NA)
    dt2 <- rbind.data.frame(dt2, add2)
  }
  return(dt2)
}

Calling this function on your data returns the following:

> date_order(y)
    Date_from    Date_to concentration
1  2013-01-01 2013-05-07           1.5
5  2013-05-08 2013-05-09            NA
2  2013-05-10 2013-08-12           2.5
3  2013-08-13 2013-11-18           1.5
4  2013-11-19 2013-12-25           3.5
11 2013-12-26 2013-12-31            NA

I hope this is what you were looking for.
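
To confirm that a completed table really covers the year, a small check can help. This is only a sketch: covers_full_year() is a hypothetical helper, not part of the answer, and it assumes Date columns and ranges that all fall within one calendar year.

```r
# hypothetical helper: TRUE when the ranges tile one calendar year
# with no gaps and no overlaps
covers_full_year <- function(d) {
  d <- d[order(d$Date_from), ]
  n <- nrow(d)
  yr <- format(d$Date_from[1], "%Y")
  starts_ok <- d$Date_from[1] == as.Date(paste0(yr, "-01-01"))
  ends_ok <- d$Date_to[n] == as.Date(paste0(yr, "-12-31"))
  # each range must begin the day after the previous one ends
  contiguous <- all(d$Date_from[-1] == d$Date_to[-n] + 1)
  starts_ok && ends_ok && contiguous
}

# the original data (gap in May, stops on 25 December)
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-10", "2013-08-13", "2013-11-19")),
  Date_to   = as.Date(c("2013-05-07", "2013-08-12", "2013-11-18", "2013-12-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# the expected result from the question
res <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-08", "2013-05-10",
                        "2013-08-13", "2013-11-19", "2013-12-26")),
  Date_to   = as.Date(c("2013-05-07", "2013-05-09", "2013-08-12",
                        "2013-11-18", "2013-12-25", "2013-12-31")),
  concentration = c(1.5, NA, 2.5, 1.5, 3.5, NA)
)

covers_full_year(y)    # FALSE
covers_full_year(res)  # TRUE
```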

Answer 3

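My suggestion is to join y with a data frame that contains all possible periods within the year (either explicitly given or "remaining"). The solution below uses data.table syntax together with the floor_date() and ceiling_date() functions from the lubridate package. This ensures that the solution works even if the given periods span several years.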

library(data.table)
library(magrittr)
# coerce character dates to numeric dates
cols <- c("Date_from", "Date_to")
setDT(y, key = cols)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]
# create sequence of starting points of all periods
breaks <- y[, c(Date_from, Date_to + 1L)] %>% 
  # append start and end of year
  c(lubridate::floor_date(min(.), "year"),
    lubridate::ceiling_date(max(.), "year")) %>% 
  sort() %>% 
  unique() %T>%
  print()

[1] "2013-01-01" "2013-05-08" "2013-05-10" "2013-08-13" "2013-11-19" "2013-12-26" "2014-01-01"

# create periods
x <- data.table(from = head(breaks, -1L), to = tail(breaks, -1L) - 1L, 
                key = c("from", "to"))
x

         from         to
1: 2013-01-01 2013-05-07
2: 2013-05-08 2013-05-09
3: 2013-05-10 2013-08-12
4: 2013-08-13 2013-11-18
5: 2013-11-19 2013-12-25
6: 2013-12-26 2013-12-31

# right join to create the expected result
y[x]

    Date_from    Date_to concentration
1: 2013-01-01 2013-05-07           1.5
2: 2013-05-08 2013-05-09            NA
3: 2013-05-10 2013-08-12           2.5
4: 2013-08-13 2013-11-18           1.5
5: 2013-11-19 2013-12-25           3.5
6: 2013-12-26 2013-12-31            NA
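
For readers without data.table, the same idea (build all break points, turn them into periods, then join) can be sketched in base R. This is an illustrative variant rather than the answer's code; it assumes Date columns and non-overlapping ranges, and the names starts, first_day, next_year and periods are made up for the example.

```r
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-05-10", "2013-08-13", "2013-11-19")),
  Date_to   = as.Date(c("2013-05-07", "2013-08-12", "2013-11-18", "2013-12-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# every period starts at a given Date_from or on the day after a Date_to
starts <- c(y$Date_from, y$Date_to + 1)

# add 1 January of the first year and 1 January after the last year
first_day <- as.Date(format(min(y$Date_from), "%Y-01-01"))
next_year <- as.integer(format(max(y$Date_to), "%Y")) + 1L
breaks <- sort(unique(c(starts, first_day,
                        as.Date(paste0(next_year, "-01-01")))))

# consecutive break points delimit the periods
periods <- data.frame(Date_from = head(breaks, -1),
                      Date_to = tail(breaks, -1) - 1)

# left join: periods missing from y get NA as concentration
res <- merge(periods, y, by = c("Date_from", "Date_to"), all.x = TRUE)
res <- res[order(res$Date_from), ]
res
```

Because the year boundaries are derived from the minimum and maximum dates, the sketch also pads data that run over several years, from 1 January of the first year to 31 December of the last.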

Completing a list of date ranges to cover the full year
