Fill in missing date ranges

Posted: 2018-07-04 12:47:12

Tags: r date

I have the following example data frame:

Date_from <- c("2013-01-01","2013-01-10","2013-01-16","2013-01-19")
Date_to <- c("2013-01-07","2013-01-12","2013-01-18","2013-01-25")
y <- data.frame(Date_from,Date_to)
y$concentration <- c("1.5","2.5","1.5","3.5")
y$Date_from <- as.Date(y$Date_from)
y$Date_to <- as.Date(y$Date_to)
y$concentration <- as.numeric(y$concentration)

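These are measured heavy-metal concentrations for specific date ranges. However, the date ranges are not continuous: there are gaps between 2013-01-07 and 2013-01-10 and between 2013-01-12 and 2013-01-16. I need to detect these gaps, insert a row at each gap, and fill it with the missing date range. The result should look like this: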

Date_from    Date_to concentration
2013-01-01 2013-01-07           1.5
2013-01-08 2013-01-09            NA
2013-01-10 2013-01-12           2.5
2013-01-13 2013-01-15            NA
2013-01-16 2013-01-18           1.5
2013-01-19 2013-01-25           3.5
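A quick way to see where the gaps are: two consecutive ranges are contiguous when the next Date_from is exactly one day after the previous Date_to, so any larger difference marks a gap. The following is a minimal base R sketch; it rebuilds the example data with numeric concentrations directly, and the names step, gap_after and missing_ranges are illustrative only. It assumes the rows are already sorted by Date_from.

```r
# Rebuild the example data (concentrations as numbers from the start)
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-01-10", "2013-01-16", "2013-01-19")),
  Date_to   = as.Date(c("2013-01-07", "2013-01-12", "2013-01-18", "2013-01-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# Days from the end of each range to the start of the next one.
# A value of 1 means contiguous; anything larger is a gap.
step <- as.numeric(y$Date_from[-1] - y$Date_to[-nrow(y)])
gap_after <- which(step > 1)   # rows that are followed by a gap
gap_after
# [1] 1 2

# The missing ranges: day after the earlier range ends,
# up to the day before the next range starts
missing_ranges <- data.frame(Date_from = y$Date_to[gap_after] + 1,
                             Date_to   = y$Date_from[gap_after + 1] - 1)
missing_ranges
```

Both answers below build on this same "previous end + 1 vs. next start" comparison.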

2 answers:

Answer 0 (score: 6)

Try this:

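# Assumes y is sorted by Date_from: one candidate filler row between each pair of consecutive ranges
adding <- data.frame(Date_from = y$Date_to[-nrow(y)]+1,
                     Date_to = y$Date_from[-1]-1, concentration = NA)
# Drop candidates where the ranges were already contiguous (Date_from > Date_to)
adding <- adding[adding$Date_from <= adding$Date_to,]
# Append the filler rows and sort everything by start date
res <- rbind(y,adding)
res[order(res$Date_from),]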

#   Date_from    Date_to concentration
#1 2013-01-01 2013-01-07           1.5
#5 2013-01-08 2013-01-09            NA
#2 2013-01-10 2013-01-12           2.5
#6 2013-01-13 2013-01-15            NA
#3 2013-01-16 2013-01-18           1.5
#4 2013-01-19 2013-01-25           3.5
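The same approach can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name fill_date_gaps is hypothetical, it sorts the input first (the answer above assumes y is already in order), and it assumes the data frame has exactly the three columns Date_from, Date_to and concentration, because rbind() needs matching column names.

```r
# Hypothetical helper wrapping Answer 0's approach
fill_date_gaps <- function(df) {
  # Sort by start date so "previous" and "next" rows are meaningful
  df <- df[order(df$Date_from), ]
  n <- nrow(df)
  if (n < 2) return(df)
  # Candidate filler rows between each pair of consecutive ranges
  gaps <- data.frame(Date_from = df$Date_to[-n] + 1,
                     Date_to   = df$Date_from[-1] - 1,
                     concentration = NA_real_)
  # Keep only real gaps; contiguous ranges give Date_from > Date_to
  gaps <- gaps[gaps$Date_from <= gaps$Date_to, ]
  res <- rbind(df, gaps)
  res <- res[order(res$Date_from), ]
  rownames(res) <- NULL  # renumber rows 1..n
  res
}

y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-01-10", "2013-01-16", "2013-01-19")),
  Date_to   = as.Date(c("2013-01-07", "2013-01-12", "2013-01-18", "2013-01-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# Shuffle the rows to show that the input does not need to be pre-sorted
fill_date_gaps(y[c(3, 1, 4, 2), ])
```

Sorting inside the function trades a little extra work for not having to remember that the row-offset logic depends on ordered input.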

Answer 1 (score: 3)

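Here is a solution that requires magrittr and dplyr (it uses %<>% from magrittr, lag() from dplyr, and add_row() from tibble). It first finds the gaps and then fills them in a loop.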

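library(magrittr)  # %<>%
library(dplyr)     # lag(); must mask stats::lag, which does not shift plain vectors
library(tibble)    # add_row()

# Locations to pad the data frame: rows whose Date_from is more than
# one day after the previous row's Date_to
tmp <- which(y$Date_from - lag(y$Date_to) > 1)
# Shift each position by the number of rows inserted before it
tmp <- tmp + seq_along(tmp) - 1

for (i in tmp) {
  # Insert the missing range before row i; concentration is filled with NA
  y %<>% add_row(Date_from = y$Date_to[i - 1] + 1,
                 Date_to   = y$Date_from[i] - 1,
                 .before = i)
}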

#    Date_from    Date_to concentration
# 1 2013-01-01 2013-01-07           1.5
# 2 2013-01-08 2013-01-09            NA
# 3 2013-01-10 2013-01-12           2.5
# 4 2013-01-13 2013-01-15            NA
# 5 2013-01-16 2013-01-18           1.5
# 6 2013-01-19 2013-01-25           3.5