I have the following sample data frame:
Date_from <- c("2013-01-01","2013-01-10","2013-01-16","2013-01-19")
Date_to <- c("2013-01-07","2013-01-12","2013-01-18","2013-01-25")
y <- data.frame(Date_from,Date_to)
y$concentration <- c("1.5","2.5","1.5","3.5")
y$Date_from <- as.Date(y$Date_from)
y$Date_to <- as.Date(y$Date_to)
y$concentration <- as.numeric(y$concentration)
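These are measured heavy-metal concentrations over specific date ranges. However, the date ranges are not contiguous: there are gaps between 2013-01-07 and 2013-01-10 and between 2013-01-12 and 2013-01-16. I need to detect these gaps, insert a row at each gap, and fill it with the missing date range. The result should look like this: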
Date_from Date_to concentration
2013-01-01 2013-01-07 1.5
2013-01-08 2013-01-09 NA
2013-01-10 2013-01-12 2.5
2013-01-13 2013-01-15 NA
2013-01-16 2013-01-18 1.5
2013-01-19 2013-01-25 3.5
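As a first step, the gaps can be located by comparing each range's start with the end of the range before it. A minimal base-R sketch, rebuilding y as in the question (with concentration created as numeric directly); missing_days is an illustrative name, not part of either answer below:

```r
# Sample data from the question
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-01-10", "2013-01-16", "2013-01-19")),
  Date_to   = as.Date(c("2013-01-07", "2013-01-12", "2013-01-18", "2013-01-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# Number of days strictly between the end of one range and the start of the
# next; 0 means the two ranges are contiguous (e.g. 2013-01-18 -> 2013-01-19)
n <- nrow(y)
missing_days <- as.numeric(y$Date_from[-1] - y$Date_to[-n]) - 1
missing_days
# [1] 2 3 0

# Indices of the rows that are followed by a gap
which(missing_days > 0)
# [1] 1 2
```

A gap row is needed after row i exactly when missing_days[i] > 0, and it spans from y$Date_to[i] + 1 to y$Date_from[i + 1] - 1.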
Answer 0 (score: 6)
Try this:
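# One candidate gap row between each pair of consecutive ranges
adding <- data.frame(Date_from = y$Date_to[-nrow(y)] + 1,
                     Date_to = y$Date_from[-1] - 1, concentration = NA)
# Drop candidates where the two ranges were already contiguous
adding <- adding[adding$Date_from <= adding$Date_to, ]
res <- rbind(y, adding)
res[order(res$Date_from), ]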
# Date_from Date_to concentration
#1 2013-01-01 2013-01-07 1.5
#5 2013-01-08 2013-01-09 NA
#2 2013-01-10 2013-01-12 2.5
#6 2013-01-13 2013-01-15 NA
#3 2013-01-16 2013-01-18 1.5
#4 2013-01-19 2013-01-25 3.5
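The same rbind-and-sort idea can be wrapped in a function so that it also works when the input is not already sorted by Date_from, and so the result is renumbered 1..n. This is only a sketch: fill_date_gaps is a hypothetical name, and it assumes the data frame has exactly the three columns from the question (any extra columns would also need to be added to the gaps data frame before rbind).

```r
# Hypothetical helper wrapping the rbind-and-sort approach from this answer.
# Assumes exactly the columns Date_from, Date_to (class Date) and concentration.
fill_date_gaps <- function(df) {
  df <- df[order(df$Date_from), ]        # make sure ranges are in date order
  n <- nrow(df)
  if (n < 2) return(df)                  # nothing to compare
  gaps <- data.frame(
    Date_from = df$Date_to[-n] + 1,      # day after each range ends
    Date_to   = df$Date_from[-1] - 1,    # day before the next range starts
    concentration = NA_real_
  )
  gaps <- gaps[gaps$Date_from <= gaps$Date_to, ]  # keep only real gaps
  res <- rbind(df, gaps)
  res <- res[order(res$Date_from), ]
  rownames(res) <- NULL                  # renumber rows 1..n
  res
}

y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-01-10", "2013-01-16", "2013-01-19")),
  Date_to   = as.Date(c("2013-01-07", "2013-01-12", "2013-01-18", "2013-01-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)
res <- fill_date_gaps(y)
res
```

On the sample data this should produce the six rows shown in the question, with NA in the two inserted gap rows.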
Answer 1 (score: 3)
Here is a solution that requires magrittr and dplyr. It finds the gaps and then fills them in a loop.
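library(magrittr)  # for %<>%
library(dplyr)     # for lag() and add_row(); without it, lag() is stats::lag and does not shift
# Locations to pad data frame
tmp <- which(y$Date_from - lag(y$Date_to) > 1)
# Shift each position down by the number of rows already inserted above it
tmp <- tmp + seq_along(tmp) - 1
for (i in tmp) {
  # Add row
  y %<>% add_row(Date_from = y$Date_to[i - 1] + 1,
                 Date_to = y$Date_from[i] - 1,
                 .before = i)
}
y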
# Date_from Date_to concentration
# 1 2013-01-01 2013-01-07 1.5
# 2 2013-01-08 2013-01-09 NA
# 3 2013-01-10 2013-01-12 2.5
# 4 2013-01-13 2013-01-15 NA
# 5 2013-01-16 2013-01-18 1.5
# 6 2013-01-19 2013-01-25 3.5
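If installing dplyr and magrittr is not an option, the same find-then-insert loop can be sketched in base R. Two substitutions are assumptions of this sketch, not part of the answer: indexing Date_to with c(NA, 1, ..., n - 1) stands in for dplyr::lag(), and splitting the data frame around position i with rbind() stands in for add_row(.before = i). The index adjustment on the gap positions is still needed, because every inserted row pushes all later rows down by one.

```r
y <- data.frame(
  Date_from = as.Date(c("2013-01-01", "2013-01-10", "2013-01-16", "2013-01-19")),
  Date_to   = as.Date(c("2013-01-07", "2013-01-12", "2013-01-18", "2013-01-25")),
  concentration = c(1.5, 2.5, 1.5, 3.5)
)

# Base-R stand-in for dplyr::lag(): previous row's Date_to, NA for row 1.
# Indexing with NA (rather than c(NA, ...)) keeps the Date class.
prev_to <- y$Date_to[c(NA, seq_len(nrow(y) - 1))]

# Rows that start more than one day after the previous row ends
gap_rows <- which(as.numeric(y$Date_from - prev_to) > 1)

# The k-th insertion lands k - 1 rows further down than its original index
gap_rows <- gap_rows + seq_along(gap_rows) - 1

for (i in gap_rows) {
  new_row <- data.frame(Date_from = y$Date_to[i - 1] + 1,
                        Date_to = y$Date_from[i] - 1,
                        concentration = NA_real_)
  # Equivalent of add_row(.before = i): rows 1..i-1, new row, rows i..n
  y <- rbind(y[seq_len(i - 1), ], new_row, y[i:nrow(y), ])
}
rownames(y) <- NULL
y
```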