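I have a daily time series dataset on which I am trying to perform hourly linear interpolation. My code interpolates linearly between points, but I need the interpolation to restart from 0 after the last point of each ID and at the start of each new day.
After adding the missing hours to the raw daily data, this is my output:
After running the following code, this is my output, but I don't know how to make it start from 0: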
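library(dplyr)  # %>%, mutate, group_by, do, left_join
library(zoo)    # na.approx
dfYPO0_20171 <- dfYPO0_2017 %>%
  mutate(TIMESTAMP = as.POSIXct(as.character(TIMESTAMP))) %>%
  group_by(ID) %>%
  # Add a row for every missing hour between each ID's first and last timestamp
  do(left_join(data.frame(ID = .$ID[1], TIMESTAMP = seq(min(.$TIMESTAMP), max(.$TIMESTAMP), by = "hour")), ., by = c("ID", "TIMESTAMP"))) %>%
  # Linearly interpolate the missing hourly values within each ID
  mutate(CALC_HOURLY_PROD = na.approx(`Total Prod Yest`))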
Here is the output I would like:
Thanks in advance for your help!
Answer 0 (score: 2)
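Here is an approach using the tidyverse packages. First, we create groups based on runs of missing values, so that each run of NA is grouped with the value that follows it. Then we use approx to interpolate within each group.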
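To see what the grouping step does on its own, here is a minimal base R sketch on a toy vector (the values are made up for illustration). Reversing the vector, taking a cumulative count of the non-missing values, and reversing back gives every run of NA the same group id as the value that ends it.

```r
# Toy vector: a value, a run of NAs, a value, another NA, a final value
prod <- c(10.4, NA, NA, 25.8, NA, 14.2)

# Reverse, count the non-missing values cumulatively, then reverse back.
# Each run of NAs ends up sharing a group id with the value that follows it.
group <- rev(cumsum(!is.na(rev(prod))))
group
# 3 2 2 2 1 1
```

The first value sits in a group of its own, which is why the full pipeline only inserts a zero and interpolates in groups with more than one row.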
library(tidyverse)
# Fake data
dat = data.frame(time=seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "1 day"),
                 prod=c(10.4, rep(NA,19), 25.8, rep(NA,9), 14.2))
dat = dat %>%
  # Create groups based on runs of NA followed by a value
  mutate(group = rev(cumsum(!is.na(rev(prod))))) %>%
  # Operate on the groups we just created
  group_by(group) %>%
  # First, add a zero at the beginning of each group, then run the approx function
  # to interpolate values for all groups of length greater than 1
  mutate(prod = replace(prod, row_number()==1 & n()>1, 0),
         prod = if(n()>1) approx(time, prod, xout=time)$y else prod) %>%
  ungroup %>% select(-group)
         time      prod
1  2015-01-01 10.400000
2  2015-01-02  0.000000
3  2015-01-03  1.357895
...
19 2015-01-19 23.084211
20 2015-01-20 24.442105
21 2015-01-21 25.800000
22 2015-01-22  0.000000
23 2015-01-23  1.577778
24 2015-01-24  3.155556
...
29 2015-01-29 11.044444
30 2015-01-30 12.622222
31 2015-01-31 14.200000
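The same idea might carry over to the hourly, per-ID case from the question. The following is only a sketch under assumptions: the data frame `daily`, the column `total_prod`, and the helper columns `run` and `start` are hypothetical stand-ins (the real column in the question is "Total Prod Yest"), and the missing hours are added with `tidyr::complete()` rather than the `do()`/`left_join()` step used in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical daily data: one reading per ID per day
daily <- data.frame(
  ID = rep(c("A", "B"), each = 2),
  TIMESTAMP = as.POSIXct(rep(c("2017-01-01 00:00:00", "2017-01-02 00:00:00"), 2), tz = "UTC"),
  total_prod = c(12, 24, 6, 48)
)

hourly <- daily %>%
  group_by(ID) %>%
  # Add the missing hourly rows between each ID's first and last reading
  complete(TIMESTAMP = seq(min(TIMESTAMP), max(TIMESTAMP), by = "hour")) %>%
  ungroup() %>%
  arrange(ID, TIMESTAMP) %>%
  group_by(ID) %>%
  # Within each ID, group every run of NAs with the reading that follows it
  mutate(run = rev(cumsum(!is.na(rev(total_prod))))) %>%
  group_by(ID, run) %>%
  mutate(
    # Put a zero at the start of each run longer than one row
    start = replace(total_prod, row_number() == 1 & n() > 1, 0),
    # Interpolate from that zero up to the reading that ends the run
    CALC_HOURLY_PROD = if (n() > 1) {
      approx(as.numeric(TIMESTAMP), start, xout = as.numeric(TIMESTAMP))$y
    } else {
      start
    }
  ) %>%
  ungroup() %>%
  select(-run, -start)
```

Grouping by both `ID` and `run` keeps the interpolation from crossing ID boundaries, and writing the result to `CALC_HOURLY_PROD` leaves the original readings untouched.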