Filling missing values in datetime series data using interpolation in R

Time: 2016-08-03 09:39:19

Tags: r datetime interpolation

I have the following dataset.

name     old         new        datetime
1051     38656       38400      2016-01-24 03:22:37    
1051     5888        5632       2016-01-24 04:03:28  
1051     5632        38144      2016-01-24 04:34:22    
1051     5120        4864       2016-01-24 03:56:33  
1051     37376       37632      2016-01-25 08:08:16  
..       ..          ..         ..  

I want to interpolate this dataset onto a regular 10-minute grid:

name     old         new        datetime
1051     ?           ?          2016-01-24 03:20:00  
1051     ?           ?          2016-01-24 03:30:00    
1051     ?           ?          2016-01-24 03:40:00  
1051     ?           ?          2016-01-24 03:50:00  
1051     ?           ?          2016-01-24 04:00:00  
1051     ?           ?          2016-01-25 04:10:00  
..       ..          ..         ..

My dataset is messy, so I want to interpolate it to get a cleaner one. I tried this:

data.frame(datetime = seq(roomsdatetime$datetime[1], roomsdatetime$datetime[nrow(roomsdatetime)], by = "10 min")) %>%  
    mutate(roomsdatetime, approx = na.approx(roomsdatetime$old_value))

I get this error:

  

Error: wrong result size (3562), expected 3565 or 1

Is there another way to do this?

1 Answer:

Answer 0: (score: 0)

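In Excel, enter the following, starting at the top-left cell:

[The "difference" column holds the number of seconds between each (ordered) datetime and the base value 24.01.2016 03:20:00, which is assigned 0. It is obtained with the formula "=(B3-$B$2)*86400".]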

name           datetime difference old     new
1051    24.01.2016 03:20:00 0       NA     NA
1051    24.01.2016 03:22:37 157 38656   38400
1051    24.01.2016 03:30:00 600     NA     NA
1051    24.01.2016 03:40:00 1200    NA     NA
1051    24.01.2016 03:50:00 1800    NA     NA
1051    24.01.2016 03:56:33 2193    5120   4864
1051    24.01.2016 04:00:00 2400    NA     NA
1051    24.01.2016 04:03:28 2608    5888  5632
1051    24.01.2016 04:34:22 4462    5632  38144
1051    25.01.2016 04:10:00 89400   NA    NA
1051    25.01.2016 08:08:16 103696  37376  37632

Then, File - Save As - [File name: seymaalaca.csv; Type: "CSV (Comma delimited) (*.csv)"]

mydataframe <- read.csv("C:/Users/User/Documents/Revolution/seymaalaca.csv", header=TRUE, sep=",", stringsAsFactors = FALSE)
mydataframe # results in:



    name            datetime difference   old   new    
1  1051 24.01.2016 03:20:00          0    NA    NA    
2  1051 24.01.2016 03:22:37        157 38656 38400    
3  1051 24.01.2016 03:30:00        600    NA    NA    
4  1051 24.01.2016 03:40:00       1200    NA    NA    
5  1051 24.01.2016 03:50:00       1800    NA    NA    
6  1051 24.01.2016 03:56:33       2193  5120  4864    
7  1051 24.01.2016 04:00:00       2400    NA    NA    
8  1051 24.01.2016 04:03:28       2608  5888  5632    
9  1051 24.01.2016 04:34:22       4462  5632 38144    
10 1051 25.01.2016 04:10:00      89400    NA    NA    
11 1051 25.01.2016 08:08:16     103696 37376 37632
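
The "difference" column does not have to come from Excel. As a minimal sketch, assuming the timestamps are available as day.month.year strings like those in the table, it could be computed in R with as.POSIXct() and difftime(). The vector names `stamps`, `datetime` and `difference` are illustrative, and only a few of the rows are reproduced here.

```r
# Timestamps copied from the table above (day.month.year format).
stamps <- c("24.01.2016 03:20:00", "24.01.2016 03:22:37",
            "24.01.2016 03:30:00", "24.01.2016 03:56:33",
            "25.01.2016 08:08:16")

# Parse in UTC so that daylight-saving shifts cannot distort the differences.
datetime <- as.POSIXct(stamps, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Seconds elapsed since the first (base) timestamp.
difference <- as.numeric(difftime(datetime, datetime[1], units = "secs"))
difference
```

This plays the same role as the Excel formula: Excel stores datetimes as fractional days, hence the factor 86400, while difftime() is asked for seconds directly.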

oldcolumn <- lm(mydataframe$old ~ mydataframe$difference)
oldcolumn  #  old = 1.348e+04  + 2.233e-01*difference
oldfunction <- function (difference) {1.348e+04 + 2.233e-01*difference} # produces the row values for the "old" column

newcolumn <- lm(mydataframe$new ~ mydataframe$difference)
newcolumn  # new = 2.14e+04 + 1.56e-01*difference
newfunction <- function (difference) {2.14e+04 + 1.56e-01*difference} # produces the row values for the "new" column

myinterpolizer <- function (difference) {c(oldfunction(difference),newfunction(difference))} #  produces the row values for the "old&new" column

myinterpolizer(0)  # 13480 21400
myinterpolizer(600) # 13613.98 21493.60
myinterpolizer(1200) # 13747.96 21587.20
myinterpolizer(1800) # 13881.94 21680.80
myinterpolizer(2400) # 14015.92 21774.40
myinterpolizer(89400) # 33443.02 35346.40
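
The coefficients above are typed in by hand after rounding. A hedged alternative, assuming the five observed rows from the table, is to fit with the `data =` argument and let predict() evaluate the fitted lines. The names `obs`, `fit_old`, `fit_new` and `grid` are illustrative. Because nothing is rounded, the results will differ slightly from the hand-typed versions.

```r
# The five observed (non-NA) rows from the table above.
obs <- data.frame(
  difference = c(157, 2193, 2608, 4462, 103696),
  old        = c(38656, 5120, 5888, 5632, 37376),
  new        = c(38400, 4864, 5632, 38144, 37632)
)

# Fitting with data = lets predict() accept new "difference" values later.
fit_old <- lm(old ~ difference, data = obs)
fit_new <- lm(new ~ difference, data = obs)

# The differences (in seconds) of the rows whose values are missing.
grid <- data.frame(difference = c(0, 600, 1200, 1800, 2400, 89400))
grid$old <- predict(fit_old, newdata = grid)
grid$new <- predict(fit_new, newdata = grid)
grid
```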

A simpler one-liner that produces the 12 numbers above:

# mydataframe[is.na(mydataframe$old),] # filters the rows where old=NA
# mydataframe[is.na(mydataframe$old),3] # After (filtering the rows where old=NA) select (the "difference" column) 
lapply(mydataframe[is.na(mydataframe$old),3], myinterpolizer)
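
Note that lm() fits a single straight line through all observations, which is regression rather than interpolation between neighbouring points. As a different technique, here is a minimal sketch of piecewise-linear interpolation onto a 10-minute grid with base R's approx(). It assumes the five rows shown in the question, sorted by time. The names `obs`, `grid` and `result` are illustrative, and the grid start and end were chosen to lie inside the observed range.

```r
# Observed rows from the question, sorted by time.
obs <- data.frame(
  datetime = as.POSIXct(c("2016-01-24 03:22:37", "2016-01-24 03:56:33",
                          "2016-01-24 04:03:28", "2016-01-24 04:34:22",
                          "2016-01-25 08:08:16"), tz = "UTC"),
  old = c(38656, 5120, 5888, 5632, 37376),
  new = c(38400, 4864, 5632, 38144, 37632)
)

# Regular 10-minute grid lying inside the observed time range.
grid <- seq(as.POSIXct("2016-01-24 03:30:00", tz = "UTC"),
            as.POSIXct("2016-01-25 08:00:00", tz = "UTC"),
            by = "10 min")

# approx() interpolates linearly between the two neighbouring observations.
# Datetimes are converted to seconds so both axes are plain numerics.
result <- data.frame(
  datetime = grid,
  old = approx(as.numeric(obs$datetime), obs$old, xout = as.numeric(grid))$y,
  new = approx(as.numeric(obs$datetime), obs$new, xout = as.numeric(grid))$y
)
head(result)
```

With the default `rule = 1`, grid points outside the observed range (for example 03:20:00, which precedes the first observation) would come back as NA. Passing `rule = 2` would instead carry the nearest observed value outward.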