Interpolating missing values and summing by group in R

Time: 2016-07-11 12:40:47

Tags: r dataframe interpolation

I have a data frame like this:

wpt    ID   Fuel  Dist  Express   Local
 1     S36   12    1     1         0
 2     S36   14    2     1         0
 inter S36   NA    NA    1         0
 inter S36   NA    NA    1         0
 3     S36   16    4     1         0
 inter S36   NA    NA    0         1
 4     S36   18    6     0         1
 5     S36   22    7     0         1
 6     W09   45    9     1         0
 inter W09   NA    NA    1         0
 inter W09   NA    NA    1         0
 inter W09   NA    NA    1         0
 7     W09   48    14    0         1
 8     W09   50    15    0         1
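
For reproducibility, the table above could be built as shown in this sketch. The name `df` and the column types (character `wpt`, integer flags) are assumptions chosen to match how the answers refer to the data.

```r
# Rebuild the example table; types are assumed, not stated in the question.
df <- data.frame(
  wpt     = c("1", "2", "inter", "inter", "3", "inter", "4", "5",
              "6", "inter", "inter", "inter", "7", "8"),
  ID      = rep(c("S36", "W09"), times = c(8, 6)),
  Fuel    = c(12, 14, NA, NA, 16, NA, 18, 22, 45, NA, NA, NA, 48, 50),
  Dist    = c(1, 2, NA, NA, 4, NA, 6, 7, 9, NA, NA, NA, 14, 15),
  Express = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L),
  Local   = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)
```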

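(1) I want to interpolate the NA values in the Fuel and Dist columns. I treat each run of "inter" rows, together with the regularly numbered "wpt" rows immediately before and after it, as one unit, and interpolate within that unit.

The expected output is: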

wpt    ID   Fuel     Dist  Express   Local
 1     S36   12       1     1         0
 2     S36   14       2     1         0
 inter S36   14.6667  2.67  1         0
 inter S36   15.3333  3.33  1         0
 3     S36   16       4     1         0
 inter S36   17       5     0         1
 4     S36   18       6     0         1
 5     S36   22       7     0         1
 6     W09   45       9     1         0
 inter W09   45.75    10.25 1         0
 inter W09   46.50    11.50 1         0
 inter W09   47.25    12.75 1         0
 7     W09   48       14    0         1
 8     W09   50       15    0         1

To be clear, the interpolation for the first segment is computed as follows:

> seq(14,16,length.out = 4)
[1] 14.00000 14.66667 15.33333 16.00000

(2) Then, for each ID, I want the cumulative sum within each Express and Local category. The expected output is:

ID  Cumsum.Fuel  Cumsum.Dist Express  Local
S36    4             3          1       0
S36    5             2          0       1
W09    2.25          3.75       1       0
W09    2             1          0       1

To be clear, Cumsum.Fuel for the Express rows of "S36" is 16 - 12 = 4. The same applies to the other rows.

Thanks in advance!

2 Answers:

Answer 0: (score: 3)

For the first task, you can use:

library(zoo)
na.approx(df$Fuel)
 [1] 12.00000 14.00000 14.66667 15.33333 16.00000 17.00000 18.00000 22.00000 45.00000 45.75000
[11] 46.50000 47.25000 48.00000 50.00000
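
If zoo is not available, the same linear fill can be sketched with base R's `approx()`, applied per ID with `ave()` so that values are never interpolated across two IDs. This is only an illustration: `interp_na` is a hypothetical helper, and the vectors repeat the Fuel and ID columns from the question.

```r
# Fuel and ID columns copied from the question.
fuel <- c(12, 14, NA, NA, 16, NA, 18, 22, 45, NA, NA, NA, 48, 50)
id   <- rep(c("S36", "W09"), times = c(8, 6))

# Hypothetical helper: linear interpolation over row positions.
# Needs at least two non-NA values; leading/trailing NAs stay NA.
interp_na <- function(x) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  approx(idx[ok], x[ok], xout = idx)$y
}

# Apply the helper separately within each ID.
fuel_filled <- ave(fuel, id, FUN = interp_na)
```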

Answer 1: (score: 1)

To fill in both columns, we can use mutate_at after grouping by "ID":

library(dplyr)
library(zoo)
df2 <- df1 %>% 
         group_by(ID) %>% 
         mutate_at(vars(Fuel, Dist), na.approx) 
df2
#     wpt    ID     Fuel      Dist Express Local
#   <chr> <chr>    <dbl>     <dbl>   <int> <int>
#1      1   S36 12.00000  1.000000       1     0
#2      2   S36 14.00000  2.000000       1     0
#3  inter   S36 14.66667  2.666667       1     0
#4  inter   S36 15.33333  3.333333       1     0
#5      3   S36 16.00000  4.000000       1     0
#6  inter   S36 17.00000  5.000000       0     1
#7      4   S36 18.00000  6.000000       0     1
#8      5   S36 22.00000  7.000000       0     1
#9      6   W09 45.00000  9.000000       1     0
#10 inter   W09 45.75000 10.250000       1     0
#11 inter   W09 46.50000 11.500000       1     0
#12 inter   W09 47.25000 12.750000       1     0
#13     7   W09 48.00000 14.000000       0     1
#14     8   W09 50.00000 15.000000       0     1
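
With newer dplyr versions (1.0 or later, where `across()` supersedes `mutate_at()`), the same step might look like the sketch below. The `toy` data frame is made up for illustration. Passing `na.rm = FALSE` is an addition not in the original answer: by default `na.approx()` drops leading and trailing NAs, which would change the vector length inside `mutate()`.

```r
library(dplyr)
library(zoo)

# Made-up data: group "A" ends with an NA that cannot be interpolated.
toy <- data.frame(
  ID   = c("A", "A", "A", "A", "B", "B", "B"),
  Fuel = c(10, NA, 14, NA, 1, NA, 3)
)

filled <- toy %>%
  group_by(ID) %>%
  # na.rm = FALSE keeps the length; the un-interpolable trailing NA stays NA
  mutate(across(Fuel, ~ na.approx(.x, na.rm = FALSE))) %>%
  ungroup()
```

Here the interior gaps are filled linearly within each ID, while the trailing NA in group "A" is left in place.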

For the second part:

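library(data.table)  # provides rleid()
df2 %>%
   # rleid() numbers consecutive runs, so each unbroken Express/Local stretch is its own group
   group_by(ID, Express1 = rleid(Express), Local1 = rleid(Local)) %>%
   summarise(Express = first(Express),
             Local = first(Local), 
             # difference between the last and first value in each run
             Cumsum.Fuel = last(Fuel) - first(Fuel),
             Cumsum.Dist = last(Dist) - first(Dist))  %>%
    ungroup() %>% 
    # drop the helper run-id columns
    select(-Express1, - Local1)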
#Source: local data frame [4 x 5]
#    ID Express Local Cumsum.Fuel Cumsum.Dist
#  <chr>   <int> <int>       <dbl>       <dbl>
#1   S36       1     0        4.00        3.00
#2   S36       0     1        5.00        2.00
#3   W09       1     0        2.25        3.75
#4   W09       0     1        2.00        1.00

Or we can do this without rleid:

df2 %>%
    group_by(ID, Express, Local) %>% 
    summarise(Cumsum.Fuel = last(Fuel) - first(Fuel), 
              Cumsum.Dist = last(Dist) - first(Dist))
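
The two variants agree on the sample data because each Express/Local combination forms a single unbroken run within an ID. If a category could reappear later in the same ID, grouping by value alone would pool the separate runs. The base R sketch below, on made-up vectors, illustrates the difference; `run_id` is a hand-rolled stand-in for `rleid()`.

```r
# Made-up example: Express switches 1 -> 0 -> 1 within one ID.
express <- c(1, 1, 0, 0, 1, 1)
fuel    <- c(10, 12, 13, 15, 16, 20)

# Run identifier: increments each time the value changes (like rleid()).
run_id <- cumsum(c(TRUE, diff(express) != 0))

last_minus_first <- function(x) x[length(x)] - x[1]

# Per run: three groups, one per unbroken stretch.
by_run <- tapply(fuel, run_id, last_minus_first)

# Per value: the two Express == 1 stretches are pooled into one group.
by_value <- tapply(fuel, express, last_minus_first)
```

In this toy case `by_run` yields one difference for each of the three stretches, while `by_value` spans from the first to the last Express == 1 row and so includes the fuel used in between.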