I have a data frame like this:
wpt ID Fuel Dist Express Local
1 S36 12 1 1 0
2 S36 14 2 1 0
inter S36 NA NA 1 0
inter S36 NA NA 1 0
3 S36 16 4 1 0
inter S36 NA NA 0 1
4 S36 18 6 0 1
5 S36 22 7 0 1
6 W09 45 9 1 0
inter W09 NA NA 1 0
inter W09 NA NA 1 0
inter W09 NA NA 1 0
7 W09 48 14 0 1
8 W09 50 15 0 1
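(1) I want to interpolate and fill in the NA values in the Fuel and Dist columns. I treat each run of "inter" rows, together with the regular numbered "wpt" rows that start and end it, as one unit, and interpolate within that unit.
The expected output looks like this: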
wpt ID Fuel Dist Express Local
1 S36 12 1 1 0
2 S36 14 2 1 0
inter S36 14.6667 2.67 1 0
inter S36 15.3333 3.33 1 0
3 S36 16 4 1 0
inter S36 17 5 0 1
4 S36 18 6 0 1
5 S36 22 7 0 1
6 W09 45 9 1 0
inter W09 45.75 10.25 1 0
inter W09 46.50 11.50 1 0
inter W09 47.25 12.75 1 0
7 W09 48 14 0 1
8 W09 50 15 0 1
To be clear, the interpolation for the first segment is calculated as follows:
> seq(14,16,length.out = 4)
[1] 14.00000 14.66667 15.33333 16.00000
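The other segments follow the same pattern. As a minimal sketch (the variable names are illustrative and not part of my data), the W09 segment runs from waypoint 6 to waypoint 7 with three "inter" rows in between, so it needs five evenly spaced points:

```r
# Endpoints of the W09 segment, taken from the table above:
# waypoint 6, three "inter" rows, waypoint 7 -> five points in total
fuel_w09 <- seq(45, 48, length.out = 5)
dist_w09 <- seq(9, 14, length.out = 5)

fuel_w09  # 45.00 45.75 46.50 47.25 48.00
dist_w09  #  9.00 10.25 11.50 12.75 14.00
```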
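(2) Then I want to get, for each ID, the cumulative sum of Fuel and Dist within each category of Express and Local. The expected output looks like this: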
ID Cumsum.Fuel Cumsum.Dist Express Local
S36 4 3 1 0
S36 5 2 0 1
W09 2.25 3.75 1 0
W09 2 1 0 1
To be clear, Cumsum.Fuel for Express in "S36" is 16 - 12 = 4. The same applies to the others.
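The remaining rows can be checked the same way. This is only a sketch of the arithmetic, using the interpolated values from the expected table in (1); the variable names are made up for illustration:

```r
# Last value minus first value within each ID / category run
s36_local_fuel   <- 22 - 17      # 5
s36_local_dist   <- 7 - 5        # 2
w09_express_fuel <- 47.25 - 45   # 2.25
w09_express_dist <- 12.75 - 9    # 3.75
w09_local_fuel   <- 50 - 48      # 2
w09_local_dist   <- 15 - 14      # 1
```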
Thanks in advance!!!
Answer 0: (score: 3)
For the first task, you can use:
library(zoo)
na.approx(df$Fuel)
[1] 12.00000 14.00000 14.66667 15.33333 16.00000 17.00000 18.00000 22.00000 45.00000 45.75000
[11] 46.50000 47.25000 48.00000 50.00000
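For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data and interpolates both columns. The object name `df` follows the call above; the column types are an assumption based on the printed table. Note that interpolating without grouping only matches the expected output here because no run of NA values crosses an ID boundary.

```r
library(zoo)

# Rebuild the example data from the question (column types are assumed)
df <- data.frame(
  wpt = c("1", "2", "inter", "inter", "3", "inter", "4", "5",
          "6", "inter", "inter", "inter", "7", "8"),
  ID = rep(c("S36", "W09"), times = c(8, 6)),
  Fuel = c(12, 14, NA, NA, 16, NA, 18, 22, 45, NA, NA, NA, 48, 50),
  Dist = c(1, 2, NA, NA, 4, NA, 6, 7, 9, NA, NA, NA, 14, 15),
  Express = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L),
  Local = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Linear interpolation of the NA values, using the row position as the x axis
df$Fuel <- na.approx(df$Fuel)
df$Dist <- na.approx(df$Dist)
df
```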
Answer 1: (score: 1)
To fill in these two columns, we can use mutate_at after grouping by "ID":
library(dplyr)
library(zoo)
df2 <- df1 %>%
  group_by(ID) %>%
  mutate_at(vars(Fuel, Dist), na.approx)
df2
# wpt ID Fuel Dist Express Local
# <chr> <chr> <dbl> <dbl> <int> <int>
#1 1 S36 12.00000 1.000000 1 0
#2 2 S36 14.00000 2.000000 1 0
#3 inter S36 14.66667 2.666667 1 0
#4 inter S36 15.33333 3.333333 1 0
#5 3 S36 16.00000 4.000000 1 0
#6 inter S36 17.00000 5.000000 0 1
#7 4 S36 18.00000 6.000000 0 1
#8 5 S36 22.00000 7.000000 0 1
#9 6 W09 45.00000 9.000000 1 0
#10 inter W09 45.75000 10.250000 1 0
#11 inter W09 46.50000 11.500000 1 0
#12 inter W09 47.25000 12.750000 1 0
#13 7 W09 48.00000 14.000000 0 1
#14 8 W09 50.00000 15.000000 0 1
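In more recent versions of dplyr, `mutate_at` is superseded by `across`. Below is a sketch of the equivalent call, assuming dplyr 1.0.0 or later; `df1` is rebuilt here (with assumed column types) only so the snippet runs on its own. Passing `na.rm = FALSE` is an optional safeguard that is not in the call above: it keeps any leading or trailing NA in place instead of dropping it, which would otherwise change the column length inside `mutate`.

```r
library(dplyr)
library(zoo)

# Same example data as in the question (column types are assumed)
df1 <- data.frame(
  wpt = c("1", "2", "inter", "inter", "3", "inter", "4", "5",
          "6", "inter", "inter", "inter", "7", "8"),
  ID = rep(c("S36", "W09"), times = c(8, 6)),
  Fuel = c(12, 14, NA, NA, 16, NA, 18, 22, 45, NA, NA, NA, 48, 50),
  Dist = c(1, 2, NA, NA, 4, NA, 6, 7, 9, NA, NA, NA, 14, 15),
  Express = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L),
  Local = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Interpolate Fuel and Dist separately within each ID
df2 <- df1 %>%
  group_by(ID) %>%
  mutate(across(c(Fuel, Dist), ~ na.approx(.x, na.rm = FALSE))) %>%
  ungroup()
df2
```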
For the second part:
library(data.table)
df2 %>%
  group_by(ID, Express1 = rleid(Express), Local1 = rleid(Local)) %>%
  summarise(Express = first(Express),
            Local = first(Local),
            Cumsum.Fuel = last(Fuel) - first(Fuel),
            Cumsum.Dist = last(Dist) - first(Dist)) %>%
  ungroup() %>%
  select(-Express1, -Local1)
#Source: local data frame [4 x 5]
# ID Express Local Cumsum.Fuel Cumsum.Dist
# <chr> <int> <int> <dbl> <dbl>
#1 S36 1 0 4.00 3.00
#2 S36 0 1 5.00 2.00
#3 W09 1 0 2.25 3.75
#4 W09 0 1 2.00 1.00
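The `rleid` helper from data.table assigns a new integer id every time the value changes, so consecutive rows with the same value share an id. A small illustration on a made-up vector (not part of the original data):

```r
library(data.table)

# Hypothetical indicator column with two separate runs of 1
x <- c(1, 1, 0, 0, 1)
rleid(x)
# [1] 1 1 2 2 3
```

Grouping on these ids rather than on the raw 0/1 values keeps separate runs apart, even if the same value recurs later within an ID.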
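Or we can do it without rleid (this works here because each Express/Local combination occurs as a single contiguous run within each ID):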
df2 %>%
  group_by(ID, Express, Local) %>%
  summarise(Cumsum.Fuel = last(Fuel) - first(Fuel),
            Cumsum.Dist = last(Dist) - first(Dist))
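One caveat with grouping on the raw values: every row that shares the same ID, Express and Local lands in one group, whether or not the rows are adjacent. That is fine for the data in the question, but a sketch with a made-up series (hypothetical values, base R only) shows how the two approaches differ when a category recurs:

```r
# Hypothetical route: Express, then Local, then Express again
express <- c(1, 1, 0, 0, 1, 1)
fuel    <- c(10, 12, 15, 18, 20, 25)

# Grouping by value: both Express runs are merged into one group
by_value <- tapply(fuel, express, function(v) v[length(v)] - v[1])

# Grouping by run: each contiguous run is kept separate
runs   <- rle(express)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
by_run <- tapply(fuel, run_id, function(v) v[length(v)] - v[1])

by_value  # Express (1) spans 10 to 25, giving 15; Local (0) gives 3
by_run    # the three runs give 2, 3 and 5
```

If a category can recur within an ID, the rleid version above is the safer choice.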