I have a grouped data frame:
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000,3000,700,800,900,1000,20000,20000,30000,40000)
DF <- data.frame(Truck, OilChanged, Odometer)
# Truck OilChanged Odometer
# 1 A True 1000
# 2 A NewOil 1000
# 3 A False 2000
# 4 A False 3000
# 5 B False 700
# 6 B False 800
# 7 B False 900
# 8 B False 1000
# 9 C True 20000
# 10 C NewOil 20000
# 11 C True 30000
# 12 C NewOil 40000
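I want to infer the age of the oil (in kilometres) wherever possible. The age can only be inferred once the oil has been changed. If the oil has not been changed, its age remains a mystery (for example, Truck B).
The desired result is below: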
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000, 2000, 3000,700,800,900,1000,20000,20000,30000,40000)
OilAge <- c(NA,0,1000,2000,NA,NA,NA,NA,NA,0,10000,0)
Result <- data.frame(Truck, OilChanged, Odometer, OilAge)
# Truck OilChanged Odometer OilAge
# 1 A True 1000 NA
# 2 A NewOil 1000 0
# 3 A False 2000 1000
# 4 A False 3000 2000
# 5 B False 700 NA
# 6 B False 800 NA
# 7 B False 900 NA
# 8 B False 1000 NA
# 9 C True 20000 NA
# 10 C NewOil 20000 0
# 11 C True 30000 10000
# 12 C NewOil 40000 0
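Note: the odometer reading on an OilChanged == 'True' row and on the 'NewOil' row that immediately follows it will always be the same, because the oil was sampled directly before it was changed. Both rows must nevertheless be kept so that downstream calculations, such as rate-of-change formulas, run correctly.
An NA in the OilAge column means the age is a mystery.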
Answer 0 (score: 1)
Please let me know whether this solution works for you.
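library(dplyr)

Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
# The last reading is 30000 here, so the final True/NewOil pair shares one odometer value
Odometer <- c(1000, 1000, 2000, 3000, 700, 800, 900, 1000, 20000, 20000, 30000, 30000)
DF <- data.frame(Truck, OilChanged, Odometer)

DF %>%
  group_by(Truck) %>%
  # status = number of distinct OilChanged values in the truck (1 means every row is the same, as for Truck B)
  mutate(status = length(unique(OilChanged)),
         OilAge = ifelse(OilChanged == "NewOil", 0,
                  # note: this branch reduces to lag(Odometer), the previous reading in the truck
                  ifelse(OilChanged == "False", Odometer - (Odometer - lag(Odometer)),
                  ifelse(OilChanged == "True", Odometer - lag(Odometer), NA)))) %>%
  # trucks with a single status get NA for every row
  mutate(OilAge = ifelse(status != 1, OilAge, NA)) %>%
  ungroup() %>%
  select(Truck, OilChanged, Odometer, OilAge)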
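The nested ifelse() above reproduces the expected OilAge for this sample, but its 'False' branch reduces to lag(Odometer), which matches the oil age here only because of Truck A's particular readings. A minimal sketch of a more general variant follows. It is not the answer author's method: it assumes odometer readings never decrease within a truck, and the LastChange column name is made up for illustration.

```r
library(dplyr)

DF <- data.frame(
  Truck      = rep(c("A", "B", "C"), each = 4),
  OilChanged = c("True", "NewOil", "False", "False",
                 "False", "False", "False", "False",
                 "True", "NewOil", "True", "NewOil"),
  Odometer   = c(1000, 1000, 2000, 3000, 700, 800, 900, 1000,
                 20000, 20000, 30000, 30000)
)

res <- DF %>%
  group_by(Truck) %>%
  mutate(
    # Reading at the most recent 'NewOil' row; -Inf until the first change.
    # Assumption: the odometer never decreases, so cummax() carries the
    # latest change forward.
    LastChange = cummax(ifelse(OilChanged == "NewOil", Odometer, -Inf)),
    # Distance since that change, or NA while no change has been seen
    OilAge = ifelse(is.finite(LastChange), Odometer - LastChange, NA_real_)
  ) %>%
  ungroup() %>%
  select(Truck, OilChanged, Odometer, OilAge)

res
```

Because the age is measured from the last 'NewOil' reading rather than from the previous row, this should also cope with trucks that have several 'False' rows before their first oil change.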
Answer 1 (score: 1)
Another approach:
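library(dplyr)
# DF as defined above; the output below reflects a last Odometer value of 30000
DF %>% group_by(Truck) %>%
  mutate(d = cumsum(OilChanged == 'NewOil')) %>%  # oil changes seen so far within the truck
  group_by(Truck, d) %>%
  # NA^TRUE is NA and NA^FALSE is 1: groups with d == 0 become NA, the rest sum the distance driven
  mutate(OilAge = cumsum(c(0*NA^(as.logical(!(first(d)))), diff(NA^(as.logical(!d))*Odometer))))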
# A tibble: 12 x 5
# Groups: Truck, d [6]
#    Truck OilChanged Odometer     d OilAge
#    <chr> <chr>         <dbl> <int>  <dbl>
#  1 A     True           1000     0     NA
#  2 A     NewOil         1000     1      0
#  3 A     False          2000     1   1000
#  4 A     False          3000     1   2000
#  5 B     False           700     0     NA
#  6 B     False           800     0     NA
#  7 B     False           900     0     NA
#  8 B     False          1000     0     NA
#  9 C     True          20000     0     NA
# 10 C     NewOil        20000     1      0
# 11 C     True          30000     1  10000
# 12 C     NewOil        30000     2      0
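The OilAge expression above leans on two base R behaviours: NA^0 evaluates to 1 while NA^1 stays NA, and cumsum(c(0, diff(x))) rebuilds x measured from its first element. A small sketch on made-up vectors (the values of d and odometer below are illustrative, not taken from a real group):

```r
# A logical exponent is coerced to 0/1, so NA^flag acts as a mask:
#   flag FALSE -> NA^0 = 1   (keep the value)
#   flag TRUE  -> NA^1 = NA  (blank the value out)
d <- c(0, 1, 1, 2)                       # illustrative oil-change counter
mask <- NA^(d == 0)                      # NA where no change has happened yet
odometer <- c(1000, 1000, 2000, 3000)    # illustrative readings
masked <- mask * odometer                # first element becomes NA

# Within a group that starts at an oil change, the running sum of the
# differences is the distance driven since the first row of the group
since_first <- cumsum(c(0, diff(c(1000, 2000, 3000))))

mask
masked
since_first
```

In the answer, the mask is applied per (Truck, d) group, so every row of a group with d == 0 turns into NA and every other group counts up from 0.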
d is a helper variable; you can drop it from the result once you understand what has been done.
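To see what d holds and how it might be dropped afterwards, here is a minimal sketch. The names with_d and without_d are hypothetical, and the data frame is rebuilt with a last Odometer value of 30000 so the block runs on its own.

```r
library(dplyr)

DF <- data.frame(
  Truck      = rep(c("A", "B", "C"), each = 4),
  OilChanged = c("True", "NewOil", "False", "False",
                 "False", "False", "False", "False",
                 "True", "NewOil", "True", "NewOil"),
  Odometer   = c(1000, 1000, 2000, 3000, 700, 800, 900, 1000,
                 20000, 20000, 30000, 30000)
)

with_d <- DF %>%
  group_by(Truck) %>%
  # d is 0 until the first 'NewOil' row, then increases by 1 at every oil change
  mutate(d = cumsum(OilChanged == "NewOil")) %>%
  ungroup()

with_d$d   # A: 0 1 1 1, B: 0 0 0 0, C: 0 1 1 2

# Once d is no longer needed, drop the column (the grouping was removed above)
without_d <- with_d %>% select(-d)
names(without_d)
```

Calling ungroup() before select(-d) matters because dplyr keeps grouping columns in the result of select(), so d could not be dropped while it is still a grouping variable.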