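I want to use dplyr in R to calculate new data points from the previous 2 (newly calculated) data points. However, `mutate()` does not update the new values as it goes (probably because it is a vectorized function), so the calculation uses the "old" values. The first "gaps" (single missing values) are filled correctly, but the gap with 2 consecutive missing values causes problems.
How can I get around this?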
```r
library(dplyr)

temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))

temp <- temp %>%
  mutate(
    prev1 = lag(new_MAP_top, 1),
    prev2 = lag(new_MAP_top, 2),
    previous_slope = prev1 - prev2,
    previous_slope = ifelse(is.na(previous_slope), 0, previous_slope),
    new_MAP_top = ifelse(is.na(steps), new_MAP_top,
                         round(prev1 - (previous_slope - (3*steps))/4, digits = 2))
  )
```
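The behavior described above can be reproduced in isolation. The following is a minimal sketch, assuming dplyr ≥ 1.0. `one_pass` is only an illustrative name, and `coalesce()` stands in for the `ifelse(is.na(...), 0, ...)` step with the same result. After a single `mutate()`, the only remaining NA is row 15, the second value of the two-NA gap, because `lag()` still sees row 14 as NA.

```r
library(dplyr)

temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))

# One pass of the question's mutate(): lag() reads new_MAP_top as it was
# *before* this call, so row 15 still sees row 14 as NA.
one_pass <- temp %>%
  mutate(prev1 = lag(new_MAP_top, 1),
         prev2 = lag(new_MAP_top, 2),
         previous_slope = coalesce(prev1 - prev2, 0),  # NA slope -> 0
         new_MAP_top = ifelse(is.na(steps), new_MAP_top,
                              round(prev1 - (previous_slope - 3 * steps) / 4, digits = 2)))

# rows still missing after one pass
which(is.na(one_pass$new_MAP_top))  # 15
```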
Answer (score: 0)
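The idea is simply to run your code (the code you posted) as many times as needed, until every NA in the `new_MAP_top` column has been updated. As you mentioned, the problem comes from several consecutive NAs in that column. Within a single `mutate()`, `lag()` reads the column as it was before the call. The second NA of a run therefore still sees an NA as its predecessor, and it can only be filled on the next pass.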
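Repeating the pass works here because each pass fills at least the first remaining NA of every run, whose predecessor is already known. For comparison, a different technique fills the same recurrence in a single pass. This is an explicit loop and is not what this answer uses: each iteration reads values that earlier iterations have already written, so consecutive NAs need no second pass. Below is a minimal base-R sketch. `fill_sequential` is a hypothetical helper name, and the formula is copied from the question's `mutate()`.

```r
# Hypothetical helper: fill the recurrence row by row, in order,
# so each row sees the values already computed for the rows before it.
fill_sequential <- function(x, steps) {
  for (i in seq_along(x)) {
    if (is.na(steps[i])) next               # only rows with a step are recomputed
    prev1 <- if (i >= 2) x[i - 1] else NA   # same as lag(x, 1)
    prev2 <- if (i >= 3) x[i - 2] else NA   # same as lag(x, 2)
    slope <- prev1 - prev2
    if (is.na(slope)) slope <- 0            # same NA -> 0 rule as the question
    x[i] <- round(prev1 - (slope - 3 * steps[i]) / 4, digits = 2)
  }
  x
}

temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))

temp$new_MAP_top <- fill_sequential(temp$new_MAP_top, temp$steps)
temp$new_MAP_top
```

With the question's data, this should fill rows 5, 10, 14 and 15 with the same values as the repeated-pass approach. It needs no `while` loop or convergence check.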
```r
library(dplyr)

# example data
temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))

# function to fully update a given dataframe
UpdateDF = function(df) {
  # while there are NAs in that column
  # (note: this never ends if some NA can never be filled, e.g. an NA row with no step)
  while (sum(is.na(df$new_MAP_top)) > 0) {
    # apply your process; each pass fills the NAs whose predecessors are already known
    df = df %>%
      mutate(prev1 = lag(new_MAP_top, 1),
             prev2 = lag(new_MAP_top, 2),
             previous_slope = prev1 - prev2,
             previous_slope = ifelse(is.na(previous_slope), 0, previous_slope),
             new_MAP_top = ifelse(is.na(steps), new_MAP_top,
                                  round(prev1 - (previous_slope - (3*steps))/4, digits = 2)))
  }
  # return the updated table when we have no more NAs in that column
  df
}

# apply the function
UpdateDF(temp)
#    new_MAP_top steps prev1 prev2 previous_slope
# 1        68.00    NA    NA    NA           0.00
# 2        71.00    NA 68.00    NA           0.00
# 3        70.00    NA 71.00 68.00           3.00
# 4        72.00    NA 70.00 71.00          -1.00
# 5        72.62   1.5 72.00 70.00           2.00
# 6        75.00    NA 72.62 72.00           0.62
# 7        70.00    NA 75.00 72.62           2.38
# 8        69.00    NA 70.00 75.00          -5.00
# 9        69.00    NA 69.00 70.00          -1.00
# 10       70.50   2.0 69.00 69.00           0.00
# 11       73.00    NA 70.50 69.00           1.50
# 12       75.00    NA 73.00 70.50           2.50
# 13       83.00    NA 75.00 73.00           2.00
# 14       84.00   4.0 83.00 75.00           8.00
# 15       86.75   4.0 84.00 83.00           1.00
# 16       95.00    NA    NA 84.00           0.00
# 17       98.00    NA 95.00    NA           0.00
# 18       97.00    NA 98.00 95.00           3.00
```
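The helper columns `prev1`, `prev2` and `previous_slope` come from the last pass, so they can be out of date. Row 16, for example, still shows `prev1` as NA. To remove any unnecessary columns, you can adjust the function to return exactly what you need.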
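One way to do that is sketched below, assuming dplyr ≥ 1.0 (needed for `any_of()`). `UpdateDF2` is a hypothetical name for the adjusted function. It drops the helper columns before returning. It also stops with a warning if a pass fills nothing, because the original `while` loop would never end if an NA cannot be filled, for example an NA in a row whose `steps` is NA.

```r
library(dplyr)

temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))

# Hypothetical variant of UpdateDF: same passes, but with a no-progress guard
# and without the helper columns in the result.
UpdateDF2 <- function(df) {
  repeat {
    n_na <- sum(is.na(df$new_MAP_top))
    if (n_na == 0) break                      # everything filled
    df <- df %>%
      mutate(prev1 = lag(new_MAP_top, 1),
             prev2 = lag(new_MAP_top, 2),
             previous_slope = coalesce(prev1 - prev2, 0),
             new_MAP_top = ifelse(is.na(steps), new_MAP_top,
                                  round(prev1 - (previous_slope - 3 * steps) / 4, digits = 2)))
    if (sum(is.na(df$new_MAP_top)) == n_na) { # this pass filled nothing
      warning("some NAs in new_MAP_top could not be filled")
      break
    }
  }
  # any_of() also works when no pass ran and the helper columns never existed
  df %>% select(-any_of(c("prev1", "prev2", "previous_slope")))
}

UpdateDF2(temp)
```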