R dplyr: using previously computed data

Date: 2017-11-30 11:56:55

Tags: r dplyr

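I want to use dplyr in R to compute new data points from the previous two (newly computed) data points. However, mutate() does not update the new values as it goes (presumably because it is vectorized: lag(new_MAP_top, 1) is evaluated once, on the column as it was before the call), so the calculation is based on the "old" values. The single-value gaps are filled correctly, but the gap with two consecutive missing values (rows 14 and 15) is not: row 15 stays NA.

How can I get around this?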

library(dplyr)

temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))

temp <- temp %>%
  mutate(
    # lag() sees new_MAP_top as it was before this mutate() call
    prev1 = lag(new_MAP_top, 1),
    prev2 = lag(new_MAP_top, 2),
    previous_slope = prev1 - prev2,
    previous_slope = ifelse(is.na(previous_slope), 0, previous_slope),
    new_MAP_top = ifelse(is.na(steps), new_MAP_top, round(prev1 - (previous_slope - (3 * steps)) / 4, digits = 2))
  )
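To make the failure concrete, here is a base-R sketch of a single pass of the same formula. The helper shift() is a hypothetical stand-in for dplyr::lag(x, n), and the vectors are the two columns of temp above. Because both lags are taken once from the unmodified column, row 15 still sees the original NA in row 14, so it stays NA even though row 14 gets filled in the same pass.

```r
# Hypothetical helper: base-R stand-in for dplyr::lag(x, n) (pads with NA)
shift <- function(x, n) c(rep(NA, n), x)[seq_along(x)]

x     <- c(68, 71, 70, 72, NA, 75, 70, 69, 69, NA, 73, 75, 83, NA, NA, 95, 98, 97)
steps <- c(NA, NA, NA, NA, 1.5, NA, NA, NA, NA, 2, NA, NA, NA, 4, 4, NA, NA, NA)

# Both lags are computed once, from the column as it is *before* the update
prev1 <- shift(x, 1)
prev2 <- shift(x, 2)
slope <- ifelse(is.na(prev1 - prev2), 0, prev1 - prev2)

# One vectorized pass: rows 5, 10 and 14 are filled, row 15 is not
one_pass <- ifelse(is.na(steps), x,
                   round(prev1 - (slope - 3 * steps) / 4, digits = 2))
one_pass[c(5, 10, 14, 15)]
```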

1 answer:

Answer 0 (score: 0)

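The idea is simply to run your code (the part you posted) as many times as needed until every NA in the new_MAP_top column has been filled. As you said, the problem comes from several consecutive NAs in that column: one pass can only fill an NA whose previous row already has a value, so each additional NA in a run needs one more pass.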
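To estimate how many passes that takes, the sketch below counts the longest run of consecutive NAs with base R's rle(). The function longest_na_run() is a hypothetical helper, not part of dplyr. It assumes every NA row has a non-NA steps value and that no run of NAs starts in row 1; such NAs are never filled, however many passes you run.

```r
# Hypothetical helper: length of the longest run of consecutive NAs in x.
# Each pass fills one more NA per run, so this is the number of passes needed.
longest_na_run <- function(x) {
  runs <- rle(is.na(x))            # run lengths of TRUE (NA) / FALSE (value)
  if (!any(runs$values)) return(0L)
  max(runs$lengths[runs$values])   # longest run where is.na() is TRUE
}

new_MAP_top <- c(68, 71, 70, 72, NA, 75, 70, 69, 69, NA, 73, 75, 83, NA, NA, 95, 98, 97)
longest_na_run(new_MAP_top)  # 2
```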

library(dplyr)

# example data
temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))


# function to fully update a given data frame
UpdateDF = function(df) {

  # repeat while there are still NAs in that column
  # (this never terminates if some NA cannot be filled, e.g. an NA in
  # the first row, or an NA whose steps value is also NA)
  while (any(is.na(df$new_MAP_top))) {

    # apply your process
    df = df %>%
      mutate(prev1 = lag(new_MAP_top, 1),
             prev2 = lag(new_MAP_top, 2),
             previous_slope = prev1 - prev2,
             previous_slope = ifelse(is.na(previous_slope), 0, previous_slope),
             new_MAP_top = ifelse(is.na(steps), new_MAP_top, round(prev1 - (previous_slope - (3 * steps)) / 4, digits = 2)))
  }

  # return the updated table once that column has no more NAs
  df
}

# apply the function
UpdateDF(temp)

#    new_MAP_top steps prev1 prev2 previous_slope
# 1        68.00    NA    NA    NA           0.00
# 2        71.00    NA 68.00    NA           0.00
# 3        70.00    NA 71.00 68.00           3.00
# 4        72.00    NA 70.00 71.00          -1.00
# 5        72.62   1.5 72.00 70.00           2.00
# 6        75.00    NA 72.62 72.00           0.62
# 7        70.00    NA 75.00 72.62           2.38
# 8        69.00    NA 70.00 75.00          -5.00
# 9        69.00    NA 69.00 70.00          -1.00
# 10       70.50   2.0 69.00 69.00           0.00
# 11       73.00    NA 70.50 69.00           1.50
# 12       75.00    NA 73.00 70.50           2.50
# 13       83.00    NA 75.00 73.00           2.00
# 14       84.00   4.0 83.00 75.00           8.00
# 15       86.75   4.0 84.00 83.00           1.00
# 16       95.00    NA    NA 84.00           0.00
# 17       98.00    NA 95.00    NA           0.00
# 18       97.00    NA 98.00 95.00           3.00
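Because the while loop above spins forever when an NA can never be filled, a capped version may be safer. The sketch below is a hypothetical base-R variant (update_df_guarded(), max_iter and the shift() helper are illustrative names, with shift() standing in for dplyr::lag()). It applies the same formula repeatedly but gives up with a warning after max_iter passes, and since it builds no helper columns it returns only new_MAP_top and steps.

```r
# Hypothetical base-R variant of UpdateDF(): same repeat-until-no-NA idea,
# but it stops after max_iter passes instead of looping forever.
update_df_guarded <- function(df, max_iter = 100) {
  shift <- function(x, n) c(rep(NA, n), x)[seq_along(x)]  # stand-in for lag()
  for (i in seq_len(max_iter)) {
    if (!anyNA(df$new_MAP_top)) return(df)   # done: nothing left to fill
    x     <- df$new_MAP_top
    prev1 <- shift(x, 1)
    prev2 <- shift(x, 2)
    slope <- ifelse(is.na(prev1 - prev2), 0, prev1 - prev2)
    df$new_MAP_top <- ifelse(is.na(df$steps), x,
                             round(prev1 - (slope - 3 * df$steps) / 4, digits = 2))
  }
  if (anyNA(df$new_MAP_top)) warning("NAs remain after ", max_iter, " passes")
  df
}

temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))
res <- update_df_guarded(temp)
res$new_MAP_top[c(14, 15)]
```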

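To drop the helper columns (prev1, prev2, previous_slope), adjust UpdateDF to return exactly what you need, for example select(df, new_MAP_top, steps) instead of df.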
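A different technique, not the one used in this answer, is to drop vectorization for this column and walk the rows in order, so each fill already sees the values written for the two rows above it. A single pass is then enough regardless of how long a run of NAs is. The function fill_sequential() below is a hypothetical base-R sketch using the question's formula; for this data it should reproduce the new_MAP_top values printed by UpdateDF(temp).

```r
# Hypothetical alternative: an explicit row-by-row loop, so every computed
# value is immediately visible to the rows that follow it.
fill_sequential <- function(df) {
  x <- df$new_MAP_top
  for (i in seq_along(x)) {
    if (is.na(df$steps[i])) next                 # rows without steps keep their value
    prev1 <- if (i > 1) x[i - 1] else NA         # may already be a filled value
    prev2 <- if (i > 2) x[i - 2] else NA
    slope <- if (is.na(prev1 - prev2)) 0 else prev1 - prev2
    x[i] <- round(prev1 - (slope - 3 * df$steps[i]) / 4, digits = 2)
  }
  df$new_MAP_top <- x
  df
}

temp <- data.frame(new_MAP_top = c(68,71,70,72,NA,75,70,69,69,NA,73,75,83,NA,NA,95,98,97),
                   steps = c(NA,NA,NA,NA,1.50,NA,NA,NA,NA,2.00,NA,NA,NA,4.00,4.00,NA,NA,NA))
res <- fill_sequential(temp)
res$new_MAP_top[c(10, 14, 15)]
```

The loop does one pass over the rows instead of one vectorized pass per NA in the longest run; for long data frames with short gaps, the repeated mutate() may still be faster in practice.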