R: Replace NA values with the hourly mean using dplyr

Date: 2014-10-13 08:52:59

Tags: r dplyr

I'm learning the dplyr package in R and I really like it. But now I have to deal with NA values in my data.

I would like to replace each NA with the mean for the corresponding hour. Here is a very simple example:

#create an example
day = c(1, 1, 2, 2, 3, 3)
hour = c(8, 16, 8, 16, 8, 16)
profit = c(100, 200, 50, 60, NA, NA)
shop.data = data.frame(day, hour, profit)

#calculate the average for each hour
library(dplyr)
mean.profit <- shop.data %>%
  group_by(hour) %>%
  summarize(mean=mean(profit, na.rm=TRUE))

> mean.profit
Source: local data frame [2 x 2]

  hour mean
1    8   75
2   16  130

Can I use a dplyr transform command to replace the NAs in profit on day 3 with 75 (for 8:00) and 130 (for 16:00)?

2 answers:

Answer 0 (score: 19)

Try

  shop.data %>%
    group_by(hour) %>%
    mutate(profit = ifelse(is.na(profit), mean(profit, na.rm = TRUE), profit))

  #   day hour profit
  #1   1    8    100
  #2   1   16    200
  #3   2    8     50
  #4   2   16     60
  #5   3    8     75
  #6   3   16    130

Or you can use replace:

  shop.data %>%
    group_by(hour) %>%
    mutate(profit = replace(profit, is.na(profit), mean(profit, na.rm = TRUE)))
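Because the question already builds the per-hour table mean.profit, another route is to join that table back onto shop.data by hour and fill only the missing values with coalesce(). This is a minimal sketch, not part of the original answer, and it assumes a dplyr release recent enough to include coalesce(); the name filled is just an illustrative variable:

```r
library(dplyr)

# Example data from the question
shop.data <- data.frame(day    = c(1, 1, 2, 2, 3, 3),
                        hour   = c(8, 16, 8, 16, 8, 16),
                        profit = c(100, 200, 50, 60, NA, NA))

# Per-hour means, computed as in the question
mean.profit <- shop.data %>%
  group_by(hour) %>%
  summarize(mean = mean(profit, na.rm = TRUE))

# Attach each row's hourly mean, use it only where profit is NA,
# then drop the helper column
filled <- shop.data %>%
  left_join(mean.profit, by = "hour") %>%
  mutate(profit = coalesce(profit, mean)) %>%
  select(-mean)

filled
```

coalesce() returns the first non-missing value position by position, so the existing profits are left untouched and only the two day-3 rows pick up 75 and 130.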

Answer 1 (score: 3)

A (less elegant) approach using only base R functions:

transform(shop.data, 
          profit = ifelse(is.na(profit), 
                          ave(profit, hour, FUN = function(x) mean(x, na.rm = TRUE)), 
                          profit))

#   day hour profit
# 1   1    8    100
# 2   1   16    200
# 3   2    8     50
# 4   2   16     60
# 5   3    8     75
# 6   3   16    130
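The ifelse() wrapper can also be folded into the function passed to ave(), so that the replacement happens inside each hour group. A minimal base-R sketch of that variant (not from the original answer):

```r
# Example data from the question
shop.data <- data.frame(day    = c(1, 1, 2, 2, 3, 3),
                        hour   = c(8, 16, 8, 16, 8, 16),
                        profit = c(100, 200, 50, 60, NA, NA))

# ave() calls FUN once per hour group and puts the results back in the
# original row order; FUN swaps each NA for the mean of that group's
# non-missing values
shop.data$profit <- ave(shop.data$profit, shop.data$hour,
                        FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))

shop.data
```

In all of these versions, if an hour has no non-missing profit at all, mean(x, na.rm = TRUE) returns NaN, so those rows end up as NaN rather than being filled.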