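dplyr: Replace NAs and 0s with conditional subgroup means

Time: 2018-06-10 18:11:35

Tags: r dplyr

I am trying to replace all NAs and 0s in a large dataset with their respective group means, where each mean is computed only from the values that are neither NA nor 0.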

Source: local data frame [174,019 x 3]
Groups: name

   student   name  hours
1       s1    ABC    1.0
2       s1    DEF     NA
3       s2    DEF    0.5
4       s3     NA    2.0
5       s3    ABC    2.0
6       s4    GHI      0

This solution using dplyr works as expected, but can it be done in a single chain?

avg <- workshops %>%
  filter(hours > 0 & !is.na(name)) %>%
  group_by(name) %>%
  summarize(avg.hours = mean(hours, na.rm = TRUE))

workshops <- workshops %>%
  left_join(avg, by = "name") %>%
  mutate(hours = if_else(hours > 0, hours, avg.hours, avg.hours)) %>%
  select(-avg.hours)

Updated solution

workshop <- workshop %>%
  group_by(name) %>%
  mutate(hours = ifelse(!is.na(name), replace(hours, hours == 0 | is.na(hours),
                 mean(`is.na<-`(hours, hours == 0), na.rm = TRUE)), NA))

2 Answers:

Answer 0 (score: 1)

You can do it like this:

workshop %>%
  group_by(name) %>%
  mutate(hours = replace(hours, hours == 0 | is.na(hours),
                 mean(`is.na<-`(hours, hours == 0), na.rm = TRUE)))
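To see what the expression inside mutate() does, it can help to run it on one group's values outside of dplyr. The following is a minimal base R sketch; the hours vector is made up for illustration and does not come from the question's data.

```r
# Hypothetical hours for a single group (illustrative values only)
hours <- c(1, NA, 0, 2)

# `is.na<-`(x, i) returns a copy of x with the positions selected by i set to NA,
# so the zeros are masked before the mean is taken.
masked <- `is.na<-`(hours, hours == 0)
group_mean <- mean(masked, na.rm = TRUE)

# NA | TRUE evaluates to TRUE, so both the zero and the original NA are selected.
to_fill <- hours == 0 | is.na(hours)
filled <- replace(hours, to_fill, group_mean)

print(filled)  # 1.0 1.5 1.5 2.0
```

Inside group_by(name) %>% mutate(...), the same computation runs once per group, so each group is filled with its own mean.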

Answer 1 (score: 0)

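Here is an option with na.aggregate from zoo. After grouping by 'name', change the 0s to NA with na_if and apply na.aggregate to replace the missing values with the mean (by default, the FUN parameter is mean).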

library(dplyr)
library(zoo)
workshops %>%
    group_by(name) %>% 
    mutate(hours = na.aggregate(na_if(hours, 0)))
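
As a rough check, the pipeline above can be run on the six sample rows printed in the question. This is only a sketch: the data frame is rebuilt by hand from that printout, the name result is introduced here for illustration, and it assumes that the dplyr and zoo packages are installed.

```r
library(dplyr)
library(zoo)

# The six sample rows from the question, re-entered by hand
workshops <- data.frame(
  student = c("s1", "s1", "s2", "s3", "s3", "s4"),
  name    = c("ABC", "DEF", "DEF", NA, "ABC", "GHI"),
  hours   = c(1.0, NA, 0.5, 2.0, 2.0, 0)
)

result <- workshops %>%
  group_by(name) %>%
  # na_if() turns zeros into NA; na.aggregate() fills NAs with the group mean
  mutate(hours = na.aggregate(na_if(hours, 0))) %>%
  ungroup()

print(result)
```

Two edge cases are worth checking in the printed result. The row whose name is NA forms its own group and keeps its 2.0 hours, and the GHI group has no non-zero value to average, so its hours remain missing rather than being filled with a number.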