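I am trying to replace all NAs and 0s in a large dataset with their respective group means, computed only from the values that are neither NA nor 0.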
Source: local data frame [174,019 x 3]
Groups: name
student name hours
1 s1 ABC 1.0
2 s1 DEF NA
3 s2 DEF 0.5
4 s3 NA 2.0
5 s3 ABC 2.0
6 s4 GHI 0
This solution using dplyr works as expected, but can it be done in a single chain?
avg <- workshops %>%
  filter(hours > 0 & !is.na(name)) %>%
  group_by(name) %>%
  summarize(avg.hours = mean(hours, na.rm = TRUE))
workshops <- workshops %>%
  left_join(avg, by = "name") %>%
  mutate(hours = if_else(hours > 0, hours, avg.hours, avg.hours)) %>%
  select(-avg.hours)
Updated solution
workshop <- workshop %>%
  group_by(name) %>%
  mutate(hours = ifelse(!is.na(name),
                        replace(hours, hours == 0 | is.na(hours),
                                mean(`is.na<-`(hours, hours == 0), na.rm = TRUE)),
                        NA))
Answer 0 (score: 1)
You can do it like this:
workshop %>%
  group_by(name) %>%
  mutate(hours = replace(hours, hours == 0 | is.na(hours),
                         mean(`is.na<-`(hours, hours == 0), na.rm = TRUE)))
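To see what the expression inside mutate does, here is a minimal base R sketch on a single made-up group. The vector and the names masked, group_mean and filled are hypothetical, chosen only for illustration.

```r
# Hypothetical hours for one group: one zero and one missing value
hours <- c(1, 0, NA, 3)

# `is.na<-`(x, cond) returns a copy of x with NA wherever cond is TRUE,
# so the zero no longer contributes to the mean
masked <- `is.na<-`(hours, hours == 0)

# Mean of the remaining valid values (1 and 3)
group_mean <- mean(masked, na.rm = TRUE)

# Overwrite both the zero and the NA with that mean
filled <- replace(hours, hours == 0 | is.na(hours), group_mean)
print(filled)
```

Inside a grouped mutate the same steps run once per group, which is why no separate summarize and join are needed.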
Answer 1 (score: 0)
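Here is an option using na.aggregate from zoo. After grouping by 'name', change the 0 values to NA with na_if and apply na.aggregate to replace the missing values with the group mean (by default, the FUN argument is mean).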
library(dplyr)
library(zoo)
workshops %>%
  group_by(name) %>%
  mutate(hours = na.aggregate(na_if(hours, 0)))
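For trying the idea on the six sample rows from the question, here is a sketch that uses dplyr alone. It is not the code from this answer: coalesce() with the group mean stands in for na.aggregate so that zoo is not required, and the valid helper column and the filled result are hypothetical names.

```r
library(dplyr)

# The six sample rows shown in the question's printout
workshops <- tibble(
  student = c("s1", "s1", "s2", "s3", "s3", "s4"),
  name    = c("ABC", "DEF", "DEF", NA, "ABC", "GHI"),
  hours   = c(1, NA, 0.5, 2, 2, 0)
)

filled <- workshops %>%
  group_by(name) %>%
  mutate(
    valid = na_if(hours, 0),                            # treat 0 as missing
    hours = coalesce(valid, mean(valid, na.rm = TRUE))  # fill gaps with the group mean
  ) %>%
  ungroup() %>%
  select(-valid)

print(filled)
```

Two things to keep in mind with this sketch: group_by treats the NA name as a group of its own, and a group with no valid hours at all (GHI in the sample) has no mean to fall back on, so its value stays missing.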