Question

I'm very new to R, but I find it really interesting.

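So I've searched a lot. There are plenty of posts that count missing values across multiple columns using

na_count <- sapply(data, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count)

but I couldn't find an answer to my specific problem.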
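For reference, that per-column count can also be written with base R's `colSums()`, because `is.na()` applied to a data frame returns a logical matrix. This is a minimal sketch on a small made-up data frame (`toy` is a hypothetical name):

```r
# Hypothetical toy data: both columns have some missing values
toy <- data.frame(species = c("cats", "dogs", NA, "dogs"),
                  weight  = c(120, NA, 150, NA))

# is.na() on a data frame gives a logical matrix;
# colSums() counts the TRUE values in each column
na_count <- colSums(is.na(toy))
print(na_count)  # species: 1, weight: 2

# The sapply() version quoted above gives the same counts
na_count_sapply <- sapply(toy, function(y) sum(length(which(is.na(y)))))
```

Note that these count per column over the whole data set; counting per group (per species) is what the rest of the question is about.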

I have a data set with a column called species and another column called weight, which has some missing values.

I need to count the missing values in weight, grouped by species, and I have to use group_by and summarize to do it.

One of the errors I got was:

Factor `species` contains implicit NA, consider using `forcats::fct_explicit_na`

I think this is because the column I'm grouping by (species) also contains NAs.
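The message refers to a factor whose NA values are not one of its levels (an "implicit" NA). A minimal base-R illustration on a made-up vector: `addNA()` turns the missing values into an explicit level, so they can be counted like any other group. This only illustrates the idea behind the message; it is not a description of what dplyr does internally.

```r
# Hypothetical species factor with one missing value
species <- factor(c("cats", "dogs", NA, "dogs"))

# The NA is "implicit": it is not among the levels
levels(species)           # "cats" "dogs"

# addNA() adds NA as an explicit level
species_explicit <- addNA(species)
levels(species_explicit)  # "cats" "dogs" NA

# table() now counts the missing group as well
counts <- table(species_explicit)
```

In a dplyr pipeline, the corresponding choices are to drop rows with a missing species first (for example with `filter(!is.na(species))`) or to make the NA an explicit level, as the message suggests.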

I tried

DF %>% 
  group_by(species) %>% 
  summarize(funs(sum(is.na(weight))))

but it doesn't work.

Finally, I need to fill in (impute) the missing weights with the mean weight of each species.

Cheers

Answer 1

Here is a hypothetical data frame:

library(dplyr)
set.seed(1)  # make the random example reproducible
df <- tibble(species = sample(c("dogs", "cats", "horses"), 100, replace = TRUE),
             weight  = sample(seq(100, 200), 100))

Let's put some NAs in the weight column:

df$weight[sample(nrow(df), 30)] <- NA

Count the NAs per species:

df %>% group_by(species) %>% summarise(NA_sum = sum(is.na(weight)))
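To check the counting logic on data where the answer is known in advance, here is a small deterministic example. It uses base R's `tapply()` as a swapped-in alternative to `group_by()` plus `summarise()`; the data frame `pets` is made up.

```r
# Made-up data: 1 missing weight for cats, 2 for dogs, 0 for horses
pets <- data.frame(species = c("cats", "cats", "dogs", "dogs", "dogs", "horses"),
                   weight  = c(110,    NA,     150,    NA,     NA,     190))

# For each species, sum the logical vector is.na(weight) (TRUE counts as 1)
na_by_species <- tapply(is.na(pets$weight), pets$species, sum)
na_by_species  # cats: 1, dogs: 2, horses: 0
```

One difference worth knowing: `tapply()` silently drops rows whose species is NA, whereas `group_by()` keeps them as a separate NA group.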

Finally, impute each missing weight with the mean weight of its species:

df %>%
  group_by(species) %>%
  mutate(weight = ifelse(is.na(weight), mean(weight, na.rm = TRUE), weight)) %>%
  ungroup()
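The same per-species mean imputation can be sketched in base R with `ave()`, which applies a function within each group and returns a vector aligned with the original rows. This is a minimal example on made-up data; `pets` and `fill_mean` are hypothetical names.

```r
# Made-up data: cats have one missing weight, dogs have one
pets <- data.frame(species = c("cats", "cats", "dogs", "dogs", "dogs"),
                   weight  = c(110,    NA,     150,    NA,     170))

# Replace the NAs in x with the mean of the non-missing values of x
fill_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() calls fill_mean() once per species and puts the results back in row order
pets$weight <- ave(pets$weight, pets$species, FUN = fill_mean)
pets$weight  # 110 110 150 160 170
```

If every weight in a species is missing, `mean(x, na.rm = TRUE)` returns `NaN`, so those rows stay missing in both this sketch and the dplyr version.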