I constructed the following data.frame object:
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)
This yields data =
    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie        NA FEMALE
First I clean the data with
clean_data <- data[!is.na(data$incidents), ]
so that clean_data =
   name incidents gender
1 Homer       133   MALE
2 Marge        36 FEMALE
3  Bart      1242   MALE
4  Lisa         2 FEMALE
Now I aggregate by gender
agg <- aggregate(incidents ~ gender, clean_data, mean)
which produces
  gender incidents
1 FEMALE      19.0
2   MALE     687.5
Now I would like to "fill in" the NA values in incidents with the data from agg, so that data =
    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie      19.0 FEMALE
What is the simplest way to do this using base R?
Answer 0: (score: 4)
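You can use ave. It returns the group "mean" values as a vector of the same length as the original dataset ("vals"). Then find the "NA" elements in the "incidents" column and replace them with the corresponding elements of "vals".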
vals <- with(data, ave(incidents, gender, FUN= function(x)
mean(x, na.rm=TRUE)))
indx1 <- is.na(data$incidents)
data$incidents[indx1] <- vals[indx1]
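To make the ave step concrete, here is a minimal self-contained sketch that rebuilds the example data from the question and shows what "vals" contains before the NA entries are overwritten. The object names follow the code above, and nothing beyond base R is assumed.

```r
# Rebuild the example data from the question
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

# ave() returns one value per row: the mean of that row's gender group
vals <- with(data, ave(incidents, gender,
                       FUN = function(x) mean(x, na.rm = TRUE)))
vals
# [1] 687.5  19.0 687.5  19.0  19.0

# Overwrite only the rows where incidents is missing
indx1 <- is.na(data$incidents)
data$incidents[indx1] <- vals[indx1]
data$incidents
# [1]  133   36 1242    2   19
```

Because "vals" lines up row by row with data, the same logical index can be used on both sides of the assignment, so no merge with agg is needed.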
A shorter version, shown by @MrFlick in the comments, uses "ifelse" to replace the "NA" elements with the "mean" value.
data$incidents <- with(data, ave(incidents, gender,
  FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)))
Instead of "ifelse", "replace" can also be used, as @Ananda Mahto shows with "data.table".
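A base R sketch of the replace variant, combined here with ave rather than data.table. This combination is an illustration and is not code taken from either answer. The data frame is rebuilt from the question so the block runs on its own.

```r
# Example data from the question
data <- data.frame(
  name = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# Within each gender group, replace() swaps the NA positions
# for that group's mean; non-missing values are left untouched
data$incidents <- with(data, ave(incidents, gender,
  FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))
data
```

One caveat: if a group contains only NA values, mean(x, na.rm = TRUE) returns NaN, so those rows would be filled with NaN rather than a number.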
Answer 1: (score: 4)
For variety, here is an approach using "data.table", which also demonstrates the replace function.
library(data.table)
as.data.table(data)[
  , incidents := replace(incidents, is.na(incidents),
                         mean(incidents, na.rm = TRUE)),
  by = gender][]
#      name incidents gender
# 1:  Homer       133   MALE
# 2:  Marge        36 FEMALE
# 3:   Bart      1242   MALE
# 4:   Lisa         2 FEMALE
# 5: Maggie        19 FEMALE