I have a data frame that looks like this:
row1 key1 10
row2 key1 12
row3 key1 NA
row4 key2 2
row5 key2 3
row6 key2 NA
...
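Now I want to replace every NA with the mean of its key. For example, the first NA should be replaced with the mean of 10 and 12, and the second with the mean of 2 and 3.
A crude solution would be to get all the keys, iterate over them, filter the data frame for each key, and then replace the NAs with that key's mean. Is there a better solution?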
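For reference, the crude approach I have in mind looks roughly like this. It is only a minimal sketch, and the column names `key` and `value` are placeholders I made up for this example:

```r
# Toy data matching the example above; the column names are assumed.
d <- data.frame(key = rep(c("key1", "key2"), each = 3),
                value = c(10, 12, NA, 2, 3, NA))

# Loop over each distinct key and fill that key's NAs with its own mean.
for (k in unique(d$key)) {
  in_group <- d$key == k
  group_mean <- mean(d$value[in_group], na.rm = TRUE)
  d$value[in_group & is.na(d$value)] <- group_mean
}

d$value
```

It works, but it scans the whole data frame once per key, which is why I am looking for something more idiomatic.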
Answer 0 (score: 3)
There are many ways to do this. The zoo package has a built-in function, na.aggregate, that does exactly what you need.
library(zoo)
d <- data.frame(key = rep(c("key1", "key2"), each = 3),
value = c(10, 12, NA, 2, 3, NA))
with(d, na.aggregate(value, by = key))
Answer 1 (score: 2)
If this is your dataset:
temp <- read.table(text = "row1 key1 10
row2 key1 12
row3 key1 NA
row4 key2 2
row5 key2 3
row6 key2 NA", header = F)
library(data.table)
setDT(temp)[, V3 := as.numeric(V3)]
# Within each key (V2), replace NA with that group's mean
temp[, V3 := ifelse(is.na(V3), mean(V3, na.rm = TRUE), V3), by = V2]
Answer 2 (score: 1)
You can try:
dat <- read.table(text="row1 key1 10
row2 key1 12
row3 key1 NA
row4 key2 2
row5 key2 3
row6 key2 NA
row7 key2 NA",sep="",header=F,stringsAsFactors=F)
repl <- with(dat, table(is.na(V3), V2)[2, ]) # number of missing values per group
dat1 <- dat
indx <- is.na(dat$V3) # index of the positions of the missing values
dat$V3[indx] <- rep(with(dat, by(V3, V2, FUN = mean, na.rm = TRUE)), repl) # repeat each group mean `repl` times to cover multiple NAs per group
Or use ave:
dat1$V3[indx] <- with(dat1, ave(V3, V2, FUN=function(x) mean(x, na.rm=T)))[indx]
identical(dat1, dat)
#[1] TRUE
dat1$V3
# [1] 10.0 12.0 11.0 2.0 3.0 2.5 2.5
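Note that the rep() version relies on the rows being grouped by key, whereas ave() returns the group means in row order, so it also works on unsorted data. A minimal sketch of that idea, where `fill_na_group_mean` is a hypothetical helper name and not part of any package:

```r
# Hypothetical helper: replace NAs in x with the mean of their group g.
fill_na_group_mean <- function(x, g) {
  group_means <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))
  ifelse(is.na(x), group_means, x)
}

# Keys are deliberately interleaved to show that row order does not matter.
key   <- c("key1", "key2", "key1", "key2", "key1", "key2")
value <- c(10, 2, 12, NA, NA, 3)
filled <- fill_na_group_mean(value, key)
filled
```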
Answer 3 (score: 1)
This should do the trick, using your data:
d <- data.frame(id = rep(c("key1", "key2"), each = 3), x = c(10, 12, NA, 2, 3, NA))
library(plyr)
d2 <- ddply(d, .(id), transform, x=ifelse(is.na(x), mean(x, na.rm=T), x) )
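Here, we split the data frame by id (the key) and transform x depending on whether it is NA: missing values get the group mean, everything else is left as is. The result is: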
id x
1 key1 10.0
2 key1 12.0
3 key1 11.0
4 key2 2.0
5 key2 3.0
6 key2 2.5