Hello, I wrote a function that replaces the NAs in each column with that column's median:
df1<-data.frame(c=(1:5), d=(11:15), f=c(1,NA, 2:4), e=c(1,0,1,0,1), g=c(1,NA,2,36,7))
reemp <- function(tbl) {
  var_incom <- colnames(tbl)[!complete.cases(t(tbl))]
  for (col in var_incom) {
    tbl$col[is.na(tbl$col)] <- median(tbl$col, na.rm = TRUE)
  }
  return(tbl)
}
reemp(df1)
But I get warning messages instead of the expected result:
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(tbl$col) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
4: In is.na(tbl$col) : is.na() applied to non-(list or vector) of type 'NULL'
Answer 0 (score: 1)
Try:
df1[] <- lapply(df1, function(x) replace(x, is.na(x), median(x, na.rm=TRUE)))
If you have many columns, do this only for the columns that contain at least one NA:
nm1 <- names(df1)[unlist(lapply(df1, anyNA))]
#or nm1 <- names(df1)[colSums(is.na(df1))>0]
df1[nm1] <- lapply(df1[nm1], function(x) replace(x, is.na(x), median(x,na.rm=TRUE)))
Or
library(matrixStats)
df1[is.na(df1)] <- colMedians(as.matrix(df1),
na.rm=TRUE)[which(is.na(df1), arr.ind=TRUE)[,2]]
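The indexing in the last one-liner can be hard to follow, so here is a minimal sketch of the same idea split into steps. It assumes base R only and swaps colMedians for vapply with median so that no package is needed; the names na_pos and col_meds are made up for illustration.

```r
df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

# One row per NA cell, with its row and column index
na_pos <- which(is.na(df1), arr.ind = TRUE)

# Median of every column, ignoring NAs (base R stand-in for colMedians)
col_meds <- vapply(df1, median, numeric(1), na.rm = TRUE)

# For each NA cell, pick the median of the column that cell sits in
df1[is.na(df1)] <- col_meds[na_pos[, "col"]]
df1
```

Both which() and the matrix assignment walk the cells column by column, which is why the replacement values line up with the NA positions.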
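Answer 1 (score: 1)
I replaced tbl$col with tbl[, col] and it worked. The $ operator does not evaluate col as a variable; it looks for a column literally named "col", which does not exist, so tbl$col is NULL.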
reemp <- function(tbl) {
  # names of the columns that contain at least one NA
  var_incom <- colnames(tbl)[!complete.cases(t(tbl))]
  for (col in var_incom) {
    # tbl[, col] uses the value stored in col, unlike tbl$col
    tbl[, col][is.na(tbl[, col])] <- median(tbl[, col], na.rm = TRUE)
  }
  return(tbl)
}
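To illustrate why the original version failed, here is a small sketch comparing $ with [[ when the column name is held in a variable. The function name reemp2 is hypothetical, and using [[ with anyNA is just one possible way to write the fix, not necessarily what the answer's author had in mind.

```r
df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

col <- "f"
is.null(df1$col)  # TRUE: $ looks for a column literally named "col"
df1[[col]]        # [[ evaluates col, so this returns column f

# Hypothetical variant of reemp that indexes with [[ instead of $
reemp2 <- function(tbl) {
  # columns that contain at least one NA
  var_incom <- names(tbl)[vapply(tbl, anyNA, logical(1))]
  for (col in var_incom) {
    tbl[[col]][is.na(tbl[[col]])] <- median(tbl[[col]], na.rm = TRUE)
  }
  tbl
}

fixed <- reemp2(df1)
fixed
```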
Answer 2 (score: 0)
The following should work:
df1
  c  d  f e  g
1 1 11  1 1  1
2 2 12 NA 0 NA
3 3 13  2 1  2
4 4 14  3 0 36
5 5 15  4 1  7
# column medians, ignoring NAs
meds = sapply(df1, median, na.rm=TRUE)
meds
   c    d    f    e    g
 3.0 13.0  2.5  1.0  4.5
# replace the NAs in each column with that column's median
for (i in seq_along(df1)) {
  vect = df1[, i]
  vect[is.na(vect)] = meds[i]
  df1[, i] = vect
}
df1
  c  d   f e    g
1 1 11 1.0 1  1.0
2 2 12 2.5 0  4.5
3 3 13 2.0 1  2.0
4 4 14 3.0 0 36.0
5 5 15 4.0 1  7.0
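As a quick sanity check, the loop above and the lapply one-liner from Answer 0 can be compared on the same data. This is only a sketch; the names by_loop and by_lapply are made up for illustration.

```r
df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

# Loop approach: precompute the medians, then fill column by column
by_loop <- df1
meds <- sapply(by_loop, median, na.rm = TRUE)
for (i in seq_along(by_loop)) {
  vect <- by_loop[[i]]
  vect[is.na(vect)] <- meds[i]
  by_loop[[i]] <- vect
}

# lapply approach: fill each column with its own median
by_lapply <- df1
by_lapply[] <- lapply(by_lapply, function(x) replace(x, is.na(x), median(x, na.rm = TRUE)))

anyNA(by_loop)                          # FALSE: no NAs remain
isTRUE(all.equal(by_loop, by_lapply))   # compares values up to numeric tolerance
```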