Function to impute NAs with the column median

Asked: 2014-09-30 18:02:37

Tags: r function loops

Hello, I wrote a function that replaces the NAs in each column with that column's median:

df1<-data.frame(c=(1:5), d=(11:15), f=c(1,NA, 2:4), e=c(1,0,1,0,1), g=c(1,NA,2,36,7))

reemp<-function (tbl) {
  var_incom<-colnames(tbl)[ !complete.cases(t(tbl))]
  for (col in var_incom) {
    tbl$col[is.na(tbl$col)] <-median(tbl$col, na.rm=TRUE)}
  return(tbl)}


reemp(df1)

But I get warning messages and no result:

Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(tbl$col) :
  is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
4: In is.na(tbl$col) :
  is.na() applied to non-(list or vector) of type 'NULL'

3 answers:

Answer 0: (score: 1)

Try:

df1[] <- lapply(df1, function(x) replace(x, is.na(x), median(x, na.rm=TRUE)))
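
As a minimal sketch, the one-liner above can be wrapped in a function that skips non-numeric columns, since median() fails on character or factor data. The name impute_median and the is.numeric guard are assumptions added for illustration; they are not part of the original answer.

```r
# Hypothetical helper (name chosen for illustration): replace the NAs in
# every numeric column with that column's median; other columns are untouched.
impute_median <- function(tbl) {
  num_cols <- vapply(tbl, is.numeric, logical(1))
  tbl[num_cols] <- lapply(tbl[num_cols], function(x) {
    replace(x, is.na(x), median(x, na.rm = TRUE))
  })
  tbl
}

# The data frame from the question
df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

res <- impute_median(df1)
res$f  # the NA in f becomes 2.5, the median of 1, 2, 3, 4
res$g  # the NA in g becomes 4.5, the median of 1, 2, 36, 7
```

The function returns a modified copy, so the result has to be assigned (here to res); df1 itself is left unchanged.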

If you have many columns, it may be more efficient to process only the columns that contain at least one NA:

nm1 <- names(df1)[unlist(lapply(df1, anyNA))]
#or nm1 <- names(df1)[colSums(is.na(df1))>0]

df1[nm1] <- lapply(df1[nm1], function(x) replace(x, is.na(x), median(x,na.rm=TRUE)))

# Alternatively, with the matrixStats package:
library(matrixStats)
df1[is.na(df1)] <- colMedians(as.matrix(df1),
                 na.rm=TRUE)[which(is.na(df1), arr.ind=TRUE)[,2]]
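
If matrixStats is not installed, the same indexing idea can be sketched in base R. This is an illustration rather than the answer's own code: it swaps colMedians() for vapply() with median(), and it assumes every column is numeric. The names meds, na_cols and filled are made up for the example.

```r
# The data frame from the question
df1 <- data.frame(c = 1:5, d = 11:15, f = c(1, NA, 2:4),
                  e = c(1, 0, 1, 0, 1), g = c(1, NA, 2, 36, 7))

# Per-column medians, computed with base R instead of colMedians()
meds <- vapply(df1, median, numeric(1), na.rm = TRUE)

# Column index of each NA cell, listed in column-major order, which is
# also the order in which logical-matrix assignment fills the cells
na_cols <- which(is.na(df1), arr.ind = TRUE)[, "col"]

# Fill each NA with the median of the column it sits in
filled <- df1
filled[is.na(filled)] <- meds[na_cols]
filled
```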

Answer 1: (score: 1)

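I replaced tbl$col with tbl[,col] and it worked. tbl$col looks for a column literally named "col", which does not exist, so it returns NULL; tbl[,col] uses the value of the loop variable col.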

reemp <- function(tbl) {
  # Names of the columns that contain at least one NA
  var_incom <- colnames(tbl)[!complete.cases(t(tbl))]
  for (col in var_incom) {
    # tbl[, col] uses the value of col, unlike tbl$col
    tbl[, col][is.na(tbl[, col])] <- median(tbl[, col], na.rm = TRUE)
  }
  return(tbl)
}
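
The difference between the two forms of indexing can be seen in a short sketch. The two-column data frame below is a cut-down version of the question's df1, used only for illustration.

```r
df1 <- data.frame(f = c(1, NA, 2:4), g = c(1, NA, 2, 36, 7))
col <- "f"

is.null(df1$col)  # TRUE: `$` looks for a column literally named "col"
df1[[col]]        # `[[` evaluates col first, so this returns column f
df1[, col]        # matrix-style indexing also evaluates col
```

Inside a loop over column names, df1[[col]] or df1[, col] is therefore the form to use.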

Answer 2: (score: 0)

The following should work:

df1
  c  d  f e  g
1 1 11  1 1  1
2 2 12 NA 0 NA
3 3 13  2 1  2
4 4 14  3 0 36
5 5 15  4 1  7

meds = sapply(df1, median, na.rm=TRUE)
meds
   c    d    f    e    g 
 3.0 13.0  2.5  1.0  4.5 

for (i in seq_along(df1)) {
    vect <- df1[, i]
    vect[is.na(vect)] <- meds[i]
    df1[, i] <- vect
}
df1
  c  d   f e    g
1 1 11 1.0 1  1.0
2 2 12 2.5 0  4.5
3 3 13 2.0 1  2.0
4 4 14 3.0 0 36.0
5 5 15 4.0 1  7.0