Question

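I wrote the function below to merge actual values with predicted values (used where the actual value is missing) into a new column of a data.frame. The function works, but I would like to optimize it: on the dataset I work with, it takes about two hours to run. I would be grateful for any help.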

p <- function(object, newdata = NULL, type = c("link", "response", "terms"),
              rse.fit = FALSE, dispersion = NULL, terms = NULL,
              na.action = na.pass, ...)
{
    pred <- predict(object, newdata)

    vetor1 <- newdata$ALT           # Vector of the actual heights from the data.frame
    vetor1[is.na(vetor1)] <- 0      # Replace the NAs in that vector with the numeric value 0
    vetor2 <- c(pred)               # Vector of the predicted values
    for (i in 1:length(vetor1)) {   # Loop over every element of vetor1
      if (vetor1[i] == 0.00) {      # A value of 0 means the actual height is missing
        vetor1[i] = vetor2[i]       # so the predicted value replaces it
        newdata$ALTMISTA <- vetor1  # vetor1, now holding actual and predicted values, becomes the new column ALTMISTA
      }
    }
    return(newdata)
}

Answer 1

A few thoughts: if you have a gigundo dataset, well, that takes time; otherwise you will need to learn to use the parallel package.
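As a rough sketch of what using the parallel package could look like here, the rows to predict can be split into chunks and each chunk sent to a worker. Everything in the example is invented for illustration (the toy data, the lm model, the two workers), and whether it helps at all depends on predict actually being the slow step, since copying the data to the workers has its own cost.

```r
library(parallel)

# Toy model and data, made up for this example.
set.seed(1)
train <- data.frame(x = runif(1000))
train$y <- 1 + 2 * train$x + rnorm(1000)
fit <- lm(y ~ x, data = train)
newdata <- data.frame(x = runif(10000))

n_workers <- 2  # assumed number of workers; adjust to the machine

# Split the rows into contiguous chunks, one per worker.
chunk_id <- cut(seq_len(nrow(newdata)), n_workers, labels = FALSE)
chunks <- split(newdata, chunk_id)

cl <- makeCluster(n_workers)
# The model is passed as an extra argument so each worker receives a copy.
pred_list <- parLapply(cl, chunks,
                       function(chunk, model) predict(model, chunk),
                       model = fit)
stopCluster(cl)

# Chunks come back in their original order, so concatenating restores row order.
pred_par <- unlist(pred_list, use.names = FALSE)
```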

I don't think you want to redefine newdata$ALTMISTA every time through the loop, since you are just overwriting the same values.

You can get rid of the loop over i by using the vectorized ifelse:

 set.seed(1)
 foo <- sample(c(-1, 1), 10, replace = TRUE)
 foo
 [1] -1 -1  1  1 -1  1  1  1  1 -1
 bar <- 11:20
 foo <- ifelse(foo < 0, foo, bar)
 foo
 [1] -1 -1 13 14 -1 16 17 18 19 -1
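Applied to the function in the question, the same idea could look like the sketch below. It assumes, as in the question, that the observed heights are in a column named ALT and that a missing height is NA. The toy data frame, the DAP predictor, the lm model, and the name merge_pred are all made up for illustration.

```r
# Vectorized version of the question's function (illustrative sketch).
# merge_pred is a hypothetical name; ALT and ALTMISTA come from the question.
merge_pred <- function(object, newdata, ...) {
  pred <- as.vector(predict(object, newdata, ...))
  # Keep the observed height where present, otherwise use the prediction.
  newdata$ALTMISTA <- ifelse(is.na(newdata$ALT), pred, newdata$ALT)
  newdata
}

# Toy data, invented for the example: DAP is a made-up predictor column.
set.seed(42)
dados <- data.frame(DAP = runif(20, 5, 40))
dados$ALT <- 2 + 0.5 * dados$DAP + rnorm(20)
dados$ALT[c(3, 7, 15)] <- NA          # pretend these heights were not measured

fit <- lm(ALT ~ DAP, data = dados)    # lm drops the NA rows when fitting
result <- merge_pred(fit, dados)
head(result)
```

Testing is.na() directly, rather than first recoding NA to 0 as the original loop does, also avoids overwriting a genuine height of 0, and the new column is assigned once instead of on every iteration.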

But as I said, I suspect you have a huge dataset, and predict may be the time hog. Try using Rprof to find out where the time is being spent.
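A minimal profiling session might look like the following. The data and model are invented purely so that there is something slow enough to sample; in practice the call to the function from the question would go between the two Rprof calls.

```r
# Profiling sketch: the data and model below are made up for illustration.
set.seed(1)
n <- 1e6
big <- data.frame(x = runif(n))
big$y <- 3 * big$x + rnorm(n)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                 # start the sampling profiler
fit  <- lm(y ~ x, data = big)
pred <- predict(fit, big)
Rprof(NULL)                      # stop profiling

# Functions ranked by the time spent in their own code.
head(summaryRprof(prof_file)$by.self)
```

Rprof samples the call stack at fixed intervals, so code that finishes very quickly may record no samples at all; it is most useful on a run that takes at least a second or so.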
