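I wrote the function below, which merges the actual values with the predicted values (used where the actual value is missing) into a new column of a data.frame. The function works, but I would like to optimize it: on the dataset I am working with, it takes about two hours to run. I would appreciate any help.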
p <-
  function(object, newdata = NULL, type = c("link", "response", "terms"),
           rse.fit = FALSE, dispersion = NULL, terms = NULL,
           na.action = na.pass, ...)
  {
    {
      pred <- predict (object,newdata)
    }
    vetor1 <- (newdata$ALT) # Creates a vector from the actual heights in the data.frame
    vetor1[is.na(vetor1)] <- 0 # Replaces the NAs in the vector created above with the numeric value 0
    vetor2 <- c(pred) # Creates a vector from the predicted data
    for(i in 1:length(vetor1)){ # The loop runs until every value of vetor1 has been checked against the following condition
      if(vetor1[i]==0.00){ # If a value of the first vector is 0, i.e. if it is missing
        vetor1[i]=vetor2[i] # Then the predicted value replaces the missing value
        newdata$ALTMISTA <- vetor1 # vetor1, now holding the actual and predicted values merged together, becomes a new column in the data.frame called ALTMISTA
      }
    }
    return (newdata)
  }
Answer 0 (score: 1)
Some thoughts: if you have a gigundo dataset, well, that takes time; or you need to learn to use the parallel package.
I don't think you want to redefine newdata$ALTMISTA on every pass through the loop, since you are just overwriting the same values.
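A minimal sketch of that change, keeping the loop but assigning the column once after it finishes. The vectors are invented stand-ins for newdata$ALT and the output of predict, and the missing values are tested with is.na() directly rather than being recoded to 0 first, which is an assumption about what the question intends.

```r
# Invented stand-ins for the actual heights and the predictions.
vetor1 <- c(12.5, NA, 9.8, NA, 15.1)
vetor2 <- c(12.0, 11.3, 10.1, 14.2, 15.0)
newdata <- data.frame(ALT = vetor1)

for (i in seq_along(vetor1)) {
  if (is.na(vetor1[i])) {
    vetor1[i] <- vetor2[i]   # fill the gap with the prediction
  }
}

# Assign the merged vector once, after the loop has finished.
newdata$ALTMISTA <- vetor1
```

Assigning outside the loop also means the column is created even when no value was missing, which the version in the question does not guarantee.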
You can get rid of the i loop altogether by using the vectorized operation ifelse:
set.seed(1)
foo<-sample(c(-1,1),10,rep=T)
foo
[1] -1 -1 1 1 -1 1 1 1 1 -1
bar<-11:20
foo<- ifelse(foo<0, foo,bar)
foo
[1] -1 -1 13 14 -1 16 17 18 19 -1
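Applied to the data in the question, the same idea might look like the sketch below. The toy data frame and the pred vector are made up for illustration; in the real function pred would come from predict(object, newdata). Testing with is.na() instead of recoding NA to 0 first is my assumption about the intent, and it avoids overwriting a genuine height of 0.

```r
# Toy data standing in for the real data.frame (values are invented).
newdata <- data.frame(ALT = c(12.5, NA, 9.8, NA, 15.1))

# Stand-in for predict(object, newdata): one prediction per row.
pred <- c(12.0, 11.3, 10.1, 14.2, 15.0)

# Keep the measured height where it exists, otherwise use the prediction.
newdata$ALTMISTA <- ifelse(is.na(newdata$ALT), pred, newdata$ALT)
newdata
```

This replaces the whole loop with a single vectorized call, so the column is built in one step.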
But as I said, I suspect you have a huge dataset, and predict may be the time hog. Try using Rprof to find out where the time is being spent.
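A rough sketch of a profiling session with Rprof and summaryRprof from the utils package. The function slow_step is a hypothetical stand-in for the call you want to measure (for example your p function); it is deliberately inefficient so that the profiler has something to sample.

```r
# Hypothetical stand-in for the slow code being profiled.
slow_step <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, sqrt(i))   # grows the vector on every pass
  sum(x)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)              # start the sampling profiler
result <- slow_step(30000)
Rprof(NULL)                   # stop profiling

# Summarise the samples; by.self shows time spent directly in each function.
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

If predict dominates the by.self table, the loop is not the main cost and vectorizing it will only help a little.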