加速R中双重循环函数的优化

时间:2014-07-22 13:08:52

标签: r performance for-loop vectorization

我正努力加快双重循环功能的优化。我已经看过this post和其他人,但无法成功应用Marek提出的矢量化。欢迎任何改进和加快我的代码的建议/更正/建议。

该功能的目的是确定将气象数据(TTnl:生长度日数,dDLv:日长度的日变化)与作物产量(PS_TN)相关联的关键生理参数(TT_TIn,b,g)。

我必须为许多品种做这个,每个品种都有一个大的数据集。 这是数据生成和函数定义的代码。

# Data generation
lev_moy<-runif(300,80,180) # emergence date
rec<-runif(300,190,280)    # harvest date
Site<-rep(c("Cot", "Gla"),150)
Year<-rep(c("2007", "2008", "2009"),100)
meteo<-data.frame("Year"=rep(c(rep(2007, 365), rep(2008, 365),rep(2009, 365)),2),
                  "Site"=c(rep("Cot",365*3), rep("Gla",365*3)),
                  "dDLv"=rep(c(seq(3,23,0.125),seq(23,-2.25,-0.125),-1.75),6),
                  "TTnl"=runif(6*365,11.5,14.5))
PS_TN_pl<-jitter(0.00087*900*((rec*13)-900), factor = 2) # yield

# Function
rmse <- function(x) {
  TT_TIn<-x[1] # minimum growing degree day to finish the vegetative phase
  b<-x[2]      # growth parameter
  g<-x[3]      # critical daily variation in daylength 
               # (allowing the end of vegatative phase)
  TT_TI<<-numeric(length(lev_moy))
  PS_TN_est<<-numeric(length(lev_moy))
  for (i in 1:length(Site)) { 
    # for each plant (row) I select the meteo data corresponding to 
    # the specific year and site
    TTv<-c(subset(meteo, Year==Year[i]&Site==Site[i])$TTnl)
    dDLv<-c(subset(meteo, Year==Year[i]&Site==Site[i])$dDLv)
    for (j in lev_moy[i]:length(dDLv)) {
      # for each plant (row) I sum the temperature (TTnl) corresponding to the
      # vegetative phase. This period is detremined by a minimum growing degree 
      # day (TT_TIn) that I'll like to estimate with the optim function      
      if (sum(TTv[lev_moy[i]:j])>TT_TIn & dDLv[j]<g ) {
        TT_TI[i]<-ifelse(j>rec[i],sum(TTv[lev_moy[i]:rec[i]]),
                         sum(TTv[lev_moy[i]:j]))
        break      }    }  }
  PS_TN_est<-b*TT_TI*(rec-TT_TI)
  error<-PS_TN_est-PS_TN_pl
  rmse<-sqrt(mean(error^2))
  return(rmse)
}

# Optimisation
optFl<-optim(c(915,0.00057,1,0.05),rmse)

0 个答案:

没有答案