Using rollapplyr in parallel computing

Time: 2017-09-08 14:38:58

Tags: r

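I need to run a 36-month rolling regression for each column and get the intercept of each regression. There are about 100 rows and 9,000 columns. The original code takes 10 hours to run on my laptop. I would like to use parallel computing to cut the running time, but it returns an error. Here is my code: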

library(parallel)
library(zoo)

no_cores <- detectCores() - 1
c1 <- makeCluster(no_cores)

# Convert the data frame to a monthly zoo series
z <- read.zoo(df, FUN = as.yearmon, format = "%m/%d/%Y")

# Intercept of one regression on one 36-month window
getCoef <- function(z, lhs, rhs) {
  if (all(is.na(z[, lhs]))) NA
  else coef(lm(paste(lhs, "~", rhs), z))["(Intercept)"]
}

# Rolling regression for a single response column
roll <- function(z, lhs, rhs = "A + B + C + D + E") {
  rollapplyr(z, 36, getCoef, by.column = FALSE, coredata = FALSE, lhs = lhs, rhs = rhs)
}

ynames <- colnames(df)[2:8785]
tm1 <- system.time(
  L_rr <- parLapply(c1, ynames, roll, z = z)
)

The error I get is:

Error in checkForRemoteErrors(val) : 
  3 nodes produced errors; first error: could not find function "rollapplyr"

In other words, I assume that just as parLapply is the parallel counterpart of lapply, there is a parallel equivalent of rollapplyr. I don't know what it is. Thanks.

1 Answer:

Answer 0: (score: 1)

Try this:

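library(parallel)
library(zoo)

no_cores <- detectCores() - 1
c1 <- makeCluster(no_cores)

# The workers start with empty sessions, so send them the data frame first
clusterExport(c1, "df")

# Load zoo and define the data and functions on every worker
clusterEvalQ(c1, {
    library(zoo)
    z <- read.zoo(df, FUN = as.yearmon, format = "%m/%d/%Y")
    getCoef <- function(z, lhs, rhs) {
        if (all(is.na(z[, lhs]))) NA
        else coef(lm(paste(lhs, "~", rhs), z))["(Intercept)"]
    }

    roll <- function(z, lhs, rhs = "A + B + C + D + E") {
        rollapplyr(z, 36, getCoef, by.column = FALSE, coredata = FALSE, lhs = lhs, rhs = rhs)
    }
})

# The column names are needed on the master, which hands them out to the workers
ynames <- colnames(df)[2:8785]

tm1 <- system.time(
    # roll and z are looked up in each worker's global environment
    L_rr <- parLapply(c1, ynames, function(y) roll(z, y))
)
stopCluster(c1)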

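You need to set up the parallel workers the same way as the parent environment: each worker is a fresh R session, so it has no zoo loaded and knows nothing about your objects. I did this by embedding your functions, libraries, and data inside clusterEvalQ(c1, {...}). Make sure to stop the cluster when you are done.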
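For reference, here is a small self-contained sketch of the same idea that can be run without the original data. It is not the exact code above: it uses an alternative setup in which only the package is loaded with clusterEvalQ, the helper is shipped with clusterExport, and z is passed as an argument. The synthetic data, the column names (A, B, Y1..Y3), the two-worker cluster, and the explicit as.data.frame() conversion are all assumptions made for illustration.

```r
library(parallel)
library(zoo)

# Synthetic stand-in for the question's data (hypothetical names and sizes):
# 60 months, regressors A and B, response columns Y1..Y3.
set.seed(1)
n <- 60
dat <- data.frame(A = rnorm(n), B = rnorm(n))
for (y in c("Y1", "Y2", "Y3")) dat[[y]] <- 1 + dat$A - dat$B + rnorm(n)
z <- zoo(as.matrix(dat), as.yearmon(2010 + (0:(n - 1)) / 12))

# Intercept of one regression on one window (NA if the response is all NA)
getCoef <- function(z, lhs, rhs) {
  if (all(is.na(z[, lhs]))) return(NA)
  fit <- lm(as.formula(paste(lhs, "~", rhs)), data = as.data.frame(z))
  coef(fit)["(Intercept)"]
}

# 36-month rolling regression for a single response column
roll <- function(z, lhs, rhs = "A + B") {
  rollapplyr(z, 36, getCoef, by.column = FALSE, coredata = FALSE,
             lhs = lhs, rhs = rhs)
}

ynames <- c("Y1", "Y2", "Y3")

cl <- makeCluster(2)  # assumed worker count, adjust as needed
# Each worker is a fresh R session: load zoo there and ship the helper.
invisible(clusterEvalQ(cl, library(zoo)))
clusterExport(cl, "getCoef", envir = environment())

# roll and z travel with the call; getCoef and rollapplyr are found on the worker
res <- parLapply(cl, ynames, roll, z = z)
stopCluster(cl)
```

With 60 months and a 36-month window, each element of res should hold 60 - 36 + 1 = 25 rolling intercepts. Passing z as an argument means it is serialized with every task, so for a large data set defining it once on the workers, as in the answer's code, may be cheaper.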

Let me know if it doesn't work.