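I need to run a 36-month rolling regression for each column and get the intercept from each regression. There are about 100 rows and 9,000 columns. The original code takes 10 hours to run on my laptop. I want to use parallel computing to cut the run time, but it returns an error. Here is my code: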
library(parallel)
library(zoo)
no_cores <- detectCores()-1
c1 <- makeCluster(no_cores)
z <- read.zoo(df, FUN = as.yearmon, format = "%m/%d/%Y")
getCoef <- function(z, lhs, rhs){
  if(all(is.na(z[,lhs]))) NA
  else coef(lm(paste(lhs, "~", rhs), z))["(Intercept)"]
}
roll <- function(z, lhs, rhs = "A + B + C + D + E") {
  rollapplyr(z, 36, getCoef, by.column = FALSE, coredata = FALSE, lhs = lhs, rhs = rhs)
}
ynames <- colnames(df)[2:8785]
tm1 <- system.time(
  L_rr <- parLapply(c1, ynames, roll, z = z)
)
The error I get is:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: could not find function "rollapplyr"
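In other words, I assume that, just as parLapply is the parallel counterpart of lapply, there is a parallel equivalent of rollapplyr. I don't know what it is. Thanks.
Answer 0: (score: 1)
Try this: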
library(parallel)
library(zoo)
no_cores <- detectCores()-1
c1 <- makeCluster(no_cores)
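# Workers start as empty R sessions, so send them the raw data first.
clusterExport(c1, "df")
# Load zoo and build z and the helper functions on every worker.
clusterEvalQ(c1, {
  library(zoo)
  z <- read.zoo(df, FUN = as.yearmon, format = "%m/%d/%Y")
  getCoef <- function(z, lhs, rhs){
    if(all(is.na(z[,lhs]))) NA
    else coef(lm(paste(lhs, "~", rhs), z))["(Intercept)"]
  }
  roll <- function(z, lhs, rhs = "A + B + C + D + E") {
    rollapplyr(z, 36, getCoef, by.column = FALSE, coredata = FALSE, lhs = lhs, rhs = rhs)
  }
})
# ynames is only needed on the master, which hands one name to each task.
ynames <- colnames(df)[2:8785]
tm1 <- system.time(
  # The anonymous function runs on a worker and uses that worker's own roll and z.
  L_rr <- parLapply(c1, ynames, function(lhs) roll(z, lhs))
)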
stopCluster(c1)
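You need to set up the parallel workers the same way as the parent environment. I did this by embedding your functions, libraries, and data inside clusterEvalQ(c1, {...}). Make sure to stop the cluster when you are done.
Let me know if it doesn't work.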
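For anyone who wants to check the idea without the original data, here is a minimal sketch of the same pattern on made-up data. The data set, the column names Y1 to Y3, and the two-worker cluster are assumptions for illustration only; the point is that each worker needs zoo loaded and getCoef defined before roll can run there. This variant uses clusterExport for the helper instead of defining it inside clusterEvalQ, which is one of several ways to do it.

```r
library(parallel)
library(zoo)

# Hypothetical stand-in data: 60 months, regressors A-E, responses Y1-Y3.
set.seed(1)
n <- 60
x <- matrix(rnorm(n * 8), nrow = n,
            dimnames = list(NULL, c("A", "B", "C", "D", "E", "Y1", "Y2", "Y3")))
z <- zoo(x, as.yearmon(2010 + (seq_len(n) - 1) / 12))
ynames <- c("Y1", "Y2", "Y3")

# Intercept of one regression on one 36-month window.
getCoef <- function(z, lhs, rhs) {
  if (all(is.na(z[, lhs]))) NA
  else coef(lm(as.formula(paste(lhs, "~", rhs)), data = as.data.frame(z)))["(Intercept)"]
}

# Rolling intercepts for one response column.
roll <- function(z, lhs, rhs = "A + B + C + D + E") {
  rollapplyr(z, 36, getCoef, by.column = FALSE, coredata = FALSE, lhs = lhs, rhs = rhs)
}

cl <- makeCluster(2)                                  # two workers, just for the demo
invisible(clusterEvalQ(cl, library(zoo)))             # makes rollapplyr visible on each worker
clusterExport(cl, "getCoef", envir = environment())   # roll looks getCoef up by name
res_par <- parLapply(cl, ynames, roll, z = z)
stopCluster(cl)

# Serial reference run, to confirm the parallel result matches.
res_seq <- lapply(ynames, roll, z = z)
```

With 60 months and a 36-month window, each element of res_par should hold 25 rolling intercepts, and res_par should agree with res_seq. Whether the parallel version is actually faster depends on how the per-task work compares with the cost of sending z to the workers.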