Variable not found when running in parallel

Time: 2016-06-11 17:40:24

Tags: r parallel-processing

When I run this R code, I have a problem with the parallel part.

I get an error notification. Here is my code:

# include library
require(stats)
library(GMD)
library(parallel)
# include function
source('~/Workspaces/Projects/RProject/MovielensCluster/readData.R'); # contain readtext.convert() function
###
elbow.k <- function(mydata){
  ## determine a "good" k using elbow
  dist.obj <- dist(mydata);
  hclust.obj <- hclust(dist.obj);
  css.obj <- css.hclust(dist.obj,hclust.obj);
  elbow.obj <- elbow.batch(css.obj);
  #   print(elbow.obj)
  k <- elbow.obj$k
  return(k)
}

# include file
filePath <- "dataset/u.user";
data.convert <- readtext.convert(filePath);
data.clustering <- data.convert[,c(-1,-4)];
# find k value
no_cores <- detectCores();
cl<-makeCluster(no_cores);
clusterExport(cl, list("data.clustering", "data.original", "elbow.k", "clustering.kmeans"));
start.time <- Sys.time();
k.clusters <- parSapply(cl, 1, function(x) elbow.k(data.clustering));
end.time <- Sys.time();
cat('Time to find k using Elbow method is',(end.time - start.time),'seconds with k value:', k.clusters);

Can anyone help me solve this? Thank you very much.

1 Answer:

Answer 0 (score: 3)

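I think your problem is related to variable scope. On Mac/Linux you have the option of using makeCluster(no_cores, type = "FORK"), whose workers are forked copies of the master R process and therefore automatically see all of its variables and loaded packages. On Windows you must use a parallel socket (PSOCK) cluster, which is also what makeCluster() creates by default on every platform, as in your code. Each PSOCK worker starts as a fresh R session with only the default base packages loaded, so you always have to specify exactly which variables and which packages your parallel function needs: clusterExport() makes the required variables visible to the workers, and clusterEvalQ() loads the required packages on them. Also note that clusterExport() copies the values as they are at the time of the call, so any changes you make to a variable afterwards are not seen by the workers.

Back to your problem: elbow.k() calls css.hclust() and elbow.batch() from GMD, and GMD is not loaded on the workers. You need to add the following: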

clusterEvalQ(cl, library(GMD));
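Because GMD may not be installed everywhere, here is a minimal, self-contained sketch of the same failure and fix using only packages that ship with R. It assumes the tools package as a stand-in for GMD (tools::file_ext() plays the role of css.hclust()), and the names my.files and get.ext() are hypothetical stand-ins for data.clustering and elbow.k(); it illustrates the mechanism rather than reproducing the original code.

```r
library(parallel)
library(tools)  # attached on the master only; stands in for GMD in this sketch

# Hypothetical stand-ins for data.clustering and elbow.k()
my.files <- c("u.user", "ratings.csv", "README")
get.ext <- function(x) file_ext(x)  # file_ext() lives in tools, as css.hclust() lives in GMD

cl <- makeCluster(2)  # PSOCK, the default type: each worker is a fresh R session

# 1. Master variables do not exist on the workers yet
before <- unlist(clusterEvalQ(cl, exists("my.files")))
print(before)  # FALSE FALSE

# 2. clusterExport() copies the objects into each worker's global environment
clusterExport(cl, c("my.files", "get.ext"))

# 3. get.ext() now exists on the workers, but tools is not attached there,
#    so the call fails with: could not find function "file_ext"
err <- tryCatch(parSapply(cl, 1, function(i) get.ext(my.files)),
                error = function(e) conditionMessage(e))
print(err)

# 4. Attach the package on every worker; the same call now succeeds
invisible(clusterEvalQ(cl, library(tools)))
exts <- parSapply(cl, 1, function(i) get.ext(my.files))
print(as.vector(exts))  # "user" "csv" ""

stopCluster(cl)
```

The relative order of clusterExport() and clusterEvalQ() does not matter, as long as both run before parSapply(): each worker just needs both the exported objects and the attached package by the time the function is called.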

And here is your full code with that line added:

# include library
require(stats)
library(GMD)
library(parallel)
# include function
source('~/Workspaces/Projects/RProject/MovielensCluster/readData.R'); # contain readtext.convert() function
###
elbow.k <- function(mydata){
  ## determine a "good" k using elbow
  dist.obj <- dist(mydata);
  hclust.obj <- hclust(dist.obj);
  css.obj <- css.hclust(dist.obj,hclust.obj);
  elbow.obj <- elbow.batch(css.obj);
  #   print(elbow.obj)
  k <- elbow.obj$k
  return(k)
}

# include file
filePath <- "dataset/u.user";
data.convert <- readtext.convert(filePath);
data.clustering <- data.convert[,c(-1,-4)];
# find k value
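no_cores <- detectCores();
cl <- makeCluster(no_cores);                 # PSOCK cluster by default: workers start as fresh R sessions
clusterEvalQ(cl, library(GMD));              # load GMD on every worker (provides css.hclust() and elbow.batch())
clusterExport(cl, list("data.clustering", "data.original", "elbow.k", "clustering.kmeans"));  # copy these objects to the workers
start.time <- Sys.time();
k.clusters <- parSapply(cl, 1, function(x) elbow.k(data.clustering));
end.time <- Sys.time();
stopCluster(cl);                             # shut down the worker processes
# difftime() with units = "secs" makes sure the printed number really is in seconds
cat('Time to find k using Elbow method is', as.numeric(difftime(end.time, start.time, units = "secs")), 'seconds with k value:', k.clusters);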
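The two remaining claims in the answer, that FORK workers see the master's variables automatically and that changes made after clusterExport() are ignored, can be checked with a short sketch like the one below. The variable threshold is hypothetical, and the FORK part only runs on Unix-like systems (Mac/Linux), since FORK clusters are not available on Windows.

```r
library(parallel)

threshold <- 1  # hypothetical variable living on the master

# PSOCK: clusterExport() copies the value as it is at the time of the call
cl <- makeCluster(2)
clusterExport(cl, "threshold")
threshold <- 2                       # changed on the master after the export
psock.seen <- unlist(clusterEvalQ(cl, threshold))
stopCluster(cl)
print(psock.seen)                    # 1 1: the workers still hold the old copy

# FORK (Mac/Linux only): workers are forked copies of the master process,
# so they see the master's variables without any clusterExport()
if (.Platform$OS.type == "unix") {
  cl <- makeCluster(2, type = "FORK")
  fork.seen <- unlist(clusterEvalQ(cl, threshold))
  stopCluster(cl)
  print(fork.seen)                   # 2 2
}
```

A FORK worker's view is also a snapshot: it reflects the master's state at the moment makeCluster() forks the processes, so changes made on the master after that point are not seen either.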