I am trying to write R code using the snow package. I have a function
imp <- function(x, y)
How can I use this function with clusterApply?
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, 1:6, get("+"), 3)
stopCluster(cl)
Instead of this, I want to use my own function:
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, imp(dataset,3), 3)
stopCluster(cl)
Assuming this is my function, how can I run it on a parallel and distributed system?
impap <- function(x, y)
{
  data <- as(x, "matrix")  # coerce the input to a matrix
  t <- data + y            # add y to every element
  print(t)
}
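For reference, clusterApply(cl, x, fun, ...) runs fun on each element of x on the workers, so the function itself (not its result, as in the attempt above) must be passed as the third argument. A minimal sketch of that pattern with the function above, assuming dataset is a matrix to be processed in two row chunks (the chunking scheme is illustrative, not part of the question):

library(snow)

impap <- function(x, y) {
  data <- as(x, "matrix")
  data + y  # return the result instead of printing, so the master collects it
}

cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")
# one chunk of rows per worker; each worker applies impap to its chunk
chunks <- list(dataset[1:(nrow(dataset) %/% 2), , drop = FALSE],
               dataset[(nrow(dataset) %/% 2 + 1):nrow(dataset), , drop = FALSE])
res <- clusterApply(cl, chunks, impap, 3)
stopCluster(cl)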
Answer (score: 1)
I tend to like snowfall for parallel and distributed computing. Below is generic code that, with minor modifications, parallelizes well in both scenarios, and it also writes a log file for each instance for better progress and error tracking.
rm(list = ls())  # remove all past workspace variables
n_projs = 5  # number of iterations; each one gets sent to an available CPU core
proj_name_root = "model_run_test"
proj_names = paste0(proj_name_root, "__", c(1:n_projs))
#FUNCTION TO RUN
project_exec = function(proj_name){
  cat('starting with', proj_name, '\n')
  ##ADD CODE HERE
  cat('done with', proj_name, '\n')
}
require(snowfall)
# Init snowfall
# NUMBER_OF_PROCESSORS is a Windows environment variable; on other platforms
# use parallel::detectCores() to count cores instead
cpucores = as.integer(Sys.getenv('NUMBER_OF_PROCESSORS'))

#TWO WAYS TO RUN (CLUSTER OR SINGLE MACHINE)
# 1) Cluster: supply the worker hostnames, e.g. on the command line
#    (this is where you would list your AWS instances)
hosts = c(commandArgs(TRUE))  # vector of strings with computer names in the cluster
sfInit(socketHosts=hosts, parallel=TRUE, cpus=cpucores, type="SOCK",
       slaveOutfile="/home/cluster_user/output.log")  # each worker writes to this log
##BELOW IS THE CODE IF YOU ARE RUNNING IN PARALLEL ON THE SAME MACHINE (MULTI-CORE)
#sfInit(parallel=TRUE, cpus=cpucores)
#sfLibrary(sp)  # load any libraries your function needs on the workers
sfExportAll()  # export all global objects (including project_exec) to the workers
all_reps = sfLapply(proj_names, fun=project_exec)  # run project_exec on each name in parallel
sfRemoveAll()  # remove the exported objects from the workers
sfStop()
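To tie the template back to the question's impap, here is a minimal single-machine snowfall version; the toy dataset and the split into row-index chunks are assumptions for illustration:

require(snowfall)

impap <- function(x, y) {
  data <- as(x, "matrix")
  data + y
}

dataset <- matrix(1:20, nrow = 4)  # toy stand-in for the real dataset
# one group of row indices per core
chunks <- split(seq_len(nrow(dataset)), rep(1:2, length.out = nrow(dataset)))

sfInit(parallel = TRUE, cpus = 2)
sfExport("impap", "dataset")  # ship the function and the data to the workers
res <- sfLapply(chunks, function(rows) impap(dataset[rows, , drop = FALSE], 3))
sfStop()

do.call(rbind, res)  # reassemble; rows come back grouped by chunk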