I am trying to write R code using the snow package. I have a function
imp <- function(x, y)
How can I use this function with clusterApply?
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, 1:6, get("+"), 3)
stopCluster(cl)
Instead of this, I want to use my own function:
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, imp(dataset,3), 3)
stopCluster(cl)
Assuming this is my function, how can I run it on a parallel and distributed system?
impap <- function(x, y)
{
  data <- as(x, "matrix")  # coerce the input to a matrix
  t <- data + y            # add y to every element
  print(t)
}
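For reference, clusterApply(cl, x, fun, ...) runs fun on each element of x on the workers, so the function itself (not its result, as in the attempt above) must be passed as the third argument. A minimal sketch of that pattern with the function above, assuming dataset is a matrix to be processed in two row chunks (the chunking scheme is illustrative, not part of the question):

library(snow)

impap <- function(x, y) {
  data <- as(x, "matrix")
  data + y  # return the result instead of printing, so the master collects it
}

cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")
# one chunk of rows per worker; each worker applies impap to its chunk
chunks <- list(dataset[1:(nrow(dataset) %/% 2), , drop = FALSE],
               dataset[(nrow(dataset) %/% 2 + 1):nrow(dataset), , drop = FALSE])
res <- clusterApply(cl, chunks, impap, 3)
stopCluster(cl)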
Answer (score: 1)
I tend to like snowfall for parallel and distributed computing. Below is generic code that, with minor modifications, parallelizes well in both scenarios, and it also writes a log file for each instance for better progress and error tracking.
rm(list = ls())  # remove all past workspace variables
n_projs = 5  # number of iterations; each one gets sent to an available CPU core
proj_name_root = "model_run_test"
proj_names = paste0(proj_name_root, "__", c(1:n_projs))
#FUNCTION TO RUN
project_exec = function(proj_name){
  cat('starting with', proj_name, '\n')
  ##ADD CODE HERE
  cat('done with', proj_name, '\n')
}
require(snowfall)
# Init snowfall
# NUMBER_OF_PROCESSORS is a Windows environment variable; on other platforms
# use parallel::detectCores() to count cores instead
cpucores = as.integer(Sys.getenv('NUMBER_OF_PROCESSORS'))

#TWO WAYS TO RUN (CLUSTER OR SINGLE MACHINE)
# 1) Cluster: supply the worker hostnames, e.g. on the command line
#    (this is where you would list your AWS instances)
hosts = c(commandArgs(TRUE))  # vector of strings with computer names in the cluster
sfInit(socketHosts=hosts, parallel=TRUE, cpus=cpucores, type="SOCK",
       slaveOutfile="/home/cluster_user/output.log")  # each worker writes to this log
##BELOW IS THE CODE IF YOU ARE RUNNING IN PARALLEL ON THE SAME MACHINE (MULTI-CORE)
#sfInit(parallel=TRUE, cpus=cpucores)
#sfLibrary(sp)  # load any libraries your function needs on the workers
sfExportAll()  # export all global objects (including project_exec) to the workers
all_reps = sfLapply(proj_names, fun=project_exec)  # run project_exec on each name in parallel
sfRemoveAll()  # remove the exported objects from the workers
sfStop()
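To tie the template back to the question's impap, here is a minimal single-machine snowfall version; the toy dataset and the split into row-index chunks are assumptions for illustration:

require(snowfall)

impap <- function(x, y) {
  data <- as(x, "matrix")
  data + y
}

dataset <- matrix(1:20, nrow = 4)  # toy stand-in for the real dataset
# one group of row indices per core
chunks <- split(seq_len(nrow(dataset)), rep(1:2, length.out = nrow(dataset)))

sfInit(parallel = TRUE, cpus = 2)
sfExport("impap", "dataset")  # ship the function and the data to the workers
res <- sfLapply(chunks, function(rows) impap(dataset[rows, , drop = FALSE], 3))
sfStop()

do.call(rbind, res)  # reassemble; rows come back grouped by chunk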