我确定必须在某处回答,所以;如果你有一个指向答案的指针,请告诉我...; o)
我有许多相当大的处理任务(主要是多标签文本分类器),它们读取大量文件,用它做的东西,输出结果然后移到下一个。
我整齐地按顺序工作但想要并行化。
通过一个非常基本的例子......
require(plyr)
fileDir <- "/Users/barneyc/sourceFiles"
outputDir <- "/Users/barneyc/outputFiles"
files <- as.list(list.files(full.names=TRUE,recursive=FALSE,pattern=".csv"))
l_ply(files, function(x){
print(x)
#change to dir containing source files
setwd(fileDir)
# read file
content <- read.csv(file=x,header=TRUE)
# change directory to output
setwd(outputDir)
# append the itemID from CSV file to
write.table(content$itemID,file="ids.csv", append = TRUE, sep=",", row.names=FALSE,col.names=TRUE)
}, .parallel=FALSE )
将遍历目录fileDir
中的所有文件,打开每个CSV,从文件中提取值并将其附加到目录outputDir
中保存的输出CSV。一个基本的例子,但运行得很好,以说明问题。
要并行运行此操作会产生一个问题,因为目录变量(fileDir
&amp; outputDir
)基本上不为匿名function (x)
所知,ala ...
require(plyr)
require(doParallel)
fileDir <- "/Users/barneyc/sourceFiles"
outputDir <- "/Users/barneyc/outputFiles"
files <- as.list(list.files(full.names=TRUE,recursive=FALSE,pattern=".csv"))
cl<-makeCluster(4) # make a cluster of available cores
registerDoParallel(cl) # raise cluster
l_ply(files, function(x){
print(x)
#change to dir containing source files
#setwd(fileDir)
# read file
content <- read.csv(file=x,header=TRUE)
# change directory to output
setwd(y)
# append the itemID from CSV file to
write.table(content$itemID,file="ids.csv", append = TRUE, sep=",", row.names=FALSE,col.names=TRUE)
}, .parallel=TRUE )
stopCluster() # kill the cluster
有人能说明我如何将这两个目录变量传递给函数吗?
答案 0 :(得分:0)
感谢@Roland我的并行功能现在......
require(plyr)
require(doParallel)
fileDir <- "/Users/barneyc/sourceFiles"
outputDir <- "/Users/barneyc/outputFiles"
files <- as.list(list.files(full.names=TRUE,recursive=FALSE,pattern=".csv"))
cl<-makeCluster(4) # make a cluster of available cores
registerDoParallel(cl) # raise cluster
l_ply(files, function(x,y,z){
filename <- x
fileDir <- y
outputDir <- z
#change to dir containing source files
setwd(fileDir)
# read file
content <- read.csv(file=filename,header=TRUE)
# change directory to output
setwd(outputDir)
# append the itemID from CSV file to
write.table(content$itemID,file="ids.csv", append = TRUE, sep=",", row.names=FALSE,col.names=TRUE)
}, y=fileDir, z=outputDir, .parallel=TRUE )
stopCluster() # kill the cluster