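First of all, thank you for taking the time to read my question. Above all, I need conceptual help, because I don't understand what is wrong with my reasoning. Some time ago I tried to refactor some of the algorithms I use so that they run in parallel and take advantage of all the CPUs I have (about 40, while my processes only ever use one at a time). While looking for examples and literature, I found that the package that would probably serve me best is doParallel, and I read the following:

https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf
run a for loop in parallel in R

However, when I implemented it in my code, it took more time than before. To find where the problem was, I simplified the code and reduced it to a single simple task. Even then, the doParallel version takes longer than the plain for loop I normally use. Below is the code I evaluated and the output it produced, which shows that the parallel version takes longer: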
```r
library(doParallel)

# 20 protein identifiers used to label the rows of the test data
proteins_names <- c("TCSYLVIO_005590","TcCLB.503947.20","TcCLB.504249.111","TcCLB.511081.60","TCSYLVIO_009736","TcCLB.507071.100","TcCLB.507801.60","TcCLB.509103.10","TCSYLVIO_003504","TcCLB.503645.40","TcCLB.508221.490","TCSYLVIO_005223","TcCLB.505949.10","TcCLB.505949.120","TcCLB.506459.219","TcCLB.506763.340","TcCLB.506767.360","TcCLB.506955.250","TcCLB.506965.190","TcCLB.506965.90")

# Test data: 100 rows x 22 numeric columns, plus a protein label and a signal column
merged_total_test <- data.frame(matrix(nrow = 100, ncol = 22, rnorm(n = 2200, sd = 2, mean = 10)))
merged_total_test$protein <- proteins_names[sample(1:20, 100, replace = TRUE)]
merged_total_test$signal <- rnorm(n = 100, sd = 2, mean = 1000)

# Parallel version: start a cluster that leaves 4 of the detected cores free
cores <- detectCores()
cl <- makeCluster(cores[1] - 4)
registerDoParallel(cl)

init_time_parallel <- Sys.time()
dt_plot_total_parallel <- foreach(prot = 1:20, .combine = rbind) %dopar% {
  # Rows for this protein that have a non-missing signal
  temp_protein_c <- merged_total_test[merged_total_test$protein == proteins_names[prot] & !is.na(merged_total_test$signal), ]
  temp_protein_c
}
final_time_parallel <- Sys.time()
total_time_parallel <- final_time_parallel - init_time_parallel
stopCluster(cl)

# Sequential version: the plain for loop I normally use
init_time <- Sys.time()
dt_plot_total <- merged_total_test[0, ]
for (prot in 1:20) {
  print(prot)
  temp_protein_c <- merged_total_test[merged_total_test$protein == proteins_names[prot] & !is.na(merged_total_test$signal), ]
  dt_plot_total <- rbind(dt_plot_total, temp_protein_c)
}
final_time <- Sys.time()
total_time <- final_time - init_time

total_time
total_time_parallel
identical(dt_plot_total,dt_plot_total_parallel)#should be true
```
Output:

```
> total_time
Time difference of 0.3065186 secs
> total_time_parallel
Time difference of 1.939842 secs
> identical(dt_plot_total,dt_plot_total_parallel)#should be true
[1] TRUE
```
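To narrow the question down, a comparison like the following could time the same foreach loop with `%do%` (sequential) and `%dopar%` (parallel) when each iteration does a non-trivial amount of work. This is a minimal hypothetical sketch: `slow_task()`, `n_tasks` and the 0.2 s `Sys.sleep()` are stand-ins I made up, not part of my real code. My assumption is that with such a task the parallel version should win, unlike the example above, where each iteration is only a cheap subset of a 100-row data frame.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical workload: Sys.sleep() simulates about 0.2 s of real work per task
slow_task <- function(i) {
  Sys.sleep(0.2)
  i^2
}

n_tasks <- 8
cl <- makeCluster(2)     # cluster start-up is kept outside the timed sections
registerDoParallel(cl)

# Sequential baseline: %do% ignores the registered backend
seq_time <- system.time(
  res_seq <- foreach(i = seq_len(n_tasks), .combine = c) %do% slow_task(i)
)[["elapsed"]]

# Parallel run: same loop body, dispatched to the 2 workers
par_time <- system.time(
  res_par <- foreach(i = seq_len(n_tasks), .combine = c) %dopar% slow_task(i)
)[["elapsed"]]

stopCluster(cl)

print(c(sequential = seq_time, parallel = par_time))
```

With 8 tasks of about 0.2 s each, the sequential run should take roughly 8 × 0.2 = 1.6 s, and the 2-worker run roughly half of that plus the cost of sending each task and its result between processes.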