Parallelizing a computation with R

Date: 2018-06-25 20:23:48

Tags: r parallel-processing cluster-analysis snow

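I ran into a performance problem when trying to parallelize a computation in R. I have a gene expression matrix with 8 columns (conditions) and 12000 rows (genes), and I want to determine the optimal number of clusters for it (grouping genes that share the same expression pattern). To do this, I followed this tutorial, using the clusGap function together with partitioning around medoids (pam).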
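For reference, here is a minimal sketch of that approach on toy data. The matrix `x_toy`, the wrapper `pam1` (the same form as in the `clusGap` help page) and the small values of `K.max` and `B` are assumptions chosen so the example runs quickly; they are not my real settings.

```r
library(cluster)

# Toy data (assumption): three well-separated groups of 20 rows x 4 columns,
# standing in for gene expression profiles
set.seed(1)
x_toy <- rbind(matrix(rnorm(80, mean = 0),  ncol = 4),
               matrix(rnorm(80, mean = 5),  ncol = 4),
               matrix(rnorm(80, mean = 10), ncol = 4))

# Wrapper that returns only the cluster assignment vector
pam1 <- function(x, k) list(cluster = pam(x, k, cluster.only = TRUE))

# Gap statistic for k = 1..6 with 50 bootstrap reference samples
gap_toy <- clusGap(x_toy, FUN = pam1, K.max = 6, B = 50, verbose = FALSE)

# Choose k from the gap values and their standard errors;
# "firstSEmax" is the default rule of maxSE()
k_best <- maxSE(gap_toy$Tab[, "gap"], gap_toy$Tab[, "SE.sim"],
                method = "firstSEmax")

print(gap_toy$Tab)
k_best
```

The cost grows with both `K.max` and `B`, since every bootstrap sample is clustered once for each candidate k.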

Since the computation takes a long time and I have access to a compute cluster, I intend to parallelize it.

I want to use the snow package. To evaluate the speed-up, I extracted a sub-matrix and ran a first test on my own computer.

library(snow)
cl <- makeCluster(8)
clusterEvalQ(cl, library(cluster))
clusterExport(cl, "df")
T1 <- Sys.time()
results <- clusterCall(cl, function(x) clusGap(df, FUN = pam, K.max = 20, B = 500, verbose = TRUE))
T2 <- Sys.time()
difftime(T2, T1)
  

Time difference of 14.59781 secs

T3 <- Sys.time()
clusGap(df, FUN = pam, K.max = 20, B = 500, verbose = TRUE)
T4 <- Sys.time()
difftime(T4, T3)
  

Time difference of 8.251367 secs

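So I'm a bit puzzled by this test, because the computation on 1 core appears to be faster than on 8 cores :o

Does anyone know what I'm missing with this approach?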

Many thanks
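One thing that may explain the timing, offered as a sketch rather than a confirmed diagnosis: `clusterCall` evaluates the same function with identical arguments on every worker, so with 8 workers the full `clusGap` run (all 500 bootstrap samples) would be executed 8 times instead of being divided. The toy example below contrasts this with `clusterApply`, which hands each worker its own element. It uses the `parallel` package, which ships with R and provides the snow-style functions. The data `df_toy`, the 2 workers and the values of `B_total` and `K.max` are assumptions for illustration only.

```r
library(parallel)  # ships with R; provides snow-style makeCluster/clusterApply
library(cluster)

# Toy data (assumption): two well-separated groups, 60 rows x 2 columns
set.seed(42)
df_toy <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
                matrix(rnorm(60, mean = 6), ncol = 2))

n_workers <- 2                     # assumption: small so the sketch runs anywhere
B_total   <- 40                    # total bootstrap samples to spread over workers
B_chunk   <- B_total / n_workers   # share of B handled by each worker

cl <- makeCluster(n_workers)
clusterEvalQ(cl, library(cluster))
clusterSetRNGStream(cl, iseed = 123)  # independent random streams per worker

# clusterCall: every worker runs the SAME task, so no work is divided
same <- clusterCall(cl, function(x) nrow(x), df_toy)

# clusterApply: each worker receives one element, here its share of B
chunks <- clusterApply(cl, rep(B_chunk, n_workers), function(b, x) {
  pam1 <- function(x, k) list(cluster = pam(x, k, cluster.only = TRUE))
  clusGap(x, FUN = pam1, K.max = 4, B = b, verbose = FALSE)$Tab
}, x = df_toy)
stopCluster(cl)

# logW depends only on the observed data, so it is identical in every chunk.
# E.logW is a mean over bootstrap samples, so equal-sized chunks can be averaged.
# SE.sim is NOT recombined here; pooling it correctly needs more care.
logW   <- chunks[[1]][, "logW"]
E_logW <- Reduce(`+`, lapply(chunks, function(tab) tab[, "E.logW"])) / n_workers
gap    <- E_logW - logW
gap
```

Starting workers and shipping data to them has a fixed cost, so on a small sub-matrix a split like this may still not beat a single core; any gain would be expected mainly when each chunk takes much longer than that overhead.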

sessionInfo()

> R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS  10.13.4

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] snow_0.4-2          cluster_2.0.6       DDRTree_0.1.5       irlba_2.3.1         VGAM_1.0-3          ggplot2_2.2.1      
[7] Biobase_2.34.0      BiocGenerics_0.20.0 Matrix_1.2-12      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17           lattice_0.20-35        tidyr_0.8.1            GO.db_3.4.0            assertthat_0.2.0      
 [6] digest_0.6.15          slam_0.1-40            R6_2.2.2               plyr_1.8.4             RSQLite_2.1.1         
[11] pillar_1.2.3           rlang_0.2.1            lazyeval_0.2.1         data.table_1.11.4      blob_1.1.1            
[16] S4Vectors_0.12.2       combinat_0.0-8         qvalue_2.6.0           BiocParallel_1.8.2     stringr_1.3.1         
[21] igraph_1.1.2           pheatmap_1.0.10        bit_1.1-14             munsell_0.5.0          fgsea_1.0.2           
[26] pkgconfig_2.0.1        tidyselect_0.2.4       tibble_1.4.2           gridExtra_2.3          matrixStats_0.53.1    
[31] IRanges_2.8.2          dplyr_0.7.5            grid_3.3.3             gtable_0.2.0           DBI_1.0.0             
[36] magrittr_1.5           scales_0.5.0           stringi_1.2.3          GOSemSim_2.0.4         reshape2_1.4.3        
[41] bindrcpp_0.2.2         limma_3.30.13          DO.db_2.9              clusterProfiler_3.2.14 fastmatch_1.1-0       
[46] fastICA_1.2-1          RColorBrewer_1.1-2     tools_3.3.3            bit64_0.9-7            glue_1.2.0            
[51] purrr_0.2.5            HSMMSingleCell_0.108.0 AnnotationDbi_1.36.2   colorspace_1.3-2       DOSE_3.0.10           
[56] memoise_1.1.0          bindr_0.1.1        

0 answers:

No answers yet