Question

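I am trying to speed up the creation of a heatmap. My main plan is to use the hcluster function from the amap package to parallelize the clustering.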

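I read in the aheatmap documentation that I can supply an hclust object, and I read somewhere on Stack Overflow that the hierarchical clustering is the main bottleneck. So I thought I would compute the clustering only once and pass it to my heatmap.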

Now I get the following error:

Error in cluster_mat(mat, Rowv, distfun = distfun, hclustfun = hclustfun,  : 
  aheatmap - Invalid clustering function: must be a character string or a function

pg_h <- matrix(rnorm(10000),ncol = 10)

d <- dist(pg_h)
h <- hclust(d)
aheatmap(pg_h, 
         Colv=NA,
         scale='row',
         distfun=d, 
         hclustfun=h)

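Can anyone help me with this, or suggest a different way to create my heatmap? I have about 8000 rows and 15 columns, and this takes more than an hour. I only want to cluster the rows.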

Answer 1

At least for the error, I think that in this case you have to pass the method stored in the hclust object (h), like this:

aheatmap(pg_h, 
     Colv=NA,
     scale='row',
     distfun=d, 
     hclustfun=h$method)

You can see the method by calling str on the object in question:

> str(h)
#List of 7
#$ merge      : int [1:999, 1:2] -778 -321 -191 -549 -133 -176 -94 -514 -653 -359 ...
#$ height     : num [1:999] 0.914 0.927 0.934 0.951 0.963 ...
#$ order      : int [1:1000] 74 910 12 864 979 849 218 361 478 974 ...
#$ labels     : NULL
#$ method     : chr "complete"
#$ call       : language hclust(d = d)
#$ dist.method: chr "euclidean"
#- attr(*, "class")= chr "hclust"
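
A quick check in base R may make the cause of the error easier to see. This is only a sketch on a smaller matrix than the one in the question, and it uses nothing beyond the stats package: the fitted clustering h is a list of class hclust, so it is neither a function nor a character string, whereas h$method is just the name of the linkage.

```r
# Smaller matrix than in the question, so the check runs instantly.
set.seed(1)
pg_h <- matrix(rnorm(1000), ncol = 10)

d <- dist(pg_h)
h <- hclust(d)

# The fitted clustering is a list with class "hclust" ...
class(h)
is.function(h)     # FALSE
is.character(h)    # FALSE

# ... while h$method is only the name of the linkage that was used.
h$method
identical(h$method, "complete")   # TRUE
```

Since only the string "complete" reaches hclustfun this way, aheatmap presumably still runs the clustering itself rather than reusing h.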

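As for the first part of the question (the speed-up, if that is what you meant), passing the method taken from the hclust object seems to produce the output slightly faster. Here are some timings: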

system.time(aheatmap(pg_h, Colv=NA, scale='row', distfun=d, hclustfun=h$method))
#   user  system elapsed 
#   3.31    0.36    3.87 
system.time(aheatmap(pg_h, Colv=NA, scale='row', distfun=d, hclustfun="complete"))
#   user  system elapsed 
#   3.51    0.65    4.35

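Note that the hclustfun method ("complete") is the same in both cases, so the two calls are equivalent and the small gap may simply be run-to-run variation.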
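
If the goal is to compute the row clustering only once, one option is to build the tree up front and hand it to the plotting function as the row dendrogram. The sketch below uses base R's heatmap(), whose Rowv argument accepts a dendrogram. Whether aheatmap accepts the same object through its own Rowv argument is an assumption here and should be checked against the documentation of your NMF version. amap::hcluster() with nbproc is used only if amap happens to be installed; otherwise the code falls back to hclust(dist()).

```r
set.seed(1)
pg_h <- matrix(rnorm(10000), ncol = 10)

# Cluster the rows once. amap::hcluster() can spread the work over several
# processes; fall back to base R if amap is not installed.
if (requireNamespace("amap", quietly = TRUE)) {
  h <- amap::hcluster(pg_h, method = "euclidean", link = "complete", nbproc = 2)
} else {
  h <- hclust(dist(pg_h), method = "complete")
}

# Convert the precomputed tree to a dendrogram for the plotting call.
row_dend <- as.dendrogram(h)

# Draw into a null device so the sketch also runs non-interactively.
pdf(NULL)
res <- heatmap(pg_h, Rowv = row_dend, Colv = NA, scale = "row")
dev.off()

# Assumption, not tested here: with NMF loaded, the analogous call would be
# aheatmap(pg_h, Rowv = row_dend, Colv = NA, scale = "row").
```

With this layout the expensive step runs exactly once, and the same row_dend can be reused for every redraw of the plot.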

Speeding up heatmaps in aheatmap (NMF package), and an error when supplying an hclust object
