Question

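I have a heatmap with a dendrogram like this.

The complete data is here.

The problem is that the dendrogram on the left is squished. How can I un-squish (expand) it without changing the column size of the heatmap?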


It is generated with the following code:

#!/usr/bin/Rscript
library(gplots);
library(RColorBrewer);


plot_hclust  <- function(inputfile,clust.height,type.order=c(),row.margins=70) {

    # Read data
    dat.bcd <- read.table(inputfile,na.strings=NA, sep="\t",header=TRUE);


    rownames(dat.bcd) <- do.call(paste,c(dat.bcd[c("Probes","Gene.symbol")],sep=" "))
    dat.bcd <- dat.bcd[,!names(dat.bcd) %in% c("Probes","Gene.symbol")] 
    dat.bcd <- dat.bcd

    # Clustering and distance function
    hclustfunc <- function(x) hclust(x, method="complete")
    distfunc <- function(x) dist(x,method="maximum")


    # Select based on FC, as long as any of them >= anylim

    anylim <- 2.0
    dat.bcd <- dat.bcd[ apply(dat.bcd, 1,function(x) any (x >= anylim)), ]


    # Clustering functions
    height <- clust.height; 

    # Define output file name
    heatout <- paste("tmp.pafc.heat.",anylim,".h",height,".pdf",sep="");


    # Compute the distance matrix and run the clustering
    d.bcd <- distfunc(dat.bcd)
    fit.bcd <- hclustfunc(d.bcd)


    # Cluster by height
    # cutree and rect.hclust have to be used in tandem
    clusters <- cutree(fit.bcd, h=height) 
    nofclust.height <-  length(unique(as.vector(clusters)));

    myorder <- colnames(dat.bcd); 
    if (length(type.order)>0) {
     myorder <- type.order
    }

    # Define colors
    #hmcols <- rev(brewer.pal(11,"Spectral"));
    hmcols <- rev(redgreen(2750));
    selcol <- colorRampPalette(brewer.pal(12,"Set3"))
    selcol2 <- colorRampPalette(brewer.pal(9,"Set1"))
    sdcol= selcol(5);
    clustcol.height = selcol2(nofclust.height);

    # Plot heatmap
    pdf(file=heatout,width=20,height=50); # for FC.lim >=2
    heatmap.2(as.matrix(dat.bcd[,myorder]),Colv=FALSE,density.info="none",lhei=c(0.1,4),dendrogram="row",scale="row",RowSideColors=clustcol.height[clusters],col=hmcols,trace="none", margins=c(30,row.margins), hclustfun=hclustfunc,distfun=distfunc,lwid=c(1.5,2.0),keysize=0.3);
    dev.off();


}
#--------------------------------------------------
# End of functions
#-------------------------------------------------- 

plot_hclust("http://pastebin.com/raw.php?i=ZaGkPTGm",clust.height=3,row.margins=70);

Answer 1

In your case the data has a long tail, which is expected for gene expression data (log-normal).

data <- read.table(file='http://pastebin.com/raw.php?i=ZaGkPTGm', 
                   header=TRUE, row.names=1)

mat <- as.matrix(data[,-1]) # -1 removes the first column containing gene symbols

As the quantile distribution shows, the genes with the highest expression stretch the range from 1.5 to over 300.

quantile(mat)

#     0%     25%     50%     75%    100% 
#  0.000   0.769   1.079   1.544 346.230

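When hierarchical clustering is performed on unscaled data, the resulting dendrogram may be biased towards the values with the highest expression, as seen in your example. This merits either a log or a z-score transformation, among many others (reference). Your dataset contains values == 0, which is a problem for the log transformation because log(0) is undefined (R returns -Inf).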

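The z-score transformation (reference) is implemented within heatmap.2, but it is important to note that the function computes the distance matrix and runs the clustering algorithm before scaling the data. Hence the option scale='row' does not influence the clustering results; see my earlier post (differences in heatmap/clustering defaults in R) for more details.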
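To make the point above concrete, here is a minimal base-R sketch on synthetic data (not your dataset; the matrix, the artificial extreme row and the `squish` helper are all assumptions made for illustration). It clusters the same matrix once as-is and once after row-wise z-scoring, then compares how strongly the tallest merge dominates the dendrogram.

```r
# Synthetic stand-in for an expression matrix: 20 genes x 6 samples,
# with one artificially extreme row (hypothetical data).
set.seed(1)
mat <- matrix(rlnorm(120, meanlog = 0, sdlog = 0.5), nrow = 20)
mat[1, ] <- mat[1, ] * 200

distfunc   <- function(x) dist(x, method = "maximum")
hclustfunc <- function(x) hclust(x, method = "complete")

# Clustering on the raw values, which is what the dendrogram reflects
# when the data are only scaled for display.
fit_raw <- hclustfunc(distfunc(mat))

# Clustering on values that were z-scored per row beforehand.
z     <- t(scale(t(mat)))
fit_z <- hclustfunc(distfunc(z))

# Hypothetical helper: ratio of the tallest merge to the median merge.
# A large ratio means most branches are compressed near the leaves.
squish <- function(fit) max(fit$height) / median(fit$height)

squish_raw <- squish(fit_raw)
squish_z   <- squish(fit_z)
print(c(raw = squish_raw, scaled = squish_z))
```

On data like this the ratio for the raw matrix should be far larger than for the z-scored one, which is the "squished" effect seen in the question.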

I would suggest that you scale your data before running heatmap.2:

# The scale function transforms columns by default, hence the need for transposition.
z <- t(scale(t(mat)))

quantile(z)

#         0%        25%        50%        75%       100% 
# -2.1843994 -0.6646909 -0.2239677  0.3440102  2.2640027

# Set custom distance and clustering functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x, method="maximum")

# Obtain the clusters
fit <- hclustfunc(distfunc(z))
clusters <- cutree(fit, 5)

library(gplots)
pdf(file='heatmap.pdf', height=50, width=10)
heatmap.2(z, trace='none', dendrogram='row', Colv=FALSE, scale='none',
          hclustfun=hclustfunc, distfun=distfunc, col=greenred(256), symbreaks=TRUE,
          margins=c(10,20), keysize=0.5, labRow=data$Gene.symbol,
          lwid=c(1,0.05,1), lhei=c(0.03,1), lmat=rbind(c(5,0,4),c(3,1,2)),
          RowSideColors=as.character(clusters))
dev.off()

Also, see the other posts here and here, which explain how to set the layout of the heatmap via the lmat, lwid and lhei parameters.
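As a rough illustration of what those three parameters control, the sketch below takes the values used in the call above and works out the share of the page each layout column and row receives. The panel numbering in the comments reflects my understanding of the order in which heatmap.2 draws its panels when RowSideColors is supplied; treat it as an assumption and check it against the gplots documentation. The `width_frac` and `height_frac` names are made up for this example.

```r
# Layout values copied from the heatmap.2 call above.
lmat <- rbind(c(5, 0, 4),
              c(3, 1, 2))
lwid <- c(1, 0.05, 1)
lhei <- c(0.03, 1)

# Assumed panel numbering with RowSideColors present:
#   1 = row side colors, 2 = heatmap body, 3 = row dendrogram,
#   4 = column dendrogram, 5 = color key, 0 = empty cell.

# Relative widths and heights are normalized over the whole page.
width_frac  <- lwid / sum(lwid)
height_frac <- lhei / sum(lhei)

# Column of the layout that holds the row dendrogram (panel 3).
dendro_col <- which(lmat == 3, arr.ind = TRUE)[, "col"]

print(round(width_frac, 3))
print(round(height_frac, 3))
print(width_frac[dendro_col])
```

Raising the first element of lwid relative to the others gives the row dendrogram more horizontal room, at the cost of the heatmap body unless the page width passed to pdf() is increased as well.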

The resulting heatmap looks like this (row and column labels are omitted):

Answer 2

From what I can see, your dataset may contain some outliers (the bottom-most objects). Try the following:

Remove the outliers from the dataset
Log-transform your distances to give less weight to extreme values
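A minimal sketch of the second suggestion, under assumptions: the answer does not say exactly what to log, so this version log-transforms the values before computing distances, and it adds a pseudocount of 1 so that zeros stay finite. The synthetic matrix and the `spread` helper are hypothetical.

```r
# Synthetic stand-in for the data: one extreme row and one zero entry.
set.seed(2)
mat <- matrix(rlnorm(120, meanlog = 0, sdlog = 0.5), nrow = 20)
mat[1, ] <- mat[1, ] * 200
mat[2, 1] <- 0

# log2 with a pseudocount of 1 (an assumption) keeps zeros finite.
logmat <- log2(mat + 1)

d_raw <- dist(mat,    method = "maximum")
d_log <- dist(logmat, method = "maximum")

# Hypothetical helper: how much the largest distance exceeds a typical one.
spread <- function(d) max(d) / median(d)

spread_raw <- spread(d_raw)
spread_log <- spread(d_log)
print(c(raw = spread_raw, log = spread_log))
```

A smaller spread after the transformation means the extreme row dominates the distance matrix less, so the remaining branches of the dendrogram get more of the available height.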

Answer 3

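The z-score transformation is easy to do with the 'fheatmap' package through its 'scale' parameter, so have a look at that package. The height of the dendrogram can be expanded by increasing the size of the canvas (pdf). http://cran.r-project.org/web/packages/fheatmap/index.html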

How to expand the dendrogram in a heatmap
