Different clusters using the same method

Time: 2017-04-12 12:50:55

Tags: cluster-analysis hierarchical-clustering dendrogram pheatmap

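I have a problem with hierarchical clustering. I want to make a dendrogram and a heatmap, using a correlation-based distance (d_mydata = dist(1-cor(t(mydata)))) and ward.D2 as the clustering method.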
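A minimal sketch of what the distance formula above computes, using a hypothetical toy matrix `x` that is not part of the original data. `dist()` applied to `1 - cor(t(x))` takes Euclidean distances between the rows of the dissimilarity matrix, whereas `as.dist()` uses the dissimilarities themselves. pheatmap's `"correlation"` option appears to correspond to the `as.dist()` form, but that is an assumption worth checking against the installed version.

```r
# Hypothetical toy data: 6 rows (items to cluster) by 8 columns.
set.seed(1)
x <- matrix(rnorm(48), nrow = 6,
            dimnames = list(paste0("Gene", 1:6), paste0("Test", 1:8)))

# Correlation-based dissimilarity between rows, as a full square matrix.
diss <- 1 - cor(t(x))

# (a) Use the dissimilarities directly as distances.
d_as_dist <- as.dist(diss)

# (b) Compute Euclidean distances between the rows of that matrix.
d_dist <- dist(diss)

# Same number of pairwise entries, but not the same values.
same_values <- isTRUE(all.equal(as.vector(d_as_dist), as.vector(d_dist)))
print(same_values)

# Trees built from the two objects can therefore differ as well.
hc_a <- hclust(d_as_dist, method = "ward.D2")
hc_b <- hclust(d_dist, method = "ward.D2")
```

Plotting `hc_a` and `hc_b` side by side is a quick way to see whether the choice matters for a given data set.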

As a handy feature, the pheatmap package can draw the dendrogram on the left side of the heatmap to show the clusters.

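My analysis pipeline is as follows:

  1. Create the dendrogram
  2. Test how many clusters are optimal (k)
  3. Extract the subjects in each cluster
  4. Make the heatmap

I was surprised to find that the dendrogram drawn in the heatmap differs from the one I plotted earlier, even though the methods are the same.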

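So I decided to color the pheatmap rows by the clusters I had previously obtained with cutree, and to check whether the colors correspond to the clusters in the dendrogram.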
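The same check can also be done numerically rather than by eye. The sketch below cross-tabulates the `cutree` labels from two trees; the toy matrix `m` and the names `hc_euc` and `hc_cor` are hypothetical and only illustrate the idea.

```r
# Hypothetical toy data with two row groups that differ in mean level.
set.seed(42)
m <- matrix(rnorm(200), nrow = 20,
            dimnames = list(paste0("Gene", 1:20), paste0("Test", 1:10)))
m[1:10, ] <- m[1:10, ] + 3

# Tree 1: Euclidean distance on the raw rows.
hc_euc <- hclust(dist(m), method = "ward.D2")

# Tree 2: correlation distance between rows.
hc_cor <- hclust(as.dist(1 - cor(t(m))), method = "ward.D2")

# Cut both trees into k groups; cutree() returns a named integer vector.
k <- 2
cl_euc <- cutree(hc_euc, k = k)
cl_cor <- cutree(hc_cor, k = k)

# Align by row name before comparing, then cross-tabulate the labels.
cl_cor <- cl_cor[names(cl_euc)]
agreement <- table(euclidean = cl_euc, correlation = cl_cor)
print(agreement)
```

If the two trees define the same groups, each row of the table has a single non-zero cell (cluster numbers themselves may be swapped).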

Here is my code:

    # Create test matrix
    test = matrix(rnorm(200), 20, 10)
    test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
    test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
    test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
    colnames(test) = paste("Test", 1:10, sep = "")
    rownames(test) = paste("Gene", 1:20, sep = "")
    test<-as.data.frame(test)
    
    # Create a dendrogram from the test matrix
    # (dist() defaults to Euclidean distance)
    dist_test <- dist(test)
    hc <- hclust(dist_test, method = "ward.D2")
    
    plot(hc)
    
    # Convert to a dendrogram; display options are given to plot() below
    dend <- as.dendrogram(hc)
    
    
    clus2 <- as.factor(cutree(hc, k=2)) # cut tree into 2 clusters
    groups<-data.frame(clus2)
    groups$id<-rownames(groups)
    
    
    # Data frame with the data and the cluster assignment
    test$id <- rownames(test)
    clusters <- merge(groups, test, by = "id")
    rownames(clusters) <- clusters$id

    # Recode the cluster labels: "1" -> "cluster1", "2" -> "cluster2"
    clusters$clus2 <- paste0("cluster", clusters$clus2)
    
    
    plot(dend, 
     main = "test", 
     horiz =  TRUE, leaflab = "none")
    
    
    
    # Correlation-based distance on columns 7:10 only; hc_cl is not used below
    d_clusters <- dist(1 - cor(t(clusters[, 7:10])))
    hc_cl <- hclust(d_clusters, method = "ward.D2")
    
    
    
    annotation_col = data.frame(
      Path = factor(colnames(clusters[3:12]))
    )
    rownames(annotation_col) = colnames(clusters[3:12])
    
    
    
    annotation_row = data.frame(
      Group = factor(clusters$clus2)
    )
    rownames(annotation_row) = rownames(clusters)
    
    # Specify colors
    ann_colors = list(
      Path= c(Test1="darkseagreen", Test2="lavenderblush2", Test3="lightcyan3", Test4="mediumpurple", Test5="red", Test6="blue", Test7="brown", Test8="pink", Test9="black", Test10="grey"), 
      Group = c(cluster1="yellow", cluster2="blue")
    )
    
    
    
    library(RColorBrewer)
    cols <- colorRampPalette(brewer.pal(10, "RdYlBu"))(20)
    library(pheatmap)
    pheatmap(clusters[ ,3:12], color =  rev(cols), 
         scale = "column",
         kmeans_k = NA,
         show_rownames = FALSE, show_colnames = TRUE,
         main = "Heatmap CK14, CK5/6, GATA3 and FOXA1 n=492 SCALE",
         clustering_method = "ward.D2",
         cluster_rows = TRUE, cluster_cols = TRUE,
         clustering_distance_rows = "correlation", 
         clustering_distance_cols = "correlation",
         annotation_row = annotation_row, 
         annotation_col = annotation_col,  
         annotation_colors=ann_colors
    )
    

[Image: heatmap]
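One way to make the standalone dendrogram and the heatmap share a single tree is to build the `hclust` object once and hand it to pheatmap. The sketch below assumes the documented behavior that `cluster_rows` accepts an `hclust` object and that the returned list exposes `tree_row`; the toy matrix `m` and the name `hc_rows` are hypothetical.

```r
library(pheatmap)

# Hypothetical toy data: 20 rows by 10 columns.
set.seed(7)
m <- matrix(rnorm(200), nrow = 20,
            dimnames = list(paste0("Gene", 1:20), paste0("Test", 1:10)))
m[1:10, seq(1, 10, 2)] <- m[1:10, seq(1, 10, 2)] + 3

# Build the row tree once, with the distance and linkage of your choice.
hc_rows <- hclust(as.dist(1 - cor(t(m))), method = "ward.D2")

# Row annotation derived from that same tree (labels are in input order).
annotation_row <- data.frame(
  Group = factor(paste0("cluster", cutree(hc_rows, k = 2))),
  row.names = hc_rows$labels
)

# Pass the hclust object instead of TRUE so pheatmap reuses this tree.
# silent = TRUE skips drawing; the return value is still available.
ph <- pheatmap(m,
               cluster_rows = hc_rows,
               cluster_cols = FALSE,
               cutree_rows = 2,
               annotation_row = annotation_row,
               silent = TRUE)

# Check that the tree stored in the result is the one that was passed in.
same_tree <- identical(ph$tree_row$order, hc_rows$order)
print(same_tree)
```

With this approach `plot(hc_rows)` and the heatmap's row dendrogram come from the same object, so any remaining difference is only in how the two plots orient the branches.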

Has anyone had the same problem? Did I make a silly mistake?

Thanks in advance.

0 answers:

No answers yet