K-means clustering and heatmap messed up in R

Date: 2014-10-07 13:10:38

Tags: r ggplot2 heatmap k-means

I am trying to get k-means clustering and a heatmap to work in R.

The example data is:

Gene          CTRL           Trt1       Trt2
CTC-367J11.1 1.246981e-01 1.367852e-05 1.794000e-05
Metazoa_SRP  2.530088e-05 1.444200e-05 1.926654e-05
U2           3.333631e-05 2.958200e-05 2.139313e-05
U6           6.305455e-05 5.250028e-05 8.006410e-05
PDE4B        1.096031e+01 1.152491e+01 1.123822e+01
Y_RNA        1.055033e-04 7.694829e-05 6.391186e-05
Metazoa_SRP  1.667394e-05 1.435015e-05 1.827063e-05
SIK3         1.899680e+01 1.969393e+01 2.364119e+01
Metazoa_SRP  5.617737e-01 8.913592e-01 1.842051e-01
U6           1.197319e-04 8.278068e-05 4.552052e-05
Metazoa_SRP  1.639560e-05 2.207347e-05 1.568830e-05
TAB1         1.283763e+01 9.046890e+00 9.739123e+00
U6           1.033654e-04 6.847156e-05 9.091511e-05
CENPC1       5.189229e+01 3.859490e+01 4.172082e+01
Y_RNA        7.265482e-05 5.306069e-05 5.707300e-05
Metazoa_SRP  1.621217e-05 9.311304e-01 5.794767e-01
Y_RNA        9.819591e-05 8.993314e-05 7.113170e-05
Metazoa_SRP  2.246108e-05 1.921480e-05 1.768147e-05
Metazoa_SRP  2.747219e-05 1.513105e-05 1.145366e-03
SULT1E1      1.443337e-03 1.072894e-03 2.243520e-02
Y_RNA        6.103954e-05 6.474251e-05 9.992000e-05
Y_RNA        9.063240e-05 6.180986e-05 6.909407e-05
NADKD1       2.370368e+01 1.709286e+01 1.503605e+01
U6           1.223693e-04 6.924021e-05 6.730057e-05
Y_RNA        1.612464e-04 8.317700e-05 1.367695e-04
RNU1-1       1.166811e+05 1.696343e+05 1.129499e+05
U6           4.632516e-05 9.701152e-05 6.301424e-05
NTPCR        3.139066e+01 1.629096e+01 1.781411e+01
Metazoa_SRP  1.978751e-01 1.433062e-05 2.070821e-05
U6           8.207452e-05 7.182641e-05 7.608100e-05
Metazoa_SRP  1.578756e-05 1.858409e-05 2.446180e-05
U6           5.737100e-05 5.423917e-05 9.218728e-05
DVL2         3.294008e+01 2.095570e+01 2.340127e+01
Metazoa_SRP  1.087326e+00 1.443017e+00 2.541242e+00
GALNT2       2.775928e+01 1.730751e+01 2.105737e+01
Metazoa_SRP  3.084284e-05 3.512870e-05 2.576436e-04
BCR          3.634695e+01 1.260421e+01 1.375759e+01
U6           6.806021e-05 5.552677e-05 8.207164e-05
Y_RNA        8.142876e-05 6.821020e-05 1.088023e-04
U6           6.790829e-05 5.647853e-05 7.394994e-05
U7           1.448038e-04 9.154464e-05 1.285874e-04
SCAND1       1.885882e+01 2.245786e+01 2.580144e+01
PHRF1        1.188219e+01 1.072032e+01 1.117122e+01
U7           2.287524e-04 1.977780e-04 1.102363e-04
U6           1.028393e-04 4.356925e-05 4.605374e-05
U6           6.817994e-05 8.988280e-05 5.114122e-05
Metazoa_SRP  1.542046e+02 1.290191e+02 1.557341e+02
Metazoa_SRP  7.414352e-01 1.374566e+00 1.305447e+00
ZDHHC5       1.537020e+01 1.838988e+01 1.851591e+01
U6           5.157132e-05 8.396489e-05 4.929159e-05

The commands I used are:

library(gplots)
library(RColorBrewer)
A<-read.table(file="random.txt",header=T)
A.matrix <- data.matrix(A[,2:ncol(A)])
rownames(A.matrix) <- A$Gene
A.matrix <- A.matrix + 0.00001
log10.A.matrix <- log10(A.matrix)
Z.log10.A.matrix <- t(scale(t(log10.A.matrix)))
tmp <- Z.log10.A.matrix[which(is.finite(Z.log10.A.matrix[,1])),]

length(which(!is.finite(tmp)))

fin.Z.log10.A.matrix <- tmp
set.seed(1)
km9.fin.Z.log.A.matrix <- kmeans(fin.Z.log10.A.matrix,5, iter.max=40, nstart=10)

rowOrder <- names(sort(km9.fin.Z.log.A.matrix$cluster))
colorVector <- c("darkgreen","darkred","orange", "green", "magenta")
clusterColors <- colorVector[ sort(km9.fin.Z.log.A.matrix$cluster)]
col1=c("blue","white","firebrick")
heatmap.2(fin.Z.log10.A.matrix[rowOrder,],
          trace="none", labRow=F, labCol=colnames(km9.fin.Z.log.A.matrix),
          col=col1, RowSideColors=clusterColors,
          Rowv=F, Colv=T, dendrogram="column", na.rm=T,
          main="Gene Expression", mar=c(5,5), cexCol=0.5)

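The figure I get is not the one I expected. The rows are not ordered correctly.

The clusters look scattered. I think it is a small mistake, but I cannot track it down.

Please help.

Thanks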

1 answer:

Answer 0: (score: 1)

The following is not directly related, but it may give some clues about the error:

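 library(reshape2); library(ggplot2)  # assumed sources of melt() and ggplot()
 # ddf is assumed to be the question's table read into a data frame
 mm2 = melt(ddf, id='Gene')
 ggplot(mm2[mm2$value<100,], aes(x=variable, y=value, group=Gene, color=Gene))+geom_point()+geom_line()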


For clarity, the plot omits the few genes with higher values.

But these higher values may be the source of the error:

> mm2[mm2$value>100,]
           Gene variable       value
26       RNU1-1     CTRL 116681.1000
47  Metazoa_SRP     CTRL    154.2046
76       RNU1-1     Trt1 169634.3000
97  Metazoa_SRP     Trt1    129.0191
126      RNU1-1     Trt2 112949.9000
147 Metazoa_SRP     Trt2    155.7341
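
Apart from the large values listed above, the sample data also contains many repeated gene names (Metazoa_SRP, U6 and Y_RNA each appear several times). This may explain the mis-ordered rows. The question builds rowOrder from names and then indexes the matrix by those names, and in R a character subscript matches only the first row carrying a given name. The following is a minimal base-R sketch of that effect, not a confirmed diagnosis. The toy matrix, the cluster vector and the names m, cluster, by_name, by_index and idx are all hypothetical and stand in for the question's matrix and the kmeans() result.

```r
# Toy matrix with duplicated row names, mimicking the repeated gene names.
# The values are made up for illustration, not taken from the question.
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            ncol = 2,
            dimnames = list(c("U6", "SIK3", "U6", "Y_RNA"),
                            c("CTRL", "Trt1")))

# Stand-in for kmeans()$cluster: one cluster id per row, named by row name.
cluster <- c(2, 1, 1, 2)
names(cluster) <- rownames(m)

# Name-based ordering, as in the question. Both "U6" lookups resolve to the
# first "U6" row, so the second "U6" row (CTRL = 3) never shows up.
by_name <- m[names(sort(cluster)), ]

# Index-based ordering: order() returns row positions, so every row is used
# exactly once, whether or not its name is duplicated.
idx <- order(cluster)
by_index <- m[idx, ]

# Side colours taken in the same order, so they stay aligned with the rows.
colorVector <- c("darkgreen", "darkred")
clusterColors <- colorVector[cluster[idx]]

print(by_name)
print(by_index)
```

If this is indeed the cause, passing the index-ordered matrix (here by_index) together with clusterColors to heatmap.2 with Rowv=F should keep each cluster contiguous. Making the gene names unique first, for example with make.unique(), would be another option.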