I am trying to get k-means clustering and a heatmap working in R.
The example data is:
Gene CTRL Trt1 Trt2
CTC-367J11.1 1.246981e-01 1.367852e-05 1.794000e-05
Metazoa_SRP 2.530088e-05 1.444200e-05 1.926654e-05
U2 3.333631e-05 2.958200e-05 2.139313e-05
U6 6.305455e-05 5.250028e-05 8.006410e-05
PDE4B 1.096031e+01 1.152491e+01 1.123822e+01
Y_RNA 1.055033e-04 7.694829e-05 6.391186e-05
Metazoa_SRP 1.667394e-05 1.435015e-05 1.827063e-05
SIK3 1.899680e+01 1.969393e+01 2.364119e+01
Metazoa_SRP 5.617737e-01 8.913592e-01 1.842051e-01
U6 1.197319e-04 8.278068e-05 4.552052e-05
Metazoa_SRP 1.639560e-05 2.207347e-05 1.568830e-05
TAB1 1.283763e+01 9.046890e+00 9.739123e+00
U6 1.033654e-04 6.847156e-05 9.091511e-05
CENPC1 5.189229e+01 3.859490e+01 4.172082e+01
Y_RNA 7.265482e-05 5.306069e-05 5.707300e-05
Metazoa_SRP 1.621217e-05 9.311304e-01 5.794767e-01
Y_RNA 9.819591e-05 8.993314e-05 7.113170e-05
Metazoa_SRP 2.246108e-05 1.921480e-05 1.768147e-05
Metazoa_SRP 2.747219e-05 1.513105e-05 1.145366e-03
SULT1E1 1.443337e-03 1.072894e-03 2.243520e-02
Y_RNA 6.103954e-05 6.474251e-05 9.992000e-05
Y_RNA 9.063240e-05 6.180986e-05 6.909407e-05
NADKD1 2.370368e+01 1.709286e+01 1.503605e+01
U6 1.223693e-04 6.924021e-05 6.730057e-05
Y_RNA 1.612464e-04 8.317700e-05 1.367695e-04
RNU1-1 1.166811e+05 1.696343e+05 1.129499e+05
U6 4.632516e-05 9.701152e-05 6.301424e-05
NTPCR 3.139066e+01 1.629096e+01 1.781411e+01
Metazoa_SRP 1.978751e-01 1.433062e-05 2.070821e-05
U6 8.207452e-05 7.182641e-05 7.608100e-05
Metazoa_SRP 1.578756e-05 1.858409e-05 2.446180e-05
U6 5.737100e-05 5.423917e-05 9.218728e-05
DVL2 3.294008e+01 2.095570e+01 2.340127e+01
Metazoa_SRP 1.087326e+00 1.443017e+00 2.541242e+00
GALNT2 2.775928e+01 1.730751e+01 2.105737e+01
Metazoa_SRP 3.084284e-05 3.512870e-05 2.576436e-04
BCR 3.634695e+01 1.260421e+01 1.375759e+01
U6 6.806021e-05 5.552677e-05 8.207164e-05
Y_RNA 8.142876e-05 6.821020e-05 1.088023e-04
U6 6.790829e-05 5.647853e-05 7.394994e-05
U7 1.448038e-04 9.154464e-05 1.285874e-04
SCAND1 1.885882e+01 2.245786e+01 2.580144e+01
PHRF1 1.188219e+01 1.072032e+01 1.117122e+01
U7 2.287524e-04 1.977780e-04 1.102363e-04
U6 1.028393e-04 4.356925e-05 4.605374e-05
U6 6.817994e-05 8.988280e-05 5.114122e-05
Metazoa_SRP 1.542046e+02 1.290191e+02 1.557341e+02
Metazoa_SRP 7.414352e-01 1.374566e+00 1.305447e+00
ZDHHC5 1.537020e+01 1.838988e+01 1.851591e+01
U6 5.157132e-05 8.396489e-05 4.929159e-05
The commands I am using are:
library(gplots)
library(RColorBrewer)
A<-read.table(file="random.txt",header=T)
A.matrix <- data.matrix(A[,2:ncol(A)])
rownames(A.matrix) <- A$Gene
A.matrix <- A.matrix + 0.00001
log10.A.matrix <- log10(A.matrix)
Z.log10.A.matrix <- t(scale(t(log10.A.matrix)))
tmp <- Z.log10.A.matrix[which(is.finite(Z.log10.A.matrix[,1])),]
length(which(!is.finite(tmp)))
fin.Z.log10.A.matrix <- tmp
set.seed(1)
km9.fin.Z.log.A.matrix <- kmeans(fin.Z.log10.A.matrix,5, iter.max=40, nstart=10)
rowOrder <- names(sort(km9.fin.Z.log.A.matrix$cluster))
colorVector <- c("darkgreen","darkred","orange", "green", "magenta")
clusterColors <- colorVector[ sort(km9.fin.Z.log.A.matrix$cluster)]
col1=c("blue","white","firebrick")
heatmap.2(fin.Z.log10.A.matrix[rowOrder,],trace="none",labRow=F,labCol=colnames(km9.fin.Z.log.A.matrix),col=col1,RowSideColors=clusterColors,Rowv=F,Colv=T,dendrogram="column",na.rm=T,main="Gene Expression",mar=c(5,5), cexCol=0.5)
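The figure I get is not what I expected. The rows are not ordered correctly.
The clusters look scrambled. I think it is a small mistake, but I cannot track it down.
Please help.
Thanks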
Answer 0 (score: 1)
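The following is not directly related, but it may give some clues about the error (ddf is presumably the question's data read into a data frame):
library(reshape2)  # provides melt()
library(ggplot2)
mm2 = melt(ddf, id='Gene')
ggplot(mm2[mm2$value<100,], aes(x=variable, y=value, group=Gene, color=Gene))+geom_point()+geom_line()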
Some genes with higher values are omitted from the plot for clarity.
But those higher values may be the source of the error:
> mm2[mm2$value>100,]
Gene variable value
26 RNU1-1 CTRL 116681.1000
47 Metazoa_SRP CTRL 154.2046
76 RNU1-1 Trt1 169634.3000
97 Metazoa_SRP Trt1 129.0191
126 RNU1-1 Trt2 112949.9000
147 Metazoa_SRP Trt2 155.7341
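
Another possible cause worth checking, offered as a sketch rather than a confirmed diagnosis: the data contains repeated gene names (U6, Y_RNA, Metazoa_SRP). Indexing a matrix by row name returns the first row with that name every time, so fin.Z.log10.A.matrix[rowOrder,] can repeat some rows and never select others, which would leave the rows out of step with the cluster colors. The toy example below uses made-up values and a hypothetical cluster vector in place of the kmeans() result. It compares name-based ordering with position-based ordering via order().

```r
# Toy matrix with a duplicated row name, mimicking the repeated gene names
# (e.g. U6) in the question's data. All values here are made up.
m <- matrix(c(1, 2,
              9, 8,
              3, 4,
              7, 6),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("U6", "PDE4B", "U6", "SIK3"),
                            c("CTRL", "Trt1")))

# Hypothetical cluster assignment, standing in for kmeans(...)$cluster.
cluster <- c(U6 = 2, PDE4B = 1, U6 = 1, SIK3 = 2)

# Name-based ordering, as in the question: every "U6" resolves to the
# first row called "U6", so the second U6 row (3, 4) is never selected.
by_name <- m[names(sort(cluster)), ]

# Position-based ordering: order() returns row indices, which are unique
# even when the row names are not.
idx <- order(cluster)
by_index <- m[idx, ]

# Side colors built from the same indices stay aligned with the rows.
side_colors <- c("darkgreen", "darkred")[cluster[idx]]

print(by_name)
print(by_index)
```

Applied to the question's code, the same idea would be to compute idx <- order(km9.fin.Z.log.A.matrix$cluster) once, pass fin.Z.log10.A.matrix[idx,] to heatmap.2, and build clusterColors as colorVector[km9.fin.Z.log.A.matrix$cluster[idx]]. Making the gene names unique first, for example with make.unique(), is an alternative if name-based indexing is preferred.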