I have some data that looks like this:
Cluster_ID KO1 KO2 KO3 WT1 WT2 WT3
5 chr5:100947454..100947489,+ 3.31322 7.52365 3.67255 21.15730 8.732710 17.42640
12 chr5:101227760..101227782,+ 1.48223 3.76182 5.11534 15.71680 4.426170 13.43560
29 chr5:102236093..102236457,+ 15.60700 10.38260 12.46040 6.85094 15.551400 7.18341
I filtered it as follows:
data<-read.table("expresn_matrix.txt", header=T)
CAGE_data <- as.data.frame(data) # Check that it has been converted to
#Remove clusters with 0 expression for all 6 samples
CAGE_filter <- CAGE_data[rowSums(abs(CAGE_data[,2:7]))>0,]
#Filter whole file to keep only clusters with at least 5 TPM in at least 3 files
CAGE_filter_more <- CAGE_filter[apply(CAGE_filter[,2:7] >= 5,1,sum) >= 3,]
CAGE_data <- as.data.frame(CAGE_filter_more)
CAGE_datam <- t(as.matrix(CAGE_data))
CAGE_data[, c(2:7)] <- sapply(CAGE_data[, c(2:7)], as.numeric)
rm= rowMeans(CAGE_data[, c(2:7)])
#Get data dimensions
dim(CAGE_datam)
After filtering, the data dimensions are:
[1] 7 599
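As a self-contained illustration of the two filtering steps, here is a minimal sketch on a made-up data frame. The `toy` values and cluster names are hypothetical and are not taken from expresn_matrix.txt. It also suggests why the transposed matrix has 7 rows rather than 6: `t(as.matrix(...))` on the whole data frame keeps the Cluster_ID column alongside the 6 sample columns.

```r
# Hypothetical toy data: 4 clusters x 6 samples (values invented for illustration)
toy <- data.frame(
  Cluster_ID = c("c1", "c2", "c3", "c4"),
  KO1 = c(0, 6, 1, 9),
  KO2 = c(0, 7, 2, 8),
  KO3 = c(0, 5, 1, 7),
  WT1 = c(0, 1, 6, 6),
  WT2 = c(0, 2, 1, 5),
  WT3 = c(0, 1, 2, 9),
  stringsAsFactors = FALSE
)

# Step 1: drop clusters with zero expression in all 6 samples
toy_filter <- toy[rowSums(abs(toy[, 2:7])) > 0, ]

# Step 2: keep clusters with at least 5 TPM in at least 3 samples
toy_filter_more <- toy_filter[rowSums(toy_filter[, 2:7] >= 5) >= 3, ]

# Transposing the whole data frame keeps Cluster_ID as an extra row,
# so the result has 7 rows (Cluster_ID + 6 samples) and one column per cluster kept
toy_m <- t(as.matrix(toy_filter_more))
dim(toy_m)
```

In this toy case only clusters c2 and c4 pass both filters, so the transposed matrix is 7 by 2, which mirrors the 7 by 599 shape reported above.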
I ran a PCA on the data:
library(ggplot2) # provides qplot()
pca<-prcomp(t(CAGE_data[,2:7]), scale.=TRUE)
summary(pca)$importance[,1:6]
cols <- gsub(pattern="(\\w+)\\d+", replacement="\\1", x=colnames(CAGE_data[,2:7]))
qplot(PC1, PC2, label=colnames(CAGE_data[,2:7]), color=cols, geom=c("point", "text"),
data=as.data.frame(pca$x))
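My plot has only 6 points, one for each sample. It needs to be clustered on more points. Since I have 599 rows of data from 6 samples, I am not sure why there are only 6 points. Can someone help me?
Thanks.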
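
A note on the number of points, as a minimal sketch on simulated data (the matrix `expr` below is random and hypothetical, not the real CAGE data): `prcomp()` returns one row of scores in `pca$x` for each row of its input. Because the input above is `t(...)`, the rows are the 6 samples, which would explain the 6 points. Passing the matrix without transposing gives one row of scores per cluster instead. Whether that is the desired plot is an assumption about the goal.

```r
set.seed(1)
# Simulated expression matrix: 599 clusters (rows) x 6 samples (columns)
expr <- matrix(rexp(599 * 6, rate = 0.1), nrow = 599,
               dimnames = list(NULL, c("KO1", "KO2", "KO3", "WT1", "WT2", "WT3")))

# Samples as rows: the scores matrix has one row per sample
pca_samples <- prcomp(t(expr), scale. = TRUE)
dim(pca_samples$x)

# Clusters as rows: the scores matrix has one row per cluster
pca_clusters <- prcomp(expr, scale. = TRUE)
dim(pca_clusters$x)
```

With samples as rows, each point summarizes one sample across all clusters. With clusters as rows, each point is one cluster described by its 6 sample values.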