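I am working with RNA-seq data in which the first column holds the gene names and the following columns hold the clustered gene expression data. There are many genes, but I am only interested in 200 of them. Is there a way to target only those specific genes and build a data matrix from them? I can retrieve information by column:
Mydata.1 <- x[c("Gene Name", "Cluster_1")]
but not by row. For example, this fails:
Mydata.1 <- x[c("Malat1", "Cd74")]
Does anyone know how I can do this? Thanks!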
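Answer 0 (score: 0)
To find the data you need, you can use the following code:
# Keep the rows whose gene column equals the gene of interest; the trailing comma keeps all columns
newdata <- mydata[which(mydata$gene == 'THE_GENE_U_LOOK_FOR'), ]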
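As a minimal sketch of this approach on made-up data (the data frame `mydata` and its `gene` column are the placeholder names from the snippet above, and all values are invented), the comparison yields a logical vector and `which()` turns it into row positions. One reason to wrap the test in `which()` is that it drops `NA` results, whereas indexing with a raw logical vector that contains `NA` returns an extra all-`NA` row:

```r
# Hypothetical data: column names follow the snippet above, values are invented
mydata <- data.frame(
  gene = c("Malat1", NA, "Cd74"),
  Cluster_1 = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Logical test: TRUE for the matching row, NA where the gene name is missing
is_match <- mydata$gene == "Cd74"

# which() keeps only the positions that are TRUE and drops the NA
newdata <- mydata[which(is_match), ]

# Indexing with the raw logical vector keeps an extra all-NA row for the NA test
with_na <- mydata[is_match, ]
```

Here `newdata` has one row (the Cd74 row), while `with_na` has two rows because of the missing gene name.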
Answer 1 (score: 0)
This answer uses a logical vector to subset the rows of the data frame. For more information, see: http://adv-r.had.co.nz/Subsetting.html#data-types.
# Mockup data
x <- data.frame(
`Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
Cluster_1 = 1:4,
Cluster_2 = 5:8,
check.names = FALSE
)
# Defining gene names of interest to look for
target_genes <- c("Malat1", "Cd74")
# Getting a logical vector that implicitly codes for row positions
# Note: we need to wrap Gene Name in backticks (``) because of the space character in "Gene Name"
row_matches <- x$`Gene Name` %in% target_genes
# Subsetting the gene expression matrix (actually a dataframe object)
# mydata2: dataframe whose rows are for target genes only
# Note: the empty placeholder after the comma in the subsetting below indicates all columns
mydata2 <- x[row_matches, ]
mydata2
#>   Gene Name Cluster_1 Cluster_2
#> 3    Malat1         3         7
#> 4      Cd74         4         8
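With a list of around 200 genes, it is easy for a few names to go unmatched, for example because of differences in capitalization. A minimal sketch of how one might check this, reusing the same mock-up data (the name "Actb" is an invented example that is deliberately absent from the data, and `missing_genes` and `n_found` are made-up variable names):

```r
# Mock-up data, same layout as in the example above
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)

# "Actb" is an invented name that does not occur in the mock-up data
target_genes <- c("Malat1", "Cd74", "Actb")

# Target genes that have no matching row in x
missing_genes <- setdiff(target_genes, x$`Gene Name`)

# Number of rows that the %in% subset would keep
n_found <- sum(x$`Gene Name` %in% target_genes)
```

If `missing_genes` is not empty, those names are worth checking by hand before relying on the subset.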
Alternatively, the row subsetting itself can be done with the function subset, which gives more concise code:
# Mockup data
x <- data.frame(
`Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
Cluster_1 = 1:4,
Cluster_2 = 5:8,
check.names = FALSE
)
# Defining gene names of interest to look for
target_genes <- c("Malat1", "Cd74")
# As an alternative use the function subset
mydata2 <- subset(x, `Gene Name` %in% target_genes)
mydata2
#>   Gene Name Cluster_1 Cluster_2
#> 3    Malat1         3         7
#> 4      Cd74         4         8
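Since the question asks for a data matrix, one possible follow-up step (a sketch, not part of the original answer) is to move the gene names into row names and convert the remaining columns with `as.matrix`. Once gene names are row names, rows can also be selected by name, which is close to what the question originally attempted. Note the comma: without it, `[` on a data frame selects columns rather than rows. The names `expr_mat`, `value_cols`, and `by_name` are made up for this illustration, and the sketch assumes there are no duplicated gene names, because row names of a data frame must be unique.

```r
# Mock-up data, same layout as in the example above
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)
target_genes <- c("Malat1", "Cd74")
mydata2 <- subset(x, `Gene Name` %in% target_genes)

# All columns except the gene name column hold expression values
value_cols <- setdiff(names(mydata2), "Gene Name")

# Numeric matrix of expression values, with gene names as row names
expr_mat <- as.matrix(mydata2[, value_cols, drop = FALSE])
rownames(expr_mat) <- mydata2$`Gene Name`

# With gene names as row names, rows can be picked by name (note the comma)
rownames(x) <- x$`Gene Name`
by_name <- x[target_genes, ]
```

Selecting by row name returns the rows in the order given in `target_genes`, whereas the `%in%` approach keeps the order of the original data.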