Question

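I'm working with RNA-seq data in which the first column holds the gene names and the following columns hold the expression data for each cluster. There are many genes, but I'm only interested in 200 of them. Is there a way to pick out just those specific genes and then build a data matrix from them? I can retrieve information by column: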

Mydata.1 <- x[c("Gene Name", "Cluster_1")]

but not by row. For example, this fails:

Mydata.1 <- x[c("Malat1", "Cd74")]

Does anyone know how I could do this? Thanks!

Answer 1

To find the data you need, you can use the following code:

newdata <- mydata[which(mydata$gene == 'THE_GENE_U_LOOK_FOR'), ]
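
As a minimal sketch of the line above: the data frame `mydata`, its values, and the `NA` entry are made up for illustration, and the column name `gene` is the placeholder used in this answer (in the question the column is called `Gene Name`). It also shows one reason to wrap the comparison in `which()`: missing gene names are dropped instead of producing an all-NA row.

```r
# Hypothetical data; the column name `gene` follows the answer above
mydata <- data.frame(
  gene = c("HPRT1", NA, "Malat1", "Cd74"),
  Cluster_1 = 1:4
)

# which() turns the logical comparison into row positions and drops NA
newdata <- mydata[which(mydata$gene == "Malat1"), ]

# Without which(), the NA comparison yields an extra row filled with NA
with_na <- mydata[mydata$gene == "Malat1", ]
```

Here `newdata` has one row (Malat1), while `with_na` has two, one of which is all NA.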

Answer 2

This answer uses a logical vector to subset the rows of the data frame. For more information, see: http://adv-r.had.co.nz/Subsetting.html#data-types.

# Mockup data
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)

# Defining gene names of interest to look for
target_genes <- c("Malat1", "Cd74")

# Getting a logical vector that implicitly codes for row positions
# Note: we need to wrap Gene Name in backticks (``) because of the space character in "Gene Name"
row_matches <- x$`Gene Name` %in% target_genes

# Subsetting the gene expression matrix (actually a data frame object)
# mydata2: dataframe whose rows are for target genes only
# Note: the empty placeholder after the comma in the subsetting below indicates all columns
mydata2 <- x[row_matches, ]

mydata2
#>   Gene Name Cluster_1 Cluster_2
#> 3    Malat1         3         7
#> 4      Cd74         4         8

Alternatively, we can use the function subset() for more concise code:

# Mockup data
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)

# Defining gene names of interest to look for
target_genes <- c("Malat1", "Cd74")

# As an alternative use the function subset
mydata2 <- subset(x, `Gene Name` %in% target_genes)

mydata2
#>   Gene Name Cluster_1 Cluster_2
#> 3    Malat1         3         7
#> 4      Cd74         4         8
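
Since the question also asks for a data matrix, here is one possible sketch that builds on the mockup data above. It assumes you want the rows in the same order as your gene list and want to know which requested genes are absent; `"NotAGene"`, `missing_genes`, `idx`, and `expr_mat` are hypothetical names introduced for illustration only.

```r
# Mockup data, as in the answer above
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)

# "NotAGene" is a made-up name, included to show the missing-gene check
target_genes <- c("Cd74", "Malat1", "NotAGene")

# Requested genes that do not occur in the data
missing_genes <- setdiff(target_genes, x$`Gene Name`)

# match() gives row positions in the order of target_genes (NA if absent)
idx <- match(target_genes, x$`Gene Name`)
idx <- idx[!is.na(idx)]

# Numeric matrix of expression values, with gene names as row names
expr_mat <- as.matrix(x[idx, c("Cluster_1", "Cluster_2")])
rownames(expr_mat) <- as.character(x$`Gene Name`[idx])
```

Unlike `%in%`, which keeps the rows in their original order, `match()` returns them in the order of `target_genes`, which can be convenient when the gene list is already sorted for plotting.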

How to search for specific rows in R and return the information for those rows
