How to search for specific rows in R and return the information in those rows

Asked: 2019-02-20 21:35:45

Tags: r subset

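I am working with RNA-seq data in which the first column holds the gene names and the following columns hold clustered gene expression data. There are many genes, but I am only interested in 200 of them. Is there a way to target just those specific genes and then build a data matrix from them? I can retrieve information from columns: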
Mydata.1 <- x[c("Gene Name", "Cluster_1")]

but not from rows; for example, this fails:

Mydata.1 <- x[c("Malat1", "Cd74")] 

Does anyone know how I can do this? Thanks!

2 answers:

Answer 0: (score: 0)

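To find the rows you need, you can use the following code (it assumes the gene names are stored in a column called gene):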

newdata <- mydata[which(mydata$gene == 'THE_GENE_U_LOOK_FOR'), ]
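
As a minimal sketch of why which() can be worth the extra call, the example below uses a hypothetical mydata frame (the contents are made up for illustration; only the column name gene comes from the snippet above). It compares a bare logical index with which() when the gene column contains a missing value, and shows %in% for matching several genes at once.

```r
# Hypothetical data; the column name `gene` follows the snippet above
mydata <- data.frame(
  gene = c("HPRT1", NA, "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  stringsAsFactors = FALSE
)

# A bare logical index keeps an all-NA row wherever the comparison is NA
with_na <- mydata[mydata$gene == "Malat1", ]

# which() returns only the positions that are TRUE, so the NA row is dropped
without_na <- mydata[which(mydata$gene == "Malat1"), ]

# For several genes at once, %in% returns TRUE/FALSE only (never NA)
several <- mydata[mydata$gene %in% c("Malat1", "Cd74"), ]

nrow(with_na)
nrow(without_na)
nrow(several)
```

If the gene column has no missing values, the bare logical index and the which() form select the same rows.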

Answer 1: (score: 0)

This answer uses a logical vector to subset the rows of the data frame. For more information, see: http://adv-r.had.co.nz/Subsetting.html#data-types

# Mockup data
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)

# Defining gene names of interest to look for
target_genes <- c("Malat1", "Cd74")

# Getting a logical vector that implicitly codes for row positions
# Note: we need to wrap Gene Name in backticks (``) because of the space character in "Gene Name"
row_matches <- x$`Gene Name` %in% target_genes

# Subsetting the gene expression matrix (actually a data frame object)
# mydata2: dataframe whose rows are for target genes only
# Note: the empty placeholder after the comma in the subsetting below indicates all columns
mydata2 <- x[row_matches, ]

mydata2
#>   Gene Name Cluster_1 Cluster_2
#> 3    Malat1         3         7
#> 4      Cd74         4         8

Alternatively, the subset function gives more concise code:

# Mockup data
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)

# Defining gene names of interest to look for
target_genes <- c("Malat1", "Cd74")

# As an alternative use the function subset
mydata2 <- subset(x, `Gene Name` %in% target_genes)

mydata2
#>   Gene Name Cluster_1 Cluster_2
#> 3    Malat1         3         7
#> 4      Cd74         4         8
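
Since the question asks for a data matrix, here is one possible sketch that goes a step further, assuming gene names are unique. It stores the expression columns in a matrix with the gene names as row names, after which rows can be selected by name, which is what the x[c("Malat1", "Cd74")] attempt in the question was aiming for (with a single index and no comma, a data frame looks the names up among its columns). The names expr, present, expr_sub and the entry "NotInData" are made up for illustration.

```r
# Mockup data (same shape as in the answer above)
x <- data.frame(
  `Gene Name` = c("HPRT1", "ABC", "Malat1", "Cd74"),
  Cluster_1 = 1:4,
  Cluster_2 = 5:8,
  check.names = FALSE
)

# Genes of interest; "NotInData" is a hypothetical name absent from x
target_genes <- c("Cd74", "Malat1", "NotInData")

# Matrix of the expression columns, with gene names as row names
expr <- as.matrix(x[, -1])
rownames(expr) <- as.character(x$`Gene Name`)

# Keep only targets that exist; indexing a matrix by a missing row name is an error
present <- intersect(target_genes, rownames(expr))

# Rows come back in the order of target_genes; drop = FALSE keeps a matrix
# even if only one gene is found
expr_sub <- expr[present, , drop = FALSE]

expr_sub
```

Unlike the %in% approach, which keeps rows in their original order, this returns rows in the order the target genes were listed.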