Using R to extract all rows containing a string stored in a variable

Date: 2017-09-13 17:42:34

Tags: r matrix

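I have a metadata file stored as a .tsv, which I read into R and save as META. I need to extract all rows containing a given string, "male", stored here in the variable sample.

The full script has many of these operations, so it is important that the pattern is stored in a variable, as in the example below. The error is in the way I am trying to grep.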

IN <- "/home/zchadva/Scratch/output/cov"

#metadata
META <- read.table("/home/zchadva/Scratch/data/hipsci/rnaseq/hipsci.qc1_sample_info.20160926.tsv", header = TRUE, sep = "\t")

#Set study/table variables
sample <- "\\<male\\>"
control <- "female"

#Grep all rows containing "male" from the table META
sample.list <- META[grep(sample, META, value=TRUE)]

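Edit: this gets me closer:

sample.list <- META[grep(sample, META$Gender), ]

Ideally I do not want to specify the column with META$Gender every time I need to do a pattern search, because our real metadata file is very large. If I do need to specify it, I would like to have Gender in a variable.

For example:

coloumn <- Gender
sample.list <- META[grepl(sample, META$coloumn), ]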

#Table example simplified
ID    Disease    Gender    Cell
JX1   ibd        male      liver
PTY   healthy    male      liver
HB3   ibd        female    brain
PO3   bbs        male      

#Desired layout in sample.list
JX1   ibd        male      liver
PTY   healthy    male      liver
PO3   bbs        male      

Any help is much appreciated. I have been trying for hours.

1 answer:

Answer 0 (score: 1)

grepl will give you better results than grep, because it returns a logical vector that you can use to index the data frame.
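As a small illustration of the difference, here is a sketch on a made-up vector (`gender` is a hypothetical stand-in for META$Gender, not data from the question's file). It also shows why the pattern needs an anchor or word boundaries: the plain pattern "male" matches "female" as well.

```r
# Hypothetical vector standing in for META$Gender
gender <- c("male", "male", "female", "male")

# The unanchored pattern also matches "female"
grepl("male", gender)          # TRUE TRUE TRUE TRUE

# Word boundaries (as in the question) or a start anchor exclude "female"
grepl("\\<male\\>", gender)    # TRUE TRUE FALSE TRUE
grepl("^male", gender)         # TRUE TRUE FALSE TRUE

# grep returns integer positions by default ...
grep("^male", gender)                 # 1 2 4
# ... and with value = TRUE the matching strings, which are not row indices
grep("^male", gender, value = TRUE)   # "male" "male" "male"
```

This also suggests why the attempt in the question fails: grep coerces its input with as.character, so passing the whole data frame makes it search one string per column rather than one per row, and value = TRUE then returns strings instead of row positions.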

META <- 
  data.frame(ID = c("JX1", "PTY", "HB3", "PO3"),
             Disease = c("ibd", "healthy", "ibd", "bbs"),
             Gender = c("male", "male", "female", "male"),
             Cell = c("liver", "liver", "brain", "liver"))

sample <- "male"
control <- "female"

META[grepl("^male", META$Gender), ]

   ID Disease Gender  Cell
1 JX1     ibd   male liver
2 PTY healthy   male liver
4 PO3     bbs   male liver
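
The question also asks for the pattern and the column name to live in variables, and ideally for no column to be named at all. A minimal sketch of both follows, assuming the example data from this answer; the names `column`, `hits` and `any.column` are introduced here for illustration and do not appear in the original post.

```r
# Example data as in the answer above
META <- data.frame(ID = c("JX1", "PTY", "HB3", "PO3"),
                   Disease = c("ibd", "healthy", "ibd", "bbs"),
                   Gender = c("male", "male", "female", "male"),
                   Cell = c("liver", "liver", "brain", "liver"))

sample <- "\\<male\\>"   # pattern from the question
column <- "Gender"       # hypothetical variable holding the column name

# [[ ]] accepts a column name stored in a variable; META$column would
# look for a column literally called "column"
sample.list <- META[grepl(sample, META[[column]]), ]

# Without naming a column: test every column, then keep the rows
# where at least one column matches
hits <- Reduce("|", lapply(META, grepl, pattern = sample))
any.column <- META[hits, ]
```

Searching every column is convenient but can pick up rows where the pattern happens to occur in an unrelated column, so naming the column through a variable is the safer choice when the column is known.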