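I have a metadata file stored as a .tsv, which I read into R and save as META. I need to extract all rows that contain a given string, "male", stored here in the variable sample.
The full script performs many of these operations, so it is important that the pattern is stored in a variable, as in the example below. The error is in the way I am trying to grep.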
IN <- "/home/zchadva/Scratch/output/cov"
#metadata
META <- read.table("/home/zchadva/Scratch/data/hipsci/rnaseq/hipsci.qc1_sample_info.20160926.tsv", header = TRUE, sep = "\t")
#Set study/table variables
sample <- "\\<male\\>"
control <- "female"
#Grep all rows containing "male" from the table META
sample.list <- META[grep(sample, META, value=TRUE)]
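Ideally I would rather not specify the column with META$Gender, as in the line below, every time I need to do a pattern search, because our real metadata file is very large.
sample.list <- META[grep(sample, META$Gender), ]
If I do need to specify the column, I would like to hold its name in a variable. For example (pseudocode for what I would like to be able to write):
column <- Gender
sample.list <- META[grepl(sample, META$column), ]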
#Table example simplified
ID Disease Gender Cell
JX1 ibd male liver
PTY healthy male liver
HB3 ibd female brain
PO3 bbs male
#Desired layout in sample.list
JX1 ibd male liver
PTY healthy male liver
PO3 bbs male
Any help is much appreciated. I have been trying for hours.
Answer 0 (score: 1)
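grepl will serve you better than grep here: it returns a logical vector with one element per row, which you can use directly to index the data frame.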
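To make the difference concrete, here is a small sketch on hypothetical vectors (`gender` and `disease` are illustrative names, not part of the original data). Both functions can be used to pick rows; `grep` returns the integer positions of the matches, while `grepl` returns one TRUE/FALSE per element, which is easier to combine with other conditions.

```r
# Hypothetical example vectors, not taken from the real metadata file
gender  <- c("male", "male", "female", "male")
disease <- c("ibd", "healthy", "ibd", "bbs")

# grep() returns the integer positions of the matching elements
idx <- grep("^male", gender)

# grepl() returns a logical vector as long as the input
keep <- grepl("^male", gender)

# Logical vectors combine directly with other row conditions
both <- keep & disease == "ibd"

print(idx)
print(keep)
print(both)
```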
META <-
data.frame(ID = c("JX1", "PTY", "HB3", "PO3"),
Disease = c("ibd", "healthy", "ibd", "bbs"),
Gender = c("male", "male", "female", "male"),
Cell = c("liver", "liver", "brain", "liver"))
sample <- "^male"   # anchored at the start so that "female" does not match
control <- "female"
META[grepl(sample, META$Gender), ]
   ID Disease Gender  Cell
1 JX1     ibd   male liver
2 PTY healthy   male liver
4 PO3     bbs   male liver
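The question also asks about keeping the column name in a variable, and about searching without naming a column at all. The sketch below shows one way to do each. It reuses the answer's example data frame so that it runs on its own. The names `column`, `hits` and `sample.any` are my own, so treat this as an illustration rather than a drop-in for the real file. Note that `META$column` would look for a column literally called `column`, whereas `META[[column]]` uses the string stored in the variable.

```r
# Simplified example data from the answer, repeated so the block is self-contained
META <- data.frame(ID      = c("JX1", "PTY", "HB3", "PO3"),
                   Disease = c("ibd", "healthy", "ibd", "bbs"),
                   Gender  = c("male", "male", "female", "male"),
                   Cell    = c("liver", "liver", "brain", "liver"),
                   stringsAsFactors = FALSE)

sample <- "\\<male\\>"   # word boundaries, so "female" is not matched
column <- "Gender"       # hypothetical variable holding the column name

# 1) Column name held in a variable: index with [[ ]] instead of $
sample.list <- META[grepl(sample, META[[column]]), ]

# 2) No column named: test every column and keep rows with a match anywhere
hits <- Reduce(`|`, lapply(META, function(col) grepl(sample, col)))
sample.any <- META[hits, ]

print(sample.list)
print(sample.any)
```

The second approach scans every column, so on a large metadata table it does more work and can also pick up matches in columns you did not intend to search. Naming the column through a variable is usually the safer choice.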