I have a data frame that contains several types of gene information:
chr start end Gene Region
1 100 110 Bat Exon
1 120 130 Bat Intron
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
1 900 980 Mit Promoter, Upstream
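To make the example reproducible, the table above can be entered as follows. This is a sketch under two assumptions: the data frame is named df, which matches the answer rather than the question's Table, and Region is a plain character column in which multiple regions are stored as one comma-separated string, as displayed.

```r
# Reproducible version of the example table (values copied from the question).
# Assumption: Region holds one comma-separated string per row.
# stringsAsFactors = FALSE keeps Gene and Region as character vectors.
df <- data.frame(
  chr    = c(1, 1, 1, 1, 1),
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)
str(df)
```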
I want to subset the data to remove any rows containing genes that have "Exon" or "Promoter" in the Region column. I have been using:
Regions <- subset(Table, Region == "Intron" | Region== "DownStream" | Region =="Upstream" | Region=="DownStream,Upstream")
But this gives me:
chr start end Gene Region
1 120 130 Bat Intron
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
What I want is:
chr start end Gene Region
1 500 550 Ball Upstream, Downstream
1 590 600 Ball Intron, Upstream
Answer (score: 2)
Use grepl:
# keep rows whose Region contains neither "Exon" nor "Promoter"
df[!grepl("Exon|Promoter", df$Region), ]
# chr start end Gene Region
#2 1 120 130 Bat Intron
#3 1 500 550 Ball Upstream, Downstream
#4 1 590 600 Ball Intron, Upstream
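The reason grepl works where the question's `==` comparisons do not can be seen on the Region values alone: `==` tests whole-string equality, while grepl() reports whether a regular expression matches anywhere inside each string. The vector name regions is a hypothetical stand-in for df$Region.

```r
# Hypothetical stand-in for df$Region (values copied from the question)
regions <- c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream")

# `==` compares whole strings, so a combined value such as
# "Promoter, Upstream" never equals "Promoter".
regions == "Promoter"
# [1] FALSE FALSE FALSE FALSE FALSE

# grepl() returns TRUE wherever the pattern matches anywhere in the string;
# "|" means "Exon OR Promoter".
grepl("Exon|Promoter", regions)
# [1]  TRUE FALSE FALSE FALSE  TRUE
```

Negating the grepl() result with `!` then keeps only the rows where neither word was found.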
I'm not clear on why you want row 2, with "Intron", removed as well. Please explain.
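I think I understand now: you want to drop every row of any gene that has "Exon" or "Promoter" in at least one of its rows (Bat has an Exon row, so its Intron row goes too). Try this: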
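# genes that have at least one Exon or Promoter row
temp <- df$Gene[grepl("Exon|Promoter", df$Region)]
# keep only rows whose gene is not in that set
df[!df$Gene %in% temp, ]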
# chr start end Gene Region
#3 1 500 550 Ball Upstream, Downstream
#4 1 590 600 Ball Intron, Upstream
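The same gene-level filter can also be written without the intermediate gene list. The sketch below swaps in base R's ave() to compute any() within each Gene group; it is an alternative, not the answer's method. It assumes the data frame built from the question's table, and it also assumes that whole-word matching is wanted: the `\\b` word boundaries (with perl = TRUE) are an added choice that stops a hypothetical longer label such as "Exonic" from matching.

```r
df <- data.frame(
  chr    = c(1, 1, 1, 1, 1),
  start  = c(100, 120, 500, 590, 900),
  end    = c(110, 130, 550, 600, 980),
  Gene   = c("Bat", "Bat", "Ball", "Ball", "Mit"),
  Region = c("Exon", "Intron", "Upstream, Downstream",
             "Intron, Upstream", "Promoter, Upstream"),
  stringsAsFactors = FALSE
)

# TRUE for each row whose Region mentions Exon or Promoter as a whole word
# (assumption: whole-word matching is desired).
hit <- grepl("\\b(Exon|Promoter)\\b", df$Region, perl = TRUE)

# ave() applies any() within each Gene group and writes the result back to
# every row of that gene, so all rows of a matching gene become TRUE.
gene_hit <- ave(hit, df$Gene, FUN = any)

# Keep only genes with no Exon/Promoter row at all
result <- df[!gene_hit, ]
result
```

With the example data only the two Ball rows remain, the same rows the `%in%` version keeps.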