我有这3个示例字符串:
x <- "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found"
y <- "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer(0.828)More Information | Similar Motifs Found"
z <- "SPIB/MA0081.1/Jaspar(0.753)More Information | Similar Motifs Found"
我想要做的是删除在最后/
分隔符的第一个单词之后出现的字符串,结果是:
AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer
SPIB/MA0081.1/Jaspar
我试过了,但它没有给出我想要的东西:
> sub("\\(.*?\\)More Information | Similar Motifs Found","",x)
[1] "AP-1| Similar Motifs Found"
做正确的方法是什么?
答案 0 :(得分:1)
您可以使用贪婪模式cin >>
匹配到最后cin >>
,然后使用后引用提取该组:
(.*/\\w+).*
在/word
中,第一个v <- c("AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found", "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer(0.828)More Information | Similar Motifs Found", "SPIB/MA0081.1/Jaspar(0.753)More Information | Similar Motifs Found")
sub("(.*/\\w+).*", "\\1", v)
# [1] "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer" "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer"
# [3] "SPIB/MA0081.1/Jaspar"
是贪婪的并且会尽可能多地匹配,停止条件为(.*/\\w+).*
+ .*
(由/
匹配);第二个a word
匹配字符串的剩余部分。