Question

I have these 3 example strings:

x <- "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found"
y <- "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer(0.828)More Information | Similar Motifs Found"
z <- "SPIB/MA0081.1/Jaspar(0.753)More Information | Similar Motifs Found"

What I want to do is remove everything that comes after the first word following the last / delimiter, so that the result is:

AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer
SPIB/MA0081.1/Jaspar

I tried this, but it doesn't give me what I want:

> sub("\\(.*?\\)More Information | Similar Motifs Found","",x)
[1] "AP-1| Similar Motifs Found"

What is the right way to do this?

Answer 1

You can use a greedy pattern to match up to the last /word, and then use a backreference to extract that group:

(.*/\\w+).*

In (.*/\\w+).*, the first .* is greedy and matches as much as possible, with the stop condition being / followed by a word (matched by /\\w+); the second .* matches the remaining part of the string.

v <- c("AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found",
       "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer(0.828)More Information | Similar Motifs Found",
       "SPIB/MA0081.1/Jaspar(0.753)More Information | Similar Motifs Found")

sub("(.*/\\w+).*", "\\1", v)
# [1] "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer"
# [2] "NeuroG2(bHLH)/Fibroblast-NeuroG2-ChIP-Seq(GSE75910)/Homer"
# [3] "SPIB/MA0081.1/Jaspar"
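As a rough sketch of why the attempt in the question misbehaves, and how the greedy pattern compares: the unescaped | in the original pattern acts as alternation, and even with it escaped the match still starts at the first "(". The variable names below (attempt, escaped, fixed, extracted) are illustrative only, and the regmatches() variant is an alternative I am suggesting rather than part of the original answer.

```r
x <- "AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer(0.989)More Information | Similar Motifs Found"

# Unescaped `|` is alternation, so this pattern means
#   "\\(.*?\\)More Information "   OR   " Similar Motifs Found"
# and sub() only replaces the first (leftmost) match.
attempt <- sub("\\(.*?\\)More Information | Similar Motifs Found", "", x)

# Escaping the pipe is not enough on its own: the match still begins
# at the first "(" in the string, so "(bZIP)/..." is removed as well.
escaped <- sub("\\(.*?\\)More Information \\| Similar Motifs Found", "", x)

# Greedy pattern from the answer: keep everything up to the last "/word".
fixed <- sub("(.*/\\w+).*", "\\1", x)

# Alternative (an assumption, not from the answer): extract the match
# directly instead of replacing the whole string with a backreference.
extracted <- regmatches(x, regexpr(".*/\\w+", x))

print(fixed)
```

Note that \\w+ stops at the first non-word character, which is why the "(0.989)" suffix is dropped; if the final component could itself contain characters such as "-" or ".", the character class would presumably need adjusting.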
