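Extract the string inside the third set of parentheses

Time: 2017-02-20 21:08:34

Tags: r regex string

I would like to extract the part of a string inside the third set of parentheses, preferably using base R. Here is an example data set: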

my.data <- read.table(text = '
     my.num                              my.string                                  my.cov
        1    Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)q(~-1+re:se)    10
        2    Abc(~1)Fgf(~-1+e1:e2:fi)Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)q(~1)        20
        3    Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)                           15
', header = TRUE, stringsAsFactors = FALSE)
my.data

Either of these two results would be helpful:

desired.result1 <- read.table(text = '
     my.num                     my.string            my.cov
        1    Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)      10
        2    Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)      20
        3    Bca(~-1+g1+g2+g2:cp)                      15
', header = TRUE, stringsAsFactors = FALSE)
desired.result1

desired.result2 <- read.table(text = '
     my.num                     my.string       my.cov
        1    ~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp      10
        2    ~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi      20
        3    ~-1+g1+g2+g2:cp                      15
', header = TRUE, stringsAsFactors = FALSE)
desired.result2

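Regex frustrates me. I do not even know where to start, and I could not find a similar question on the internet. Thank you for any advice or assistance.

2 Answers:

Answer 0: (score: 4)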

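Using strsplit:

# Split each string on "(" so that piece 4 begins right after the third "(",
# then split that piece on ")" and keep the text before the first ")".
sapply(strsplit(my.data$my.string, split = "(", fixed = TRUE), function(i){
  strsplit(i[4], split = ")", fixed = TRUE)[[1]][1]})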

# [1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi" "~-1+g1+g2+g2:cp" 
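
A related base R sketch, in case it helps: instead of counting split pieces, collect every parenthesized group with gregexpr() and regmatches(), then pick the third. This assumes the parentheses are never nested; the names my.string, groups, third and inner are only illustrative, and the three strings are copied from the question.

```r
# The three example strings from the question.
my.string <- c(
  "Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)q(~-1+re:se)",
  "Abc(~1)Fgf(~-1+e1:e2:fi)Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)q(~1)",
  "Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)"
)

# Find every "(...)" group that contains no further parentheses.
groups <- regmatches(my.string, gregexpr("\\([^()]*\\)", my.string))

# Take the third group of each string (NA if a string has fewer than three).
third <- vapply(groups, function(g) g[3], character(1))

# Drop the surrounding "(" and ")".
inner <- substr(third, 2, nchar(third) - 1)
inner
# [1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi"
# [3] "~-1+g1+g2+g2:cp"
```

Because a string with fewer than three groups yields NA here, a malformed row is easier to spot than with positional splitting.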

Answer 1: (score: 3)

First expression:

sub(".*?\\(.*?\\).*?\\(.*?\\)(.*?\\(.*?\\)).*", "\\1", my.data$my.string)
[1] "Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)" "Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)"
[3] "Bca(~-1+g1+g2+g2:cp)" 

Second expression:

sub(".*?\\(.*?\\).*?\\(.*?\\).*?\\((.*?)\\).*", "\\1", my.data$my.string)
[1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi" "~-1+g1+g2+g2:cp"
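
Both expressions above skip two "name(...)" groups by writing the same lazy sub-pattern twice. As a minimal sketch of the same idea with a repetition count, assuming the parentheses are never nested, the skipped groups can be expressed with a {n-1} quantifier. The helper extract_nth_paren and its arguments are hypothetical names, not part of the original answer, and perl = TRUE is needed for the non-capturing group.

```r
# Hypothetical helper: return the n-th parenthesized group of each string.
# with_name = TRUE keeps the label in front of it, e.g. "Bca(...)".
extract_nth_paren <- function(x, n, with_name = FALSE) {
  skip <- as.integer(n) - 1L  # number of "name(...)" groups to skip
  # Capture either "name(...)" as a whole or only the text between the parentheses.
  core <- if (with_name) "([^(]*\\([^)]*\\))" else "[^(]*\\(([^)]*)\\)"
  pattern <- sprintf("^(?:[^(]*\\([^)]*\\)){%d}%s.*$", skip, core)
  sub(pattern, "\\1", x, perl = TRUE)
}

# The three example strings from the question.
my.string <- c(
  "Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)q(~-1+re:se)",
  "Abc(~1)Fgf(~-1+e1:e2:fi)Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)q(~1)",
  "Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)"
)

extract_nth_paren(my.string, 3)
# [1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi"
# [3] "~-1+g1+g2+g2:cp"

extract_nth_paren(my.string, 3, with_name = TRUE)
# [1] "Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)" "Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)"
# [3] "Bca(~-1+g1+g2+g2:cp)"
```

Note that sub() returns its input unchanged when the pattern does not match, so a string with fewer than n groups comes back whole rather than as NA.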