I would like to extract the part of a string inside the third set of parentheses, preferably using base R. Here is a sample dataset:
my.data <- read.table(text = '
my.num my.string my.cov
1 Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)q(~-1+re:se) 10
2 Abc(~1)Fgf(~-1+e1:e2:fi)Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)q(~1) 20
3 Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se) 15
', header = TRUE, stringsAsFactors = FALSE)
my.data
Either of these two results would be helpful:
desired.result1 <- read.table(text = '
my.num my.string my.cov
1 Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp) 10
2 Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi) 20
3 Bca(~-1+g1+g2+g2:cp) 15
', header = TRUE, stringsAsFactors = FALSE)
desired.result1
desired.result2 <- read.table(text = '
my.num my.string my.cov
1 ~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp 10
2 ~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi 20
3 ~-1+g1+g2+g2:cp 15
', header = TRUE, stringsAsFactors = FALSE)
desired.result2
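I find regex frustrating. I do not even know where to start, and I could not find a similar question on the internet. Thank you for any advice or help.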
Answer 0 (score: 4)
Using strsplit:
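# Split on the literal "("; element 4 begins with the contents of the third group.
sapply(strsplit(my.data$my.string, split = "(", fixed = TRUE), function(i) {
  # Cut at the first ")" to drop whatever follows the group.
  strsplit(i[4], split = ")", fixed = TRUE)[[1]][1]
})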
# [1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi" "~-1+g1+g2+g2:cp"
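To see why the index is 4 rather than 3, it can help to look at the intermediate pieces for a single string. The following is only a sketch on row 3 of the sample data; the variable names `x`, `pieces` and `third` are made up for illustration.

```r
# One string taken from row 3 of the question's sample data
x <- "Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)"

# Splitting on the literal "(" leaves the text before the first "(" in
# element 1, so the contents of the third group start element 4.
pieces <- strsplit(x, split = "(", fixed = TRUE)[[1]]
print(pieces)

# Element 4 still carries the ")" and the name of the next group ("q");
# cutting at the first ")" keeps only the contents of the third group.
third <- strsplit(pieces[4], split = ")", fixed = TRUE)[[1]][1]
print(third)
```

This approach assumes every string has at least three groups and that the groups are not nested.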
Answer 1 (score: 3)
For the first desired result:
sub(".*?\\(.*?\\).*?\\(.*?\\)(.*?\\(.*?\\)).*", "\\1", my.data$my.string)
[1] "Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)" "Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)"
[3] "Bca(~-1+g1+g2+g2:cp)"
For the second desired result:
sub(".*?\\(.*?\\).*?\\(.*?\\).*?\\((.*?)\\).*", "\\1", my.data$my.string)
[1] "~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp" "~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi" "~-1+g1+g2+g2:cp"
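If the group number may vary, a base R alternative is to collect every `name(contents)` group with `gregexpr` and `regmatches` and then pick the n-th one. The sketch below is not part of either answer; `nth_group` and its `keep.name` argument are hypothetical names, and it assumes the parentheses are never nested.

```r
# Sample data from the question, rebuilt with data.frame()
my.data <- data.frame(
  my.num = 1:3,
  my.string = c(
    "Abc(~1)Fgf(~-1+e2:cp)Bca(~-1+g1+g2:ti+g2:cfi+g2:pp+g2:cp)q(~-1+re:se)",
    "Abc(~1)Fgf(~-1+e1:e2:fi)Bca(~-1+g1+g2:ti+g2:pr+g2:ts+g2:cfi)q(~1)",
    "Abc(~1)Fgf(~1)Bca(~-1+g1+g2+g2:cp)q(~-1+re:se)"
  ),
  my.cov = c(10, 20, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n-th "name(contents)" group of each string.
nth_group <- function(x, n, keep.name = TRUE) {
  # [^()]* matches a run with no parentheses, so groups must not be nested.
  groups <- regmatches(x, gregexpr("[^()]*\\([^()]*\\)", x))
  # Pick the n-th match, or NA when a string has fewer than n groups.
  out <- vapply(
    groups,
    function(g) if (length(g) >= n) g[n] else NA_character_,
    character(1)
  )
  # Optionally strip the leading name and the enclosing parentheses.
  if (!keep.name) out <- sub("^[^(]*\\((.*)\\)$", "\\1", out)
  out
}

# Same shape as desired.result1: keep the group name
result1 <- my.data
result1$my.string <- nth_group(my.data$my.string, 3)

# Same shape as desired.result2: contents only
result2 <- my.data
result2$my.string <- nth_group(my.data$my.string, 3, keep.name = FALSE)

print(result1)
print(result2)
```

Writing the extracted vector back into a copy of `my.data` keeps `my.num` and `my.cov` alongside it, which is the layout the question asks for.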