I have the following two strings:
x <- "chr1:625000-635000.BB_162.Adipose"
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"
With this regular expression, I can capture x:
> stringr::str_match(x,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)")
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose"
What I want is for y to give this:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
When I apply my current regex to y, I get this instead:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.combined" "chr1" "625000" "635000" "BB_162" "combined"
How can I generalize my regex so that it handles both x and y?
Update
S.Kalbar, your regex gives this:
> stringr::str_match(y,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "combined" "HMSC-ad"
> stringr::str_match(x,"(\\w+):(\\d+)-(\\d+)\\.(\\w+)\\.(\\w+)(?:\\.([A-Za-z-]+))?")
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose" NA
What I'd like to get for y is:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.combined.HMSC-ad" "chr1" "625000" "635000" "BB_162" "HMSC-ad"
And this for x:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "chr1:625000-635000.BB_162.Adipose" "chr1" "625000" "635000" "BB_162" "Adipose"
Answer 0 (score: 1)
Regex: (\w+):(\d+)-(\d+)\.(\w+)(?:\.\w+)?(?:\.([A-Za-z-]+))
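As a minimal sketch of how this regex could be applied from R with stringr: the backslashes have to be doubled inside an R string literal, and the variable names pattern and m are only illustrative. The optional non-capturing group (?:\.\w+)? absorbs a middle field such as combined when it is present, so the last capture group always holds the final field.

```r
library(stringr)

x <- "chr1:625000-635000.BB_162.Adipose"
y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"

# Same regex as above, with backslashes doubled for an R string literal.
# (?:\\.\\w+)? optionally skips one middle field (e.g. ".combined")
# without creating a capture group for it.
pattern <- "(\\w+):(\\d+)-(\\d+)\\.(\\w+)(?:\\.\\w+)?(?:\\.([A-Za-z-]+))"

# One row per input string: full match in column 1, then five capture groups.
m <- str_match(c(x, y), pattern)

# The last column holds the final field for both strings:
# "Adipose" for x and "HMSC-ad" for y.
print(m[, 6])
```

Because the middle group is non-capturing, both strings produce the same six columns, which is the shape asked for in the question.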
Answer 1 (score: 1)
You could give the regex engine a few tokens to split on:
(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+
Broken down, this says:
(?:(?<=\\d)-(?=\\d)) # a dash between numbers
| # or
(?:\\.combined\\.) # .combined. literally
| # or
[.:]+ # one or more of . or :
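To illustrate why the dash is wrapped in lookarounds, here is a small sketch that compares a naive delimiter class with the pattern above on the second string from the question. The names naive and guarded are made up for this example.

```r
library(stringr)

y <- "chr1:625000-635000.BB_162.combined.HMSC-ad"

# Naive split on any run of '-', '.' or ':'.
# This also breaks "HMSC-ad" apart and keeps "combined" as a field.
naive <- str_split(y, "[-.:]+")

# Pattern from the answer: the dash only counts as a separator when it
# sits between two digits, and ".combined." is consumed as one delimiter.
guarded <- str_split(y, "(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+")

print(naive[[1]])
print(guarded[[1]])
```

The naive version returns seven pieces, ending in "combined", "HMSC", "ad"; the guarded version returns five, ending in "HMSC-ad".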
In R, using str_split():
library(stringr)
x <- c("chr1:625000-635000.BB_162.Adipose", "chr1:625000-635000.BB_162.combined.HMSC-ad")
str_split(x, '(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+', simplify = TRUE)
Which yields
[,1] [,2] [,3] [,4] [,5]
[1,] "chr1" "625000" "635000" "BB_162" "Adipose"
[2,] "chr1" "625000" "635000" "BB_162" "HMSC-ad"
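If the pieces are needed as typed columns rather than a character matrix, one possible follow-up is sketched below. The column names chrom, start, end, sample and tissue are assumptions made for this example and do not come from the question.

```r
library(stringr)

x <- c("chr1:625000-635000.BB_162.Adipose",
       "chr1:625000-635000.BB_162.combined.HMSC-ad")

parts <- str_split(x, "(?:(?<=\\d)-(?=\\d))|(?:\\.combined\\.)|[.:]+",
                   simplify = TRUE)

# Turn the character matrix into a data frame with hypothetical column names.
df <- as.data.frame(parts, stringsAsFactors = FALSE)
names(df) <- c("chrom", "start", "end", "sample", "tissue")

# The coordinates come back as strings; convert them to integers.
df$start <- as.integer(df$start)
df$end <- as.integer(df$end)

print(df)
```

This relies on every input splitting into exactly five fields; with simplify = TRUE, shorter rows would be padded with empty strings.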