How can I make an R regex capture special characters such as the dot (.) and underscore (_)?

Time: 2017-11-14 02:06:27

Tags: r regex

I have three strings:

x <- "PB0038.1_Jundm2_1/Jaspar.instid_chr1:183286850-183287250.bin1"
y <- "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin22"
z <- "Arid3a/MA0151.1/Jaspar.instid_chr1:183286849-183287249.bin10"

The regular expression

^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)

works for strings y and z but not for x:

> stringr::str_match(y,"^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
[1] "Ddit3::Cebpa"             "chr1:183286845-183287245" "22"                      
> stringr::str_match(z,"^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
[1] "Arid3a"                   "chr1:183286849-183287249" "10"  
> stringr::str_match(x,"^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]
[1] NA NA NA

How should I modify it?

The desired final result for x is

"PB0038.1_Jundm2_1", "chr1:183286850-183287250", "1"

1 answer:

Answer 0 (score: 2)

Your x input does not match, and should not, because it contains only one forward slash while your pattern requires two. If you want to allow either one or two forward slashes, one possible modification of the pattern is:

str_match(x, "^(.*?)\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[,c(2,3,4)]

You may find this pattern acceptable because you only capture the text before the first slash. The other two captures occur after the .instid_ token and at the very end, after the .bin extension, and neither appears to depend on the number of slashes in the path.
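To check that the single-slash pattern handles all three inputs from the question, here is a minimal sketch. It uses base R (regexec and regmatches with perl = TRUE) instead of stringr::str_match so that it runs without extra packages. That swap is my assumption, not part of the original answer, and extract_fields is a hypothetical helper name introduced only for illustration.

```r
# Inputs copied from the question.
x <- "PB0038.1_Jundm2_1/Jaspar.instid_chr1:183286850-183287250.bin1"
y <- "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin22"
z <- "Arid3a/MA0151.1/Jaspar.instid_chr1:183286849-183287249.bin10"

# Pattern from the answer: only one literal slash is required, so the
# lazy ".*?" after it can absorb an optional second path segment.
pattern <- "^(.*?)\\/.*?\\.instid_(.*?)\\.bin(\\d+)"

# Hypothetical helper (not from the original post): returns the three
# capture groups, or an empty character vector when nothing matches.
extract_fields <- function(s) {
  m <- regmatches(s, regexec(pattern, s, perl = TRUE))[[1]]
  m[-1]  # drop the full match, keep only the capture groups
}

results <- lapply(list(x = x, y = y, z = z), extract_fields)
print(results)
```

For x this should give the three values requested in the question, and y and z should keep returning the same fields as they did with the original two-slash pattern.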
