R: can't use lookahead or lookbehind with strsplit to split a string correctly

Time: 2016-01-18 12:55:19

Tags: regex r

Here is the string:

15  3 23 11  0 51.0000000  0 18G 5G 7G 9G10G13G16G19G20G27G28G30R 2

I need to split it before each "G" and "R" to get:

[1] " 15  3 23 11  0 18.0000000  0 18" "G 5" "G 7" "G 9" "G10" "G13" .... "R 2"

I tried using lookahead and lookbehind.

The lookbehind version works reasonably well:

ss.tl.pattern="(?<=G|R[ 0-9]{2})"
split.tl=strsplit(time.lines,ss.tl.pattern,perl=TRUE)

[[1]]
 [1] " 15  3 23 11  0 18.0000000  0 18G 5" "G 7"                                
 [3] "G 9"                                 "G10"                                
 [5] "G13"                                 "G16"                                
 [7] "G19"                                 "G20"                                
 [9] "G27"                                 "G28"                                
[11] "G30"                                 "R 2"    

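Everything is split correctly except at the first separator, where the first piece still ends in "G 5".

If I try a lookahead with the same pattern, ss.tl.pattern="(?=G|R[ 0-9]{2})", the result is wrong: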

 [[3]]
 [1] " 15  3 23 11  0 20.0000000  0 18" "G"                               
 [3] " 5"                               "G"                               
 [5] " 7"                               "G"                               
 [7] " 9"                               "G"                               
 [9] "10"                               "G"                               
[11] "13"                               "G"                               
[13] "16"                               "G"                               
[15] "19"                               "G"                               
[17] "20"                               "G"                               
[19] "27"                               "G"                               
[21] "28"                               "G"                               
[23] "30"                               "R"
[25] "2"                           

I can't figure out why it splits both before and after "G" or "R".

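1 Answer:

Answer 0: (score: 3)

We can use strsplit with a lookbehind and a lookahead together, so that the split point sits between a digit and the following G or R: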

strsplit(str1, "(?<=\\d)(?=(G|R))", perl=TRUE)[[1]]
#[1] "15  3 23 11  0 51.0000000  0 18" "G 5"                             "G 7"                             "G 9"                            
#[5] "G10"                             "G13"                             "G16"                             "G19"                            
#[9] "G20"                             "G27"                             "G28"                             "G30"                            
#[13] "R 2"  
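
As for why the lookahead alone splits on both sides of the letter: as far as I understand it, strsplit searches the remaining part of the string again after every split, and when a zero-length match falls at the very start of that remainder it emits a single character as its own piece. The documentation for strsplit notes that empty matches lead to splitting into single characters. The sketch below uses a shortened, made-up string (x is not from the question) to compare the two patterns; the lookbehind for a digit cannot match at the start of the remainder, which is what keeps "G" attached to its number.

```r
# Small stand-in string (hypothetical, shortened from the question's data).
x <- "18G 5G 7R 2"

# Lookahead only: after the first split the remainder starts with "G",
# so the next zero-length match is at its very start and strsplit
# emits that one character ("G") as a separate piece.
la <- strsplit(x, "(?=[GR])", perl = TRUE)[[1]]
print(la)

# Lookbehind + lookahead: (?<=\\d) cannot match at the start of the
# remainder, so the next split lands before the following G or R.
both <- strsplit(x, "(?<=\\d)(?=[GR])", perl = TRUE)[[1]]
print(both)
```

The character class [GR] is used here as a shorthand for (G|R); both forms should behave the same in this pattern.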

Data

str1 <- "15  3 23 11  0 51.0000000  0 18G 5G 7G 9G10G13G16G19G20G27G28G30R 2"
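
A quick sanity check on the result, as a sketch rather than part of the original answer: because the pattern is zero-width, no characters should be lost, so pasting the pieces back together ought to reproduce str1. The names parts and rejoined are only illustrative.

```r
str1 <- "15  3 23 11  0 51.0000000  0 18G 5G 7G 9G10G13G16G19G20G27G28G30R 2"
parts <- strsplit(str1, "(?<=\\d)(?=(G|R))", perl = TRUE)[[1]]

# The pattern consumes no characters, so gluing the pieces back
# together should give the input string again.
rejoined <- paste(parts, collapse = "")
print(identical(rejoined, str1))

# One leading piece plus one piece per G/R code.
print(length(parts))
```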