Suppose I have this string:
string <- "I2-1-EX-1-I3-1-EX-1-I2-1-I1-1-EX-1-I3-1-I2-1-EX-1-I2-1-I2-1-I1-1-I3-1-N2-1-I1-1-I1-1-I2-1-N2-1-N3-1-I1-1-NR-1-FA-1-NR-1-I3-1-I1-1-NR-1-N1-1-EX-1-QU-1-I3-1-NR-1-FA-1-EX-1-QU-1-NR-1-I2-1-I2-1-I2-1-NR-1-TR-1-I1-1-I2-1-I3-1-NR-1-I1-1-I1-1-EX-1-NR-1-NR-1-I1-1-NR-1-NR-1-I3-1-I2-1-NR-1-I1-1-QU-1-QU-1-I1-1-TR-1-QU-1-NR-1-NR-1-QU-1-TR-1-NR-1-I1-1-TR-1-I1-1-FA-1-I1-1-I2-1-QU-1-TR-1-FA-1-EX-1-QU-1-QU-1-QU-1-NR-1-QU-1-I1-1-TR-1-FA-1-QU-1-FA-1-FA-1-TR-1-FA-1-QU-1-EX-1-QU-1-I1-1-QU-1-QU-1-FA-1-FA-1-QU-1-QU-1-FA-1-FA-1-I3-1-NR-1-FA-1-I1-1-I2-1-FA-1-QU-1-FA-1-I2-1-FA-1-NR-1-I1-1-NR-1-TR-1-NR-1-EX-1-NR-1-NR-1-EX-1-TR-1-I3-1-I1-1-NR-1-NR-1-FA-1-I1-1-TR-1-EX-1-NR-1-NR-1-I1-1-I1-1-NR-1-I1-1-NR-1-EX-1-EX-1-EX-1-NR-1-NR-1-NR-1-FA-1-FA"
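I want to match everything that occurs between two tokens containing "I". For example, starting from the beginning of the string, that means matching: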
-EX-
-EX-
-EX-
-EX-
-N2-
-N2-1-N3-
-NR-1-FA-1-NR-
etc...
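How can I achieve this match with a regular expression (ideally one that works in R)?
I tried something like (?=<1|2|3).*(?=I), but it doesn't seem to work. The reasoning behind my regex is that every I token ends in 1, 2 or 3, so that digit is the left-hand boundary a lookbehind should find, and the next I is the right-hand boundary a lookahead should find.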
Answer 0 (score: 4)
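It looks like you are trying to get all the characters that lie between I[123]-1 and 1-I[123].
\K keeps the text matched so far out of the overall regex match.
(?:(?!I[123]).)*? consumes one character at a time, but only while the text at that position does not begin with I[123]; if an I token turns up before the right-hand boundary is reached, the match attempt fails.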
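The pieces described above can be combined into a single pattern. The following is a minimal sketch, not necessarily the exact call from the original answer: it assumes the left boundary I[123]-1, the \K reset, the tempered lazy group, and a lookahead for 1-I[123] are simply concatenated, and it uses a shortened prefix of the question's string so that the result is easy to check by eye. The variable names pattern and matches are illustrative.

```r
# Shortened prefix of the question's string, used only for illustration.
string <- "I2-1-EX-1-I3-1-EX-1-I2-1-I1-1-EX-1-I3-1-I2-1-EX-1-I2-1-I2-1-I1-1-I3-1-N2-1-I1-1-I1-1-I2-1-N2-1-N3-1-I1-1-NR-1-FA-1-NR-1-I3-1"

# Assumed combination of the parts discussed in the answer:
#   I[123]-1           left boundary (an I token followed by -1)
#   \\K                drop the boundary from the reported match
#   (?:(?!I[123]).)*?  lazily take characters that do not start an I token
#   (?=1-I[123])       right boundary, checked but not consumed
pattern <- "I[123]-1\\K(?:(?!I[123]).)*?(?=1-I[123])"

# perl = TRUE is required because \K and lookarounds are PCRE features.
matches <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]
print(matches)
```

Because the right-hand boundary is only looked at and never consumed, the same I token can serve as the left boundary of the next match. Two I tokens that are directly adjacent (for example I2-1-I1) produce no match at all rather than a lone "-".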
Answer 1 (score: 2)
> strsplit(string, "I\\d-\\d")
[[1]]
[1] ""
[2] "-EX-1-"
[3] "-EX-1-"
[4] "-"
[5] "-EX-1-"
[6] "-"
[7] "-EX-1-"
[8] "-"
[9] "-"
[10] "-"
[11] "-N2-1-"
[12] "-"
[13] "-"
[14] "-N2-1-N3-1-"
[15] "-NR-1-FA-1-NR-1-"
[16] "-"
[17] "-NR-1-N1-1-EX-1-QU-1-"
[18] "-NR-1-FA-1-EX-1-QU-1-NR-1-"
[19] "-"
[20] "-"
[21] "-NR-1-TR-1-"
[22] "-"
[23] "-"
[24] "-NR-1-"
[25] "-"
[26] "-EX-1-NR-1-NR-1-"
[27] "-NR-1-NR-1-"
[28] "-"
[29] "-NR-1-"
[30] "-QU-1-QU-1-"
[31] "-TR-1-QU-1-NR-1-NR-1-QU-1-TR-1-NR-1-"
[32] "-TR-1-"
[33] "-FA-1-"
[34] "-"
[35] "-QU-1-TR-1-FA-1-EX-1-QU-1-QU-1-QU-1-NR-1-QU-1-"
[36] "-TR-1-FA-1-QU-1-FA-1-FA-1-TR-1-FA-1-QU-1-EX-1-QU-1-"
[37] "-QU-1-QU-1-FA-1-FA-1-QU-1-QU-1-FA-1-FA-1-"
[38] "-NR-1-FA-1-"
[39] "-"
[40] "-FA-1-QU-1-FA-1-"
[41] "-FA-1-NR-1-"
[42] "-NR-1-TR-1-NR-1-EX-1-NR-1-NR-1-EX-1-TR-1-"
[43] "-"
[44] "-NR-1-NR-1-FA-1-"
[45] "-TR-1-EX-1-NR-1-NR-1-"
[46] "-"
[47] "-NR-1-"
[48] "-NR-1-EX-1-EX-1-EX-1-NR-1-NR-1-NR-1-FA-1-FA"
If you want to restrict the digits to the range 1:3, use this pattern: "I[1-3]-[1-3]"
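As a rough sketch of how the split result might be tidied up to resemble the output requested in the question, the snippet below applies the restricted pattern to a shortened example string and then drops the empty and "-"-only pieces. The filtering and trimming steps and the names pieces, kept and trimmed are assumptions added for illustration; they are not part of the answer itself.

```r
# Shortened example in the same format as the question's string.
string <- "I2-1-EX-1-I3-1-EX-1-I2-1-I1-1-EX-1-I3-1-N2-1-N3-1-I1-1-NR-1-FA"

# Split on every I token together with the digit that follows it.
pieces <- strsplit(string, "I[1-3]-[1-3]")[[1]]

# Assumed clean-up: drop the leading "" and the "-" left between adjacent I tokens.
kept <- pieces[nchar(pieces) > 1]

# Assumed clean-up: strip the trailing "1-" that belongs to the next I token.
trimmed <- sub("1-$", "", kept)
print(trimmed)
```

Note that the final piece is whatever follows the last I token, so it is not bounded by an I token on the right; drop it with head(trimmed, -1) if only the text between two I tokens is wanted.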