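I have the following regex that splits on any whitespace or punctuation. How can I exclude one or more punctuation characters from :punct:? Say I want to exclude apostrophes and commas. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping for an exclusion method.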
X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)
[[1]]
[1] "I" "'" "m" "not" "that" "good" "at" "regex" "yet"
[10] "," "" "but" "am" "getting" "better" "!"
Answer 0 (score: 8)
I'm not clear on what you want the result to be, but you might be able to use a negated character class, like in this answer.
R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
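The class [^,'[:^punct:]] is a double negation: a character matches only if it is not a comma, not an apostrophe, and not a non-punctuation character, which is the same as "any punctuation except , and '". A quick check of that reading, assuming R's PCRE engine (perl = TRUE), which supports the negated POSIX class [:^punct:]; the sample characters are arbitrary:

```r
# Arbitrary sample characters: the two excluded marks, three other marks,
# a letter and a space
chars <- c("'", ",", "!", ".", "?", "a", " ")

# A character matches [^,'[:^punct:]] only if it is not a comma or an
# apostrophe and is not in [:^punct:], i.e. it IS punctuation
hit <- grepl("[^,'[:^punct:]]", chars, perl = TRUE)
names(hit) <- chars
print(hit)  # only "!", "." and "?" should be TRUE
```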
Answer 1 (score: 0)
You can use a (?![',]) negative lookahead to restrict the PCRE subpattern, so that the match fails if the next character to the right is ' or ,:
[[:space:]]|(?=(?![',])[[:punct:]])
               ^^^^^^^^
See the regex demo.
Details
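[[:space:]] - any whitespace
| - or
(?=(?![',])[[:punct:]]) - a positive lookahead that requires, immediately to the right of the current location, a character that is not ' or , and that is a punctuation character (in effect, any punctuation character other than ' and ,).
R demo:
X <- "I'm not that good at regex yet, but am getting better!"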
strsplit(X, "[[:space:]]|(?=(?![',])[[:punct:]])", perl=TRUE)
[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
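If the set of excluded characters changes often, the lookahead pattern above can be built programmatically. This is a sketch, not part of either answer: split_except is a hypothetical helper name, and it assumes every excluded character is a single non-alphanumeric punctuation mark, so prefixing each one with a backslash is safe inside a PCRE character class. It also checks that the negated-class pattern from the first answer produces the same split.

```r
# Hypothetical helper (not from either answer): split on whitespace and
# before every punctuation character except those listed in `exclude`.
# Assumes `exclude` holds one or more single punctuation characters.
split_except <- function(x, exclude) {
  stopifnot(length(exclude) > 0)
  # In PCRE a backslash before a non-alphanumeric character always means
  # that literal character, so ] ^ - and \ are handled safely too
  cls <- paste0("[", paste0("\\", exclude, collapse = ""), "]")
  pattern <- paste0("[[:space:]]|(?=(?!", cls, ")[[:punct:]])")
  strsplit(x, pattern, perl = TRUE)
}

X <- "I'm not that good at regex yet, but am getting better!"
res <- split_except(X, c("'", ","))
print(res)

# The negated-class pattern from the first answer asserts the same
# one-character condition, so it should give an identical split
alt <- strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl = TRUE)
print(identical(res, alt))
```

Both lookaheads are zero-width, so the punctuation character itself is kept as its own element (for example "better" "!") rather than being consumed by the split.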