Selectively remove a trailing character from strings

Time: 2017-12-05 06:37:46

Tags: r regex gsub

I want to remove the final letter "O" of each word, unless it is part of the word "HELLO".

I tried this:

Example:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("[^HELLO]O\\>","",a)

[1] "HELLO " " HELLO" "T " "HO"

But I want

"HELLO X" "D HELLO" "TW X" "H"

5 Answers:

Answer 0: (score: 3)

Try replacing with the following pattern:

\b(?!HELLO\b)(\w+)O\b

This asserts that the word is not HELLO, then captures everything up to a final O. The match is then replaced with the captured group, which drops the final O.

\b          - from the start of the word
(?!HELLO\b) - assert that the word is not HELLO
(\w+)O      - match a word ending in O, but don't capture final O
\b          - end of word

If a match occurs, the capture group contains the whole word minus the final O.

Code:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("\\b(?!HELLO\\b)(\\w+)O\\b", "\\1", a, perl=TRUE)
[1] "HELLO X" "D HELLO" "TW X"    "H"

Note that gsub needs Perl mode (perl=TRUE) here, because the negative lookahead is not supported by the default regex engine.
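As a minimal sketch, the same pattern can be wrapped in a small helper so that the protected word becomes a parameter. The function name `drop_final_o` and its `keep` argument are hypothetical, introduced here only for illustration, and the sketch assumes `keep` contains no regex metacharacters.

```r
# Hypothetical helper (not part of the original answer): removes a word-final
# "O" from every word except the protected word `keep`.
# Assumes `keep` contains only plain letters, so it is safe to paste into the pattern.
drop_final_o <- function(x, keep = "HELLO") {
  pattern <- paste0("\\b(?!", keep, "\\b)(\\w+)O\\b")
  # perl = TRUE is needed because the default engine does not support lookaheads
  gsub(pattern, "\\1", x, perl = TRUE)
}

a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
drop_final_o(a)
# [1] "HELLO X" "D HELLO" "TW X"    "H"
```

Because `(\w+)` needs at least one character before the O, a word consisting of a single "O" is left untouched by this pattern.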


Answer 1: (score: 1)

Use the regex alternation operator |

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("(HELLO)|O(?!\\S)", "\\1", a, perl=TRUE)
# [1] "HELLO X" "D HELLO" "TW X"    "H"      

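The regex (HELLO)|O(?!\S) does two things:

  1. First, it captures every HELLO string in group 1.

  2. Then it matches every remaining O that is not followed by a non-whitespace character.

Replacing with \1 puts each HELLO back unchanged and deletes each lone O, because group 1 is empty when the second alternative matches.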
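To see which alternative fires where, the matches can be listed before replacing anything. This is a small sketch using base R's `regmatches` and `gregexpr`; the variable names `pattern` and `matches` are chosen here for illustration.

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
pattern <- "(HELLO)|O(?!\\S)"

# List every match per string: "HELLO" comes from the first alternative,
# a lone "O" from the second.
matches <- regmatches(a, gregexpr(pattern, a, perl = TRUE))
matches[[1]]
# [1] "HELLO" "O"

# Group 1 is empty for the lone "O" matches, so "\\1" deletes them.
result <- gsub(pattern, "\\1", a, perl = TRUE)
```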

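Answer 2: (score: 1)

Your regex does not do what you intend. [^HELLO] is a character class that matches any single character other than H, E, L or O, and that character is removed together with the O. What you need is an O that is not preceded by HELL, so you should use the following expression: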

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("(?<!\\bHELL)O\\b", "", a, perl=TRUE)
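The difference can be checked directly. The sketch below compares the question's pattern with an equivalent character class and then runs the lookbehind version; the variable names `same` and `fixed` are illustrative only.

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")

# [^HELLO] is a character class: it matches one character that is not H, E, L or O,
# so it behaves exactly like [^HELO] and also consumes the letter before the O.
same <- identical(gsub("[^HELLO]O\\>", "", a), gsub("[^HELO]O\\>", "", a))
same
# [1] TRUE

# The negative lookbehind only inspects the preceding text without consuming it.
fixed <- gsub("(?<!\\bHELL)O\\b", "", a, perl = TRUE)
fixed
# [1] "HELLO X" "D HELLO" "TW X"    "H"
```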

Answer 3: (score: 0)

a <- c("HELLO XO","DO HELLO","TWO XO","HO")

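# Remove every "O", then restore the one that belongs to "HELLO".
# Caveat: this also strips non-final O's and turns a standalone "HELL" into "HELLO".
aa <- gsub("O","",a)
gsub("HELL", "HELLO",aa)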

Answer 4: (score: 0)

A bit verbose, but you could try this

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
b <- lapply(a, function(x) unlist(strsplit(x, " ")))
b
[[1]]
[1] "HELLO" "XO"   

[[2]]
[1] "DO"    "HELLO"

[[3]]
[1] "TWO" "XO" 

[[4]]
[1] "HO"


# Keep "HELLO" as is; for every other word drop only a trailing "O"
c <- unlist(lapply(b, function(y) paste(ifelse( y == "HELLO", "HELLO", sub("O$", "", y)), collapse = " " )))
c

[1] "HELLO X" "D HELLO" "TW X"    "H"