How to use alternation in a lookbehind

Time: 2017-10-30 08:21:40

Tags: r regex

Goal:

I want to match sentences containing the word 'no', but only if 'no' is not preceded by 'with', 'there is', or 'there are', in R.

Input:

The ground was rocky with no cracks in it
No diggedy, no doubt
Understandably, there is no way an elephant can be green

Expected output:

The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green

Attempt:

[code not preserved in the source]

Problem:

The negative lookbehind seems to be ignored, so all the sentences get replaced. Is the problem the use of alternation inside the lookbehind?

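2 answers:

Answer 0 (score: 3)

You only need a single regex replacement. The idea is to match and capture every sentence where 'no' follows 'with', 'there is', or 'there are', and to match every remaining sentence without capturing it. Then replace each match with \\1, i.e. the contents of the first capture group, which is empty for the sentences that were not captured.

gsub("(?i)(.*(with|there (?:is|are)) no\\b.*)|.*", "\\1", string, perl=TRUE)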
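As a minimal sketch, assuming `string` is a character vector with one sentence per element (the answer does not show how it is defined), the non-matching elements come back as empty strings and can be dropped afterwards. The name `kept` is only illustrative.

```r
# Assumed input: one sentence per vector element
string <- c("The ground was rocky with no cracks in it",
            "No diggedy, no doubt",
            "Understandably, there is no way an elephant can be green")

# Captured sentences are replaced by themselves, all others by ""
kept <- gsub("(?i)(.*(with|there (?:is|are)) no\\b.*)|.*", "\\1", string, perl = TRUE)

# Drop the emptied elements
kept[nzchar(kept)]
```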


Example:

x <- "The ground was rocky with no cracks in it\nNo diggedy, no doubt\nUnderstandably, there is no way an elephant can be green"
gsub("(?i)(.*(with|there (?:is|are)) no\\b.*\\n?)|.*\\n?", "\\1", x, perl=TRUE)
# [1] "The ground was rocky with no cracks in it\nUnderstandably, there is no way an elephant can be green"
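Since the question asks about alternation inside a lookbehind, here is a hedged sketch of that route as well. With perl=TRUE, R uses PCRE, where a lookbehind may contain alternatives of different lengths as long as each alternative has a fixed length. The vector `sentences` and the name `bare_no` are assumptions for illustration, and this is not necessarily what the asker tried.

```r
# Assumed input: one sentence per vector element
sentences <- c("The ground was rocky with no cracks in it",
               "No diggedy, no doubt",
               "Understandably, there is no way an elephant can be green")

# TRUE where a whole-word "no" is NOT directly preceded by
# "with ", "there is " or "there are " (each followed by one space)
bare_no <- grepl("(?i)(?<!with |there is |there are )\\bno\\b",
                 sentences, perl = TRUE)

# Keep only the sentences without such a bare "no"
sentences[!bare_no]
```

Because each lookbehind alternative is fixed text, this sketch only recognises exactly one space before "no"; variable whitespace would need a different approach, such as the capture-and-replace pattern above.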

Answer 1 (score: 2)

You can use

(?mxi)^       # Start of a line (and free-spacing/case insensitive modes are on)
(?:           # Outer container group start
  (?!.*\b(?:with|there\h(?:is|are))\h+no\b) # no 'with/there is/are no' before 'no'
  .*\bno\b  # 'no' whole word after 0+ chars
  (?![?:])    # cannot be followed with ? or :
|             # or
  .*          # any 0+ chars
  [?:]\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
)             # container group end
.*            # the rest of the line
\R*           # 0+ line breaks

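In short: the pattern matches one of two alternatives, i.e. one of two types of lines. The first is a line that contains the whole word no and no occurrence of with, there is, or there are followed by whitespace and no. The second is a line that contains ? or : followed by 0+ horizontal whitespace characters (\h) and then an n that is not followed by any other letter.

R demo: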

sentences <- "The ground was rocky with no cracks in it\r\nNo diggedy, no doubt\r\nUnderstandably, there is no way an elephant can be green"
rx <- "(?mxi)^ # Start of a line
(?:            # Outer container group start
  (?!.*\\b(?:with|there\\h(?:is|are))\\h+no\\b) # no 'with/there is/are no' before 'no'
  .*\\bno\\b   # 'no' whole word after 0+ chars
  (?![?:])     # cannot be followed with ? or :
|              # or
  .*           # any 0+ chars
  [?:]\\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
)              # container group end
.*             # the rest of the line and 0+ line breaks
\\R*"
res <- gsub(rx, "", sentences, perl=TRUE)
cat(res, sep="\n")

Output:

The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green

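Thanks to the x modifier, you can add comments to the regex pattern and format it with whitespace for better readability. Note that in this mode every literal space in the pattern must be replaced with \\h (horizontal whitespace), \\s (any whitespace), \\n (LF), \\r (CR), etc., otherwise it is ignored and the pattern will not work as intended.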
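A small sketch of that effect, using made-up test strings: under (?x) an unescaped space in the pattern is dropped, so it has to be written as \\h or placed in a character class.

```r
txt <- "with no cracks"

# The spaces in the pattern are ignored, so this looks for "withno"
plain_space <- grepl("(?x) with no", txt, perl = TRUE)

# \\h stands in for the horizontal whitespace between the words
escaped_space <- grepl("(?x) with \\h no", txt, perl = TRUE)

# A space inside a character class stays literal in free-spacing mode
class_space <- grepl("(?x) with [ ] no", txt, perl = TRUE)

c(plain_space, escaped_space, class_space)
```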

The (?i) modifier is the inline equivalent of ignore.case=TRUE.
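To illustrate, a short sketch with an assumed test string, comparing the inline modifier with the function argument:

```r
txt <- "No doubt"

# Inline modifier inside the pattern
inline_flag <- grepl("(?i)\\bno\\b", txt, perl = TRUE)

# Same match requested through the function argument
arg_flag <- grepl("\\bno\\b", txt, ignore.case = TRUE, perl = TRUE)

# Without either, the capital "N" does not match
case_sensitive <- grepl("\\bno\\b", txt, perl = TRUE)

c(inline_flag, arg_flag, case_sensitive)
```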