Goal:
I want to match sentences that contain the word 'no', but only when that 'no' is not preceded by 'with', 'there is' or 'there are', in R.
Input:
The ground was rocky with no cracks in it
No diggedy, no doubt
Understandably, there is no way an elephant can be green
Expected output:
The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green
Attempt:
[code not preserved in the source]
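Problem:
The negative lookbehind seems to be ignored, so all of the sentences get replaced. Is the problem the use of alternation inside the lookbehind?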
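Answer 0 (score: 3)
You only need a single regex replacement. The idea is to match and capture every sentence in which 'no' is preceded by 'with', 'there is' or 'there are', and to match, without capturing, every remaining sentence. Each match is then replaced with \\1, i.e. the text held in the first capture group, so the uncaptured sentences are replaced with nothing.
gsub("(?i)(.*(?:with|there (?:is|are)) no\\b.*)|.*", "\\1", string, perl = TRUE)
Example: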
x <- "The ground was rocky with no cracks in it\nNo diggedy, no doubt\nUnderstandably, there is no way an elephant can be green"
# The optional \\n? also consumes the line break after each matched sentence
gsub("(?i)(.*(with|there (?:is|are)) no\\b.*\\n?)|.*\\n?", "\\1", x, perl = TRUE)
# [1] "The ground was rocky with no cracks in it\nUnderstandably, there is no way an elephant can be green"
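On the question of whether alternation inside a lookbehind is the culprit: PCRE (used when perl = TRUE) accepts several top-level alternatives in a lookbehind as long as each one has a fixed length, so alternation by itself should not be what breaks it. The following is only a minimal sketch, not the asker's original attempt; the names sentences, unguarded_no, has_unguarded_no and kept are illustrative, and it assumes one sentence per vector element and exactly one space before 'no'.

```r
# Hypothetical input: one sentence per element.
sentences <- c(
  "The ground was rocky with no cracks in it",
  "No diggedy, no doubt",
  "Understandably, there is no way an elephant can be green"
)

# Each lookbehind alternative has its own fixed length, which PCRE accepts
# even though the alternatives differ in length from one another.
unguarded_no <- "(?i)(?<!with |there is |there are )\\bno\\b"

# TRUE for sentences containing a 'no' that is NOT preceded by
# 'with ', 'there is ' or 'there are '.
has_unguarded_no <- grepl(unguarded_no, sentences, perl = TRUE)

# Drop those sentences and keep the rest.
kept <- sentences[!has_unguarded_no]
print(kept)
```

Because the lookbehind spells out a single literal space, this sketch would treat 'with  no' (two spaces) as unguarded; the answers in this thread handle whitespace more flexibly.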
Answer 1 (score: 2)
You can use
(?mxi)^ # Start of a line (and free-spacing/case insensitive modes are on)
(?: # Outer container group start
(?!.*\b(?:with|there\h(?:is|are))\h+no\b) # no 'with/there is/are no' before 'no'
.*\bno\b # 'no' whole word after 0+ chars
(?![?:]) # cannot be followed with ? or :
| # or
.* # any 0+ chars
[?:]\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
) # container group end
.* # the rest of the line and
\R* # 0+ line breaks
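See the regex demo. In short, the pattern matches lines of either of two kinds: a line that contains 'no' as a whole word, where that line has no 'with', 'there is' or 'there are' followed by whitespace directly before a 'no'; or a line that contains '?' or ':' followed by 0+ horizontal whitespace characters (\h) and then an 'n' that is not followed by any other letter.
See the R demo: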
sentences <- "The ground was rocky with no cracks in it\r\nNo diggedy, no doubt\r\nUnderstandably, there is no way an elephant can be green"
rx <- "(?mxi)^ # Start of a line
(?: # Outer container group start
(?!.*\\b(?:with|there\\h(?:is|are))\\h+no\\b) # no 'with/there is/are no' before 'no'
.*\\bno\\b # 'no' whole word after 0+ chars
(?![?:]) # cannot be followed with ? or :
| # or
.* # any 0+ chars
[?:]\\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
) # container group end
.* # the rest of the line and 0+ line breaks
\\R*"
res <- gsub(rx, "", sentences, perl=TRUE)
cat(res, sep="\n")
Output:
The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green
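Thanks to the x modifier, you can add comments to the regex pattern and format it with whitespace for better readability. Note that every literal whitespace character in the pattern must be replaced with \\h (horizontal whitespace), \\s (any whitespace), \\n (LF), \\r (CR), etc. for the pattern to work in this mode.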
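To make the effect of free-spacing mode concrete, here is a small sketch; the test strings and the variable names are made up for illustration. It shows that an unescaped space in the pattern is dropped under (?x), while \\h still matches a real space in the input.

```r
# Under (?x) the literal space in the pattern is ignored,
# so "there is" is compiled as if it were "thereis".
space_ignored  <- grepl("(?x)there is", "there is no way", perl = TRUE)
matches_joined <- grepl("(?x)there is", "thereis", perl = TRUE)

# \\h is not ignored: it matches one horizontal whitespace character,
# while the pattern space after it is still dropped.
with_h <- grepl("(?x)there\\h is", "there is no way", perl = TRUE)

print(c(space_ignored, matches_joined, with_h))
```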
The (?i) modifier is the inline equivalent of ignore.case=TRUE.
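As a quick check of that equivalence, the sketch below runs the same whole-word search for 'no' both ways on a few made-up strings (the vector s and the result names are illustrative only) and compares the results.

```r
# Hypothetical test strings.
s <- c("No diggedy, no doubt", "NO WAY", "nothing here")

# Case-insensitive matching via the inline modifier...
inline_flag <- grepl("(?i)\\bno\\b", s, perl = TRUE)

# ...and via the ignore.case argument.
by_argument <- grepl("\\bno\\b", s, perl = TRUE, ignore.case = TRUE)

# Both approaches should flag the same elements.
print(identical(inline_flag, by_argument))
```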