Question

I have a large piece of text:

"Big piece of text. This sentence includes 'regexp' word. And this
sentence doesn't include that word"

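I need to find a substring that starts with 'this' and ends with 'word', but does not contain the word 'regexp'.

In this case, the string "this sentence doesn't include that word" is exactly what I want to get.

How can I do this with a regular expression?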

Answer 1

With the ignore case option, the following should work:

\bthis\b(?:(?!\bregexp\b).)*?\bword\b

Example: http://www.rubular.com/r/g6tYcOy8IT

Explanation:

\bthis\b           # match the word 'this', \b is for word boundaries
(?:                # start group, repeated zero or more times, as few as possible
   (?!\bregexp\b)    # fail if 'regexp' can be matched (negative lookahead)
   .                 # match any single character
)*?                # end group
\bword\b           # match 'word'

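The \b around each word ensures that you don't match inside longer words, such as 'this' in 'thistle' or 'word' in 'wordy'.

This works by checking, at every character between the start word and the end word, that the excluded word does not begin at that position.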
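As a quick check of the pattern above, here is a minimal sketch in Python (my choice of language; the answer itself only links to an online tester). The sample text is the one from the question, joined onto a single line so that `.` does not have to cross a newline; for wrapped text you would presumably also need `re.DOTALL`.

```python
import re

# Text from the question, written on one line (an assumption made for this sketch).
text = ("Big piece of text. This sentence includes 'regexp' word. "
        "And this sentence doesn't include that word")

# Same pattern as in the answer, compiled with the ignore-case option.
pattern = re.compile(r"\bthis\b(?:(?!\bregexp\b).)*?\bword\b", re.IGNORECASE)

match = pattern.search(text)
print(match.group(0))  # this sentence doesn't include that word
```

The first 'This' is rejected because 'regexp' appears before any 'word', so the search moves on to the second 'this'.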

Answer 2

Use a lookahead assertion.

If you want to check that a string does not contain some substring, you can write:

/^(?!.*substring)/

You also have to check for 'this' at the beginning of the line and 'word' at the end:

/^this(?!.*substring).*word$/
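To see how these two patterns behave, here is a small sketch in Python (not the answer's language, which is Perl). It uses 'regexp' in place of the `substring` placeholder, and the test strings are made up for illustration.

```python
import re

# Succeeds only if 'regexp' does not occur anywhere in the string.
no_sub = re.compile(r"^(?!.*regexp)")

# Additionally requires 'this' at the start and 'word' at the end.
framed = re.compile(r"^this(?!.*regexp).*word$")

print(bool(no_sub.search("this has the word")))         # True
print(bool(no_sub.search("this has the regexp word")))  # False
print(bool(framed.search("this has the word")))         # True
print(bool(framed.search("this has the regexp word")))  # False
print(bool(framed.search("so this has the word")))      # False
```

Note that the lookahead consumes nothing: it only inspects the rest of the string from its position, and the following `.*word$` does the actual matching.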

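Another issue here is that you are not really looking for strings, you are looking for sentences (if I understand your task correctly).

So the solution looks like this: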

perl -e '
  local $/;
  $_=<>;
  while($_ =~ /(.*?[.])/g) { 
    $s=$1;
    print $s if $s =~ /^\s*this(?!.*substring).*word[.]$/i
  };'

Usage example:

$ cat 1.pl
local $/;
$_=<>;
while($_ =~ /(.*?[.])/g) {
    $s=$1;
    print $s if $s =~ /^\s*this(?!.*regexp).*word[.]/i;
};

$ cat 1.txt
This sentence has the "regexp" word. This sentence doesn't have the word. This sentence does have the "regexp" word again.

$ cat 1.txt | perl 1.pl 
 This sentence doesn't have the word.
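For comparison, the same split-then-filter idea can be sketched in Python. This is only an illustration under a few assumptions: the function name `sentences_without` and its `banned` parameter are hypothetical, and `re.DOTALL` is added so that sentences wrapped across lines are handled, which the Perl script above does not do.

```python
import re

def sentences_without(text, banned="regexp"):
    """Return sentences that start with 'this', end with 'word.', and lack `banned`."""
    # Split into chunks that each end with a period, like /(.*?[.])/g in the Perl code.
    candidates = re.findall(r".*?[.]", text, flags=re.DOTALL)
    keep = re.compile(
        r"^\s*this(?!.*" + re.escape(banned) + r").*word[.]",
        re.IGNORECASE | re.DOTALL,
    )
    return [s for s in candidates if keep.search(s)]

text = ('This sentence has the "regexp" word. '
        "This sentence doesn't have the word. "
        'This sentence does have the "regexp" word again.')

print(sentences_without(text))  # [" This sentence doesn't have the word."]
```

As in the Perl output, the kept sentence retains its leading space, because the split keeps whatever follows the previous period.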

Regex: find a string that does not contain a substring
