Question

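I'm trying to use a regular expression to pick out all words made of a-z, with or without the ' symbol.

The word must be at least 2 characters long
It can't start with the ' symbol
Two ' symbols can't be next to each other
And a "two-character" word can't end with the ' symbol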
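
To make the rules above concrete, here is a minimal sketch of them as a plain Python check with no regex involved. The function name `is_valid_word` is hypothetical, and it assumes a word consists only of lowercase a-z and the ' character:

```python
def is_valid_word(word: str) -> bool:
    """Return True if `word` satisfies the four rules listed above."""
    # Rule 1: at least 2 characters long.
    if len(word) < 2:
        return False
    # Assumption: only lowercase a-z and the apostrophe are allowed at all.
    if any(ch != "'" and not ("a" <= ch <= "z") for ch in word):
        return False
    # Rule 2: must not start with an apostrophe.
    if word[0] == "'":
        return False
    # Rule 3: no two apostrophes next to each other.
    if "''" in word:
        return False
    # Rule 4: a two-character word must not end with an apostrophe.
    if len(word) == 2 and word.endswith("'"):
        return False
    return True


for sample in ["ab", "a'", "ab'", "a''b", "'ab"]:
    print(sample, is_valid_word(sample))
```

A check like this can serve as a reference when comparing candidate regexes against sample words.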

I've been working on this regex for hours and I can't get it to work:

/\b[a-z]([a-z(\')](?!\1))+\b/

It doesn't work and I have no idea why! (The part about two ' symbols next to each other.)

Any ideas?

Answer 1

This should work (disclaimer: untested):

/\b(?![a-z]{2}'\b)[a-z]((?!'')['a-z])+\b/

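Yours doesn't work because you're trying to nest a parenthesized expression inside a character class. That only adds ( and ) to the class as literal characters; it doesn't set the value of the \1 that follows.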
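
The point about parentheses inside a character class can be checked directly. This is a small sketch using Python's `re` module (an assumption, since the question does not say which regex engine is in use); the variable name `char_class` is only illustrative:

```python
import re

# Inside [...] the characters ( and ) are ordinary literals, not a group.
char_class = re.compile(r"[a-z(\')]")

# The class matches a literal parenthesis, a letter, or an apostrophe.
print(bool(char_class.fullmatch("(")))
print(bool(char_class.fullmatch(")")))
print(bool(char_class.fullmatch("'")))

# The pattern defines no capture groups, so there is nothing for \1 to refer to.
print(char_class.groups)
```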

(Edit) Added the constraint on aa'.

Answer 2

([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})

Live @ RegExPal

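You may not need \b, since the regex is greedy and will consume the whole word.
This version can't be tested with RegexPal (it doesn't recognize lookbehind), but it has custom word boundaries: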

(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])
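
To see how the two versions behave on the same input, here is a short sketch in Python (an assumption; the answer itself targets RegexPal). The names `UNBOUNDED` and `BOUNDED` and the sample text are made up for illustration:

```python
import re

# First version: no explicit boundaries.
UNBOUNDED = re.compile(r"([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})")

# Second version: custom boundaries via lookbehind and lookahead.
BOUNDED = re.compile(
    r"(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])"
)

text = "ab''cd it's"

# findall returns the contents of the single capture group for each match.
unbounded_hits = UNBOUNDED.findall(text)
bounded_hits = BOUNDED.findall(text)

print(unbounded_hits)  # ["ab'", 'cd', "it's"]
print(bounded_hits)    # ["it's"]
```

Without the boundaries, the pattern picks valid-looking pieces out of the invalid token ab''cd. With them, that token is skipped entirely and only it's is returned.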

Answer 3

Assuming words are separated by whitespace:

(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)

Run in a Perl script:

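use strict;
use warnings;
use feature 'say';    # say() is not enabled by default

# A word is delimited by the start/end of the line or by whitespace.
my $re = qr/(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)/;
while (<DATA>) {
    chomp;
    say(/$re/ ? "OK: $_" : "KO: $_");
}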
__DATA__
ab
abc
a'
ab''
abc'
a''b
:!ù

Output:

OK: ab
OK: abc
KO: a'
KO: ab''
OK: abc'
KO: a''b
KO: :!ù

Explanation:

The regular expression:

(?-imsx:\b((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))\b)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      [a-z]{2}                 any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        .*                       any character except \n (0 or more
                                 times (matching the most amount
                                 possible))
----------------------------------------------------------------------
        ''                       '\'\''
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      [a-z']{2,}               any character of: 'a' to 'z', ''' (at
                               least 2 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
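
The explanation above describes a pattern delimited by \b, while the Perl script delimits words with (?:^|\s) and (?:$|\s). The following sketch compares the two forms in Python (an assumption; the answer uses Perl, but these constructs behave the same way in Python's `re` module). The names `BODY`, `WORD_BOUNDARY`, `WHITESPACE` and `captured` are hypothetical:

```python
import re

# Shared body of the pattern; only the delimiters differ between the two forms.
BODY = r"((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))"

WORD_BOUNDARY = re.compile(r"\b" + BODY + r"\b")          # form in the explanation
WHITESPACE = re.compile(r"(?:^|\s)" + BODY + r"(?:$|\s)")  # form in the script


def captured(pattern, line):
    """Return what group 1 captured on the first match, or None if no match."""
    m = pattern.search(line)
    return m.group(1) if m else None


for line in ["ab", "abc'", "ab''"]:
    print(repr(line), captured(WORD_BOUNDARY, line), captured(WHITESPACE, line))
```

Because \b treats ' as a non-word character, the \b form can end a match just before an apostrophe. It captures abc from abc' and ab from ab'', whereas the whitespace-delimited form captures abc' in full and rejects ab''.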

Regex: symbols must not repeat next to each other

3 answers: