Regex: symbols can't repeat next to each other

Time: 2014-01-11 12:13:37

Tags: regex

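I am trying to write a regular expression that picks out all words made of the letters a-z, with or without the ' symbol.

  1. The word must be at least 2 characters long
  2. It can't start with the ' symbol
  3. Two ' symbols can't be next to each other
  4. A "two character" word can't end with the ' symbol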
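
For reference, the four rules can also be written as a plain check with no regex involved. This is only a sketch: the name `is_valid_word` is hypothetical, and rule 4 is read literally here as "a word that is exactly two characters long may not end with an apostrophe", which is an assumption about the intent.

```python
import string

# Characters a word may contain: a-z plus the apostrophe.
ALLOWED = set(string.ascii_lowercase) | {"'"}

def is_valid_word(word: str) -> bool:
    """Check one candidate word against the four rules (hypothetical helper)."""
    if len(word) < 2:                           # rule 1: at least 2 characters
        return False
    if any(ch not in ALLOWED for ch in word):   # only a-z and '
        return False
    if word.startswith("'"):                    # rule 2: no leading '
        return False
    if "''" in word:                            # rule 3: no two ' in a row
        return False
    if len(word) == 2 and word.endswith("'"):   # rule 4 (assumed reading)
        return False
    return True

for w in ["ab", "a'b", "abc'", "a", "'ab", "a''b", "a'"]:
    print(w, is_valid_word(w))
```

A checker like this can serve as a yardstick when comparing candidate regexes against the same word list.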

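I have been working on this regex for hours and I can't get it to work:

    /\b[a-z]([a-z(\')](?!\1))+\b/

It doesn't work and I don't know why! (The rule about two ' symbols next to each other is the problem.)

Any ideas?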

3 Answers:

Answer 0: (score: 0)

This should work (disclaimer: untested):

/\b(?![a-z]{2}'\b)[a-z]((?!'')['a-z])+\b/

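Yours doesn't work because you are trying to nest a parenthesized expression inside a character class. That only adds ( and ) to the class as literal characters; it does not create a group that sets the value of the following \1.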
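
A quick way to see this, sketched with Python's `re` module (an assumption, since the question does not name an engine): inside `[...]` the parentheses are ordinary characters, so the class from the question matches a literal `(` or `)` and creates no capture group of its own.

```python
import re

# The character class from the question, on its own.
char_class = re.compile(r"[a-z(\')]")

# Parentheses inside [...] are literal members of the class.
print(bool(char_class.fullmatch("(")))   # True
print(bool(char_class.fullmatch(")")))   # True
print(bool(char_class.fullmatch("'")))   # True

# The class by itself defines no capture group.
print(char_class.groups)                 # 0
```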

(Edit) Added a constraint for aa'.
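
Since the pattern is posted as untested, here is a small check, again assuming Python's `re` (the `/.../` delimiters are dropped, and `first_match` is a made-up helper). One thing it seems to surface: `'` is not a word character, so `\b` cannot match right after a trailing apostrophe, and a word such as `abc'` is matched only up to `abc`.

```python
import re

# Pattern from this answer, without the /.../ delimiters.
pattern = re.compile(r"\b(?![a-z]{2}'\b)[a-z]((?!'')['a-z])+\b")

def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match("ab"))     # ab
print(first_match("a'b"))    # a'b
print(first_match("a''b"))   # None: the doubled apostrophe blocks the match
print(first_match("abc'"))   # abc: the trailing ' is left out because of \b
```

If trailing apostrophes should be part of the match, a boundary built from lookarounds on `[a-z']` rather than `\b` is one option to consider.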

Answer 1: (score: 0)

([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})

Live @ RegExPal

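You may not need \b, because the regex is greedy and will consume the whole word.
This version can't be tested with RegexPal (which doesn't recognize lookbehind), but it has custom word boundaries: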

(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])
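
To compare the two patterns, here is a sketch assuming Python's `re`, which supports the fixed-width lookbehind used in the second one. The names `CORE`, `plain`, `bounded` and `matches` are made up for illustration.

```python
import re

# Shared core of both patterns from this answer.
CORE = r"([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})"
plain = re.compile(CORE)
# Same core wrapped in the custom word boundaries.
bounded = re.compile(r"(?<![a-z'])" + CORE + r"(?![a-z'])")

def matches(regex, text):
    """Return every full match of regex in text, left to right."""
    return [m.group(0) for m in regex.finditer(text)]

text = "ab abc a' ab'' abc' a''b a'b"
print(matches(plain, text))    # ['ab', 'abc', "ab'", "abc'", "a'b"]
print(matches(bounded, text))  # ['ab', 'abc', "abc'", "a'b"]
```

In this run the unbounded pattern picks `ab'` out of `ab''`, so greediness alone does not stop a match from ending partway through a word; the lookaround boundaries reject that word entirely.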

Answer 2: (score: 0)

Assuming words are separated by whitespace:

(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)

In action in a Perl script:

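use strict;
use warnings;
use feature 'say';    # say() is not enabled by default

# Capture a word delimited by whitespace or by the start/end of the line.
my $re = qr/(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)/;
while (<DATA>) {
    chomp;
    say(/$re/ ? "OK: $_" : "KO: $_");
}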
__DATA__
ab
abc
a'
ab''
abc'
a''b
:!ù

Output:

OK: ab
OK: abc
KO: a'
OK: ab''
OK: abc'
KO: a''b
KO: :!ù
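
A rough Python port of the loop above, for readers without Perl at hand. It assumes Python's `re` behaves like Perl for these constructs, and it also runs a second pattern that uses `\b` in place of `(?:^|\s)` and `(?:$|\s)`. The names `spaced`, `bounded` and `verdicts` are made up for this sketch.

```python
import re

# Whitespace-delimited pattern from the Perl script.
spaced = re.compile(r"(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)")
# Variant with \b in place of the whitespace anchors.
bounded = re.compile(r"\b((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))\b")

samples = ["ab", "abc", "a'", "ab''", "abc'", "a''b", ":!ù"]

def verdicts(regex):
    """Map each sample line to OK or KO, like the Perl loop."""
    return ["OK" if regex.search(s) else "KO" for s in samples]

for s, a, b in zip(samples, verdicts(spaced), verdicts(bounded)):
    print(f"{s:6} spaced={a} bounded={b}")
```

In this port the whitespace-delimited pattern rejects `ab''`: the lookahead `(?!.*'')` fails, and the two-letter branch is not followed by whitespace or the end of the line. The `\b` variant accepts it by matching just `ab`, which lines up with the `OK: ab''` line in the output above.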

Explanation:

The regular expression:

(?-imsx:\b((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))\b)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      [a-z]{2}                 any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        .*                       any character except \n (0 or more
                                 times (matching the most amount
                                 possible))
----------------------------------------------------------------------
        ''                       '\'\''
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      [a-z']{2,}               any character of: 'a' to 'z', ''' (at
                               least 2 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------