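I'm trying to use a regular expression to pick out all words made of a-z, with or without ' characters.
I've been working on this regex for hours and I can't get it to work: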
/\b[a-z]([a-z(\')](?!\1))+\b/
It doesn't work, and I don't know why! (The goal is to disallow two ' characters next to each other.)
Any ideas?
Answer 0 (score: 0)
This should work (disclaimer: untested):
/\b(?![a-z]{2}'\b)[a-z]((?!'')['a-z])+\b/
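Yours doesn't work because you are trying to nest a parenthesized expression inside a character class. That only adds the literal characters ( and ) to the class; it does not create a group, so it never sets a value for the \1 that follows.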
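The point about character classes can be checked directly. Here is a minimal sketch in Python (picked only for illustration, since the question does not name a language; the sample characters are made up) showing that parentheses inside [...] are ordinary literals and create no capture group:

```python
import re

# The character class from the question. Inside [...], the characters
# ( and ) are literals, so the class matches a-z, "(", "'" and ")".
char_class = re.compile(r"[a-z(\')]")

for ch in ["q", "'", "(", ")", "1"]:
    print(repr(ch), bool(char_class.fullmatch(ch)))

# The parentheses inside the class do not open a capturing group,
# so there is nothing for a backreference such as \1 to refer to.
print("capturing groups:", char_class.groups)
```

Only "1" is rejected by the class, and the compiled pattern reports zero capturing groups.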
(Edit) Added the constraint on aa'.
Answer 1 (score: 0)
([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})
You may not need \b, because the regex is greedy and will consume the whole word.
This version cannot be tested with RegexPal (it does not recognize lookbehind), but it has custom word boundaries:
(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])
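To see what the custom boundaries change, here is a small sketch in Python that runs both patterns from this answer over a few sample strings. The names UNBOUNDED, BOUNDED and first_match are made up for this illustration, and the sample strings are assumptions, not part of the answer:

```python
import re

# First pattern from the answer: no explicit boundaries.
UNBOUNDED = re.compile(r"([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})")

# Second pattern: the same body wrapped in a lookbehind and a lookahead
# that act as custom word boundaries.
BOUNDED = re.compile(
    r"(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])"
)

def first_match(pattern, text):
    """Return the text captured by group 1, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

for word in ["ab", "abc", "don't", "abc'", "a'", "ab''", "a''b"]:
    print(word, first_match(UNBOUNDED, word), first_match(BOUNDED, word))
```

The two patterns agree on most of these inputs, but on ab'' the unbounded one still captures the prefix ab', while the bounded one rejects the string entirely. So whether the boundaries are needed depends on whether a partial match inside a longer token counts as acceptable.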
Answer 2 (score: 0)
Assuming words are separated by whitespace:
(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)
In a Perl script:
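use strict;
use warnings;
use feature 'say';    # say() is not enabled by default

my $re = qr/(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)/;
while (<DATA>) {
    chomp;
    # Print OK when the line contains a valid word, KO otherwise
    say(/$re/ ? "OK: $_" : "KO: $_");
}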
__DATA__
ab
abc
a'
ab''
abc'
a''b
:!ù
Output:
OK: ab
OK: abc
KO: a'
KO: ab''
OK: abc'
KO: a''b
KO: :!ù
Explanation:
The regular expression:
(?-imsx:\b((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))\b)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      [a-z]{2}                 any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      [a-z]                    any character of: 'a' to 'z'
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        .*                       any character except \n (0 or more
                                 times (matching the most amount
                                 possible))
----------------------------------------------------------------------
        ''                       '\'\''
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      [a-z']{2,}               any character of: 'a' to 'z', ''' (at
                               least 2 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
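This answer shows the pattern in two forms: delimited by (?:^|\s) and (?:$|\s) in the script, and by \b in the explained version. Here is a rough sketch in Python (not the Perl used in the answer) that compares the two on the same sample lines. The names BODY, WHITESPACE_FORM, WORD_BOUNDARY_FORM and captured are made up for this illustration, and the (?-imsx: ) wrapper is left out on the assumption that it only restates default flags:

```python
import re

# Shared body of the pattern, copied from the answer.
BODY = r"((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))"

# Form used in the script: the word must be delimited by whitespace
# or by the start/end of the string.
WHITESPACE_FORM = re.compile(r"(?:^|\s)" + BODY + r"(?:$|\s)")

# Form shown in the explanation: the word is delimited by \b.
WORD_BOUNDARY_FORM = re.compile(r"\b" + BODY + r"\b")

SAMPLES = ["ab", "abc", "a'", "ab''", "abc'", "a''b", ":!ù"]

def captured(pattern, text):
    """Return the text captured by group 1, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

for text in SAMPLES:
    ws = captured(WHITESPACE_FORM, text)
    wb = captured(WORD_BOUNDARY_FORM, text)
    print(f"{text!r}: whitespace={ws!r}, boundary={wb!r}")
```

The two forms differ on inputs that end in apostrophes. Since ' is not a word character, \b can match between a letter and an apostrophe, so the \b form captures ab from ab'' and abc from abc'. The whitespace form rejects ab'' outright and captures abc' in full.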