Question

I'm trying to write a regular expression that allows single hyphens and single spaces only inside words, never at the start or end of a word.

I thought I had this sorted thanks to the answer I got yesterday, but I've just realised there is a small error that I don't quite understand.

Why won't it accept inputs like

'forum-category-b forum-category-a'
'forum-category-b Counter-terrorism'
'forum-category-a Preventing'
'forum-category-a Preventing Violent'
'forum-category-a International-Research-and-Publications'
'International-Research-and-Publications forum-category-b forum-category-a'

but it does accept

'forum-category-b'
'Counter-terrorism forum-category-a'
'Preventing forum-category-a'
'Preventing Violent forum-category-a'
'International-Research-and-Publications forum-category-b'

Why? And how can I fix it? Below is the regex from my initial tests. Ideally it should accept every one of the inputs above:

$aWords = array(
    'a',
    '---stack---over---flow---',
    '   stack    over    flow',
    'stack-over-flow',
    'stack over flow',
    'stacoverflow'
);

foreach($aWords as $sWord) {
    if (preg_match('/^(\w+([\s-]\w+)?)+$/', $sWord)) {
        echo 'pass: ' . $sWord . "\n";
    } else {
        echo 'fail: ' . $sWord . "\n";
    }
}

The test accepts or rejects inputs like the following:

---stack---over---flow---
stack-over-flow- stack-over-flow2
   stack    over    flow

Thanks.
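The sketch below reproduces the behaviour described in the question. It assumes Python's `re` module, which should behave like PHP's PCRE for this pattern. It runs the question's pattern over the reported inputs. The names `check`, `reported_rejected` and `reported_accepted` are made up for illustration. The longest rejected string is left out on purpose, because it makes the engine backtrack for a very long time.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r'^(\w+([\s-]\w+)?)+$')

# Inputs the question reports as rejected. The sixth one from the question
# ('International-Research-and-Publications forum-category-b forum-category-a')
# is left out on purpose: it sends the engine into very long backtracking.
reported_rejected = [
    'forum-category-b forum-category-a',
    'forum-category-b Counter-terrorism',
    'forum-category-a Preventing',
    'forum-category-a Preventing Violent',
    'forum-category-a International-Research-and-Publications',
]

# Inputs the question reports as accepted.
reported_accepted = [
    'forum-category-b',
    'Counter-terrorism forum-category-a',
    'Preventing forum-category-a',
    'Preventing Violent forum-category-a',
    'International-Research-and-Publications forum-category-b',
]


def check(words):
    """Map each input string to True if ORIGINAL matches it, else False."""
    return {w: ORIGINAL.match(w) is not None for w in words}


for word, ok in {**check(reported_rejected), **check(reported_accepted)}.items():
    print(('pass: ' if ok else 'fail: ') + word)
```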

Answer 1

Your pattern doesn't do what you want. Let's break it down:

^(\w+([\s-]\w+)?)+$

It matches strings made up solely of one or more repetitions of the pattern:

\w+([\s-]\w+)?

...which is a run of word characters, optionally followed by a single space or dash character and another run of word characters.
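Testing that single unit on its own shows the limit directly. The sketch assumes Python's `re`. `is_unit` is a hypothetical helper. A unit can hold at most one separator, so `xxx-yyy-zzz` is not a unit.

```python
import re

# One "unit" of the question's pattern: word characters, optionally followed
# by exactly one space/dash and more word characters.
UNIT = re.compile(r'\w+([\s-]\w+)?')


def is_unit(s):
    """True if the whole of s is exactly one unit (fullmatch, not search)."""
    return UNIT.fullmatch(s) is not None


for s in ['xxx', 'xxx-yyy', 'xxx yyy', 'xxx-yyy-zzz', '-xxx']:
    print(repr(s), is_unit(s))
```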

In other words, your pattern looks for strings like:

xxx-xxxyyy-yyyzzz zzz

...whereas you meant to write a pattern that finds strings like:

xxx-xxxxxx-xxxxxx yyy
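The units are glued together with no separator between them. As a result, a word that sits between two separators has to be split across two units, so it needs at least two characters. The sketch below checks this with Python's `re`. `matches` is a hypothetical helper.

```python
import re

ORIGINAL = re.compile(r'^(\w+([\s-]\w+)?)+$')


def matches(s):
    return ORIGINAL.match(s) is not None


# Units are glued together with no separator between them, so each space/dash
# needs a unit of its own. A word sitting between two separators must therefore
# be split across two units, which needs at least two characters.
for s in ['xxx-xxxyyy-yyyzzz zzz', 'xx-xx-xx', 'x-x-x', 'ab-c-de']:
    print(repr(s), matches(s))
```

This is why every rejected input in the question contains a one-letter word (`a` or `b`) followed by a space.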

Among your examples, this one matches:

Counter-terrorism forum-category-a

...but it is parsed as the following sequence:

(Counter(-terroris)) (m( foru)) (m(-categor)) (y(-a))

As you can see, the pattern does not find the words you are looking for.
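Python's `re` can confirm the last step of that split. A repeated capturing group keeps only its final iteration, so `group(1)` shows the last unit the engine settled on. This is a sketch assuming Python's backtracking order, which should match PCRE's here.

```python
import re

ORIGINAL = re.compile(r'^(\w+([\s-]\w+)?)+$')

m = ORIGINAL.match('Counter-terrorism forum-category-a')

# A repeated capturing group keeps only its LAST iteration, so group(1) is the
# final unit the engine settled on and group(2) is that unit's separator part.
print(m.group(1), m.group(2))
```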

This example does not match:

forum-category-a Preventing Violent

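...because once it reaches a single word character followed by a space or dash, the pattern cannot form a "word characters, space or dash, word characters" group: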

(forum(-categor)) (y(-a)) <Mismatch: Found " " but expected "\w">

If you added another character to "forum-category-a", say "forum-category-ax", it would match again, because the engine can split the word as "a" and "x":

(forum(-categor)) (y(-a)) (x( Preventin)) (g( Violent))
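The sketch below compares the two strings with Python's `re`. For the successful match it prints the last unit, which agrees with the split shown above. It assumes Python's engine backtracks in the same order as PCRE for this pattern.

```python
import re

ORIGINAL = re.compile(r'^(\w+([\s-]\w+)?)+$')

results = {}
for s in ['forum-category-a Preventing Violent',
          'forum-category-ax Preventing Violent']:
    m = ORIGINAL.match(s)
    results[s] = m
    # group(1) holds the last unit of a successful match.
    print(repr(s), 'no match' if m is None else 'last unit: ' + repr(m.group(1)))
```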

What you're really after is a pattern like

^(\w+(-\w+)*)(\s\w+(-\w+)*)*$

...which finds a sequence of words, each possibly containing dashes, separated by spaces:

(forum(-category)(-a)) ( Preventing) ( Violent)
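Here is a quick check of the suggested pattern against every input from the question, using Python's `re`. Every repetition has to start with a literal `-` or a whitespace character, so there is only one way to split a string. That also keeps the long input fast. Note that `\s` also accepts tabs and newlines; if only a literal space is wanted, replacing `\s` with a space should work.

```python
import re

# The pattern suggested above: a word with optional internal dashes, followed
# by zero or more groups of "one whitespace character + such a word".
SUGGESTED = re.compile(r'^(\w+(-\w+)*)(\s\w+(-\w+)*)*$')

should_pass = [
    'forum-category-b forum-category-a',
    'forum-category-b Counter-terrorism',
    'forum-category-a Preventing',
    'forum-category-a Preventing Violent',
    'forum-category-a International-Research-and-Publications',
    'International-Research-and-Publications forum-category-b forum-category-a',
    'forum-category-b',
    'Counter-terrorism forum-category-a',
    'a',
    'stack-over-flow',
    'stack over flow',
    'stacoverflow',
]

should_fail = [
    '---stack---over---flow---',
    'stack-over-flow- stack-over-flow2',
    '   stack    over    flow',
]

for s in should_pass + should_fail:
    print(('pass: ' if SUGGESTED.match(s) else 'fail: ') + s)
```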

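By the way, I tested this with a Python script. When I tried to match your pattern against the example string "International-Research-and-Publications forum-category-b forum-category-a", the regex engine seemed to get stuck in an infinite loop. (Strictly speaking this is catastrophic backtracking rather than a true infinite loop. The nested quantifiers give the engine exponentially many ways to split the words into groups, and on a string that ultimately fails it tries them all.)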

import re
expr = re.compile(r'^(\w+([\s-]\w+)?)+$')
expr.match('International-Research-and-Publications forum-category-b forum-category-a')
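The blow-up can be observed safely on short failing inputs. For `'x' * n + '!'`, the original pattern tries every way of cutting the x's into `(\w+)+` iterations, and there are 2^(n-1) of them. The suggested pattern fails almost immediately. Timings are machine-dependent and only meant to show the trend. `timed_match` is a hypothetical helper.

```python
import re
import time

ORIGINAL = re.compile(r'^(\w+([\s-]\w+)?)+$')
SUGGESTED = re.compile(r'^(\w+(-\w+)*)(\s\w+(-\w+)*)*$')


def timed_match(pattern, s):
    """Return (match object or None, elapsed seconds)."""
    start = time.perf_counter()
    result = pattern.match(s)
    return result, time.perf_counter() - start


outcomes = []
for n in (12, 14, 16, 18, 20):
    # Can never match: '!' is not a word character, so every split of the
    # x's into (\w+)+ iterations is tried before the original pattern gives up.
    s = 'x' * n + '!'
    r1, t1 = timed_match(ORIGINAL, s)
    r2, t2 = timed_match(SUGGESTED, s)
    outcomes.append((r1, r2))
    print(f'n={n:2d}  original: {t1:.4f}s  suggested: {t2:.6f}s')
```

The original pattern's time should grow roughly exponentially with n. The long example string fails only after about fifty word characters, which is why it appears to hang.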

Answer 2

The ([\s-]\w+)? part of the pattern is the problem. It allows only one repetition (the trailing ?). Try changing that last ? to * and see if it helps.

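No, I still think this is the problem. The original pattern looks for "word" or "word[space_hyphen]word" repeated one or more times. That's odd, because the pattern should fall into another match. But switching the question mark worked for me.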
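Here is a sketch of that tweak run against the question's inputs with Python's `re`. Each unit can now hold any number of separators, so every valid input matches on the first greedy pass. One caveat: the outer `+` still wraps `\w+`, so a word can still be split across several iterations. A long string that ultimately fails (for example one ending in a stray `-`) can therefore still backtrack for a very long time, and the sketch only uses short rejected inputs.

```python
import re

# Answer 2's tweak: the '?' after ([\s-]\w+) becomes '*'.
TWEAKED = re.compile(r'^(\w+([\s-]\w+)*)+$')

should_pass = [
    'forum-category-b forum-category-a',
    'forum-category-a Preventing Violent',
    'forum-category-a International-Research-and-Publications',
    'International-Research-and-Publications forum-category-b forum-category-a',
    'Counter-terrorism forum-category-a',
    'a',
    'stack-over-flow',
    'stack over flow',
]

# Short rejected inputs only (see the caveat above).
should_fail = [
    '---stack---over---flow---',
    'stack-over-flow- stack-over-flow2',
    '   stack    over    flow',
]

for s in should_pass + should_fail:
    print(('pass: ' if TWEAKED.match(s) else 'fail: ') + s)
```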

Answer 3

There should be only one answer to this question:

/^((?<=\w)[ -]\w|[^ -])+$/

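There is only one rule, \w[ -]\w, and that's it. The pattern works character by character, so nothing can get around that rule. Add [^ -] for everything else.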
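Python's `re` supports this fixed-width lookbehind, so the pattern can be checked directly. Each character can only be handled by one branch, which keeps the matching linear. Because `[^ -]` admits any character other than a space or hyphen, punctuation such as `!` and tabs also pass. Whether that is acceptable depends on the requirements.

```python
import re

# Answer 3's pattern. Each step consumes either
#   - a space/hyphen preceded by a word character (lookbehind) and followed by
#     one (consumed together with it), or
#   - any single character that is neither a space nor a hyphen.
PER_CHAR = re.compile(r'^((?<=\w)[ -]\w|[^ -])+$')

should_pass = [
    'forum-category-b forum-category-a',
    'forum-category-a Preventing Violent',
    'International-Research-and-Publications forum-category-b forum-category-a',
    'x-x-x',
    'a',
    'stack over flow',
]

should_fail = [
    '---stack---over---flow---',
    'stack-over-flow- stack-over-flow2',
    '   stack    over    flow',
]

for s in should_pass + should_fail + ['hello!']:
    print(('pass: ' if PER_CHAR.match(s) else 'fail: ') + s)
```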

PHP preg_match with regex: only single hyphens and spaces between words, continued
