Question

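I have PHP code that takes a search term, splits it into characters, and generates a regular expression that matches (and highlights) the pattern. For example:
If I enter ou, the following pattern is generated: (o)(.*)(u). It is then replaced with <em>$1</em>$2<em>$3</em>. Given the following data: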

boau #fie diu1^^j dauijz16 abc123 wwx,usq

this produces the following result:

b<strong>o</strong>au #fie diu1^^j dauijz16 abc123 wwx,<strong>u</strong>sq

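The problem is that I want to be able to limit, for example, the number of spaces allowed inside a match. If I limit spaces to 3, the result should be: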

b<strong>o</strong>au #fie diu1^^j da<strong>u</strong>ijz16 abc123 wwx,usq

Or, limited to 3 spaces and at most 1 ^:

b<strong>o</strong>au #fie di<strong>u</strong>1^^j dauijz16 abc123 wwx,usq

Or, with no digits allowed:

b<strong>o</strong>au #fie di<strong>u</strong>1^^j dauijz16 abc123 wwx,usq

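So I want to be able to enter a pattern to search for and specify individual limits for certain characters, but I don't know how to do this. I think it involves lookaheads, but I can't figure out how to use them.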

Answer 1

To limit the number of whitespace characters, I would use:

(o)((?:\S*\s){0,3}\S*)(u)

Here is a Perl script that uses it:

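use strict;
use warnings;
use feature 'say';  # say() is not enabled by default
my $re = qr/(o)((?:\S*\s){0,3}\S*)(u)/;
my $str = 'boau #fie d iu1^^j dauij z16 abc123 wwx,usq';
$str =~ s!$re!<em>$1</em>$2<em>$3</em>!;  # no /g, so only the first match is replaced
say $str;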

Output:

b<em>o</em>au #fie d i<em>u</em>1^^j dauij z16 abc123 wwx,usq

Explanation

The regular expression:

(?-imsx:(o)((?:\S*\s){0,3}\S*)(u))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    o                        'o'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (between 0 and
                             3 times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      \S*                      non-whitespace (all but \n, \r, \t,
                               \f, and " ") (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    ){0,3}                   end of grouping
----------------------------------------------------------------------
    \S*                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (0 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    u                        'u'
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
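
As a rough illustration of the same idea outside Perl, here is a minimal Python sketch that builds the whitespace-limited pattern for a configurable limit. The helper name highlight_with_space_limit and its parameters are made up for this example and are not part of the original answer; it assumes a two-character search term and, like the Perl script, replaces only the first match.

```python
import re

def highlight_with_space_limit(text, first, last, max_spaces):
    """Highlight `first` ... `last` with at most `max_spaces` whitespace characters between them."""
    pattern = re.compile(
        "(" + re.escape(first) + ")"
        # Up to max_spaces runs of "non-whitespace then one whitespace", then trailing non-whitespace.
        + r"((?:\S*\s){0," + str(max_spaces) + r"}\S*)"
        + "(" + re.escape(last) + ")"
    )
    # count=1 mirrors the Perl substitution without /g.
    return pattern.sub(r"<em>\1</em>\2<em>\3</em>", text, count=1)

result = highlight_with_space_limit(
    "boau #fie d iu1^^j dauij z16 abc123 wwx,usq", "o", "u", 3
)
print(result)
# b<em>o</em>au #fie d i<em>u</em>1^^j dauij z16 abc123 wwx,usq
```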

Answer 2

You are asking quite a few questions here.

I'll answer the one that looks the most complex, namely "if I limit spaces to 3":

You can use this regex:

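$s = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';
// At most 3 spaces between the "o" and the "u"; single quotes keep $1..$3 literal for preg_replace.
$r = preg_replace('/(o)((?:[^ ]* ){0,3}[^ u]*)(u)/', '<em>$1</em>$2<em>$3</em>', $s);
//=> b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq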

Explanation

1st Capturing group (o)
  o matches the character o literally (case sensitive)
2nd Capturing group ((?:[^ ]* ){0,3}[^ u]*)
  (?:[^ ]* ){0,3} Non-capturing group
    Quantifier {0,3}: between 0 and 3 times, as many times as possible, giving back as needed [greedy]
    [^ ]* matches any character other than a space
      Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    " " matches the space character literally
  [^ u]* matches any character other than a space or u (case sensitive)
    Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
3rd Capturing group (u)
  u matches the character u literally (case sensitive)

This output matches the expected result. I hope you can use the same approach to build the regexes for the other parts of your question.
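
One possible way to carry this approach over to other single-character limits is to generate the pattern from the search term, as the question describes. The sketch below is in Python rather than PHP, and the names build_pattern and highlight are hypothetical. It assumes the limit applies separately to each gap between consecutive characters of the term, which the question does not specify.

```python
import re

def build_pattern(term, limited_char, max_count):
    """Build (c1)(gap)(c2)... where each gap holds at most max_count of limited_char."""
    c = re.escape(limited_char)
    # Negated-class gap, same shape as ((?:[^ ]* ){0,3}[^ ]*) for a space limit of 3.
    gap = "((?:[^%s]*%s){0,%d}[^%s]*)" % (c, c, max_count, c)
    letters = ["(" + re.escape(ch) + ")" for ch in term]
    return re.compile(gap.join(letters))

def highlight(text, term, limited_char, max_count):
    """Wrap each character of the term in <em> tags for the first match only."""
    def wrap(match):
        parts = []
        for i, piece in enumerate(match.groups()):
            # Even-indexed groups are term characters, odd-indexed groups are gaps.
            parts.append("<em>" + piece + "</em>" if i % 2 == 0 else piece)
        return "".join(parts)
    return build_pattern(term, limited_char, max_count).sub(wrap, text, count=1)

data = "boau #fie diu1^^j dauijz16 abc123 wwx,usq"
print(highlight(data, "ou", " ", 3))
# b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq
```

This handles one limited character at a time; combining several limits in a single pattern is a harder problem.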

Answer 3

You can use a negated class:

(o)((?:[^ ]* ){0,3}[^ ]*)(u)

to limit the match to 3 spaces.

regex101 demo

(o)(\D*)(u)

for no digits. \D matches any character except a digit. Note that it is equivalent to the negated class [^\d].
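
As a quick check of the no-digits pattern, here is a small Python sketch run against the sample data from the question. The variable names are chosen only for this example.

```python
import re

data = "boau #fie diu1^^j dauijz16 abc123 wwx,usq"
replacement = r"<em>\1</em>\2<em>\3</em>"

# \D* stops at the first digit, so the match cannot run past "diu1".
with_shorthand = re.sub(r"(o)(\D*)(u)", replacement, data, count=1)
# The negated class [^\d] behaves the same way as \D here.
with_negated_class = re.sub(r"(o)([^\d]*)(u)", replacement, data, count=1)

print(with_shorthand)
# b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
```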

The second requirement (3 spaces and at most 1 ^) is much more complex than the ones above:

(o)([^ ^]*(?:(\^)|( ))?[^ ^]*(?(3) |(?:( )|(\^)))?[^ ^]*(?(6) |(?:( )|(\^)))?[^ ^]*(?(8) |\^)?[^ ^]*)(u)

It tries to match a ^ or a space and, depending on what was captured, decides whether another space or caret may still be matched.

regex101 demo

This regex uses conditional groups, which not all regex engines support.

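As you can see, a single limit is fairly simple, but multiple limits quickly get out of hand. If you have several conditions, I would suggest a state machine instead, for example, in pseudocode: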

match first character "o"
substring = "o"

statecaret = 0
statespace = 0

for (check next character)
    if character == "^"
        statecaret = statecaret + 1
    else if character == " "
        statespace = statespace + 1

    if (statecaret == 2 || statespace == 4)
        break and reject character
    else
        add character to substring

find last "u" in substring
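
The pseudocode above could be turned into something like the following Python sketch. The function names and the dict-based limits parameter are assumptions made for this example. It also assumes single-character start and end terms, and it retries from the next occurrence of the first character when a scan finds no end character, which the pseudocode does not cover.

```python
def find_limited_match(text, first, last, limits):
    """Return (start, end) indexes of `first` and `last`, or None if nothing fits.

    `limits` maps a character to the maximum number of times it may appear
    between the two, e.g. {" ": 3, "^": 1}.
    """
    start = text.find(first)
    while start != -1:
        counts = {ch: 0 for ch in limits}
        end = -1
        for i in range(start + 1, len(text)):
            ch = text[i]
            if ch in counts:
                counts[ch] += 1
                if counts[ch] > limits[ch]:
                    break  # limit exceeded: reject this character and stop scanning
            if ch == last:
                end = i  # remember the last acceptable end character seen so far
        if end != -1:
            return start, end
        start = text.find(first, start + 1)
    return None

def highlight_limited(text, first, last, limits):
    """Wrap the matched first and last characters in <em> tags."""
    span = find_limited_match(text, first, last, limits)
    if span is None:
        return text
    start, end = span
    return (text[:start] + "<em>" + text[start] + "</em>" + text[start + 1:end]
            + "<em>" + text[end] + "</em>" + text[end + 1:])

data = "boau #fie diu1^^j dauijz16 abc123 wwx,usq"
print(highlight_limited(data, "o", "u", {" ": 3, "^": 1}))
# b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
```

Adding another restriction then only means adding an entry to limits, for example "5": 0 to forbid the digit 5, instead of rewriting the regex.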

Regex that matches everything but limits certain characters

3 answers: