Regex to match everything but limit certain characters

Date: 2014-01-31 09:51:33

Tags: regex

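I have PHP code that takes a search term, splits it into characters, and generates a regular expression to match (and highlight) the pattern. For example:
if I enter ou, the pattern (o)(.*)(u) is generated. The match is then replaced with <em>$1</em>$2<em>$3</em>. Given the following data: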

boau #fie diu1^^j dauijz16 abc123 wwx,usq

This produces the following result:

b<em>o</em>au #fie diu1^^j dauijz16 abc123 wwx,<em>u</em>sq
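For reference, the generation step described above might look roughly like the following. This is only a sketch: `buildPattern()` and `buildReplacement()` are hypothetical names, not code from the post, and the sketch assumes single-byte characters in the search term.

```php
<?php
// Hypothetical helpers (not from the original post): build the
// unrestricted pattern and its replacement string from a search term.
function buildPattern(string $term): string
{
    $groups = [];
    foreach (str_split($term) as $ch) {
        // Quote each character so regex metacharacters match literally.
        $groups[] = '(' . preg_quote($ch, '/') . ')';
    }
    // The characters are joined by a greedy "anything" group.
    return '/' . implode('(.*)', $groups) . '/';
}

function buildReplacement(int $charCount): string
{
    $out = '';
    for ($i = 0; $i < $charCount; $i++) {
        // Odd-numbered groups hold the search characters.
        $out .= '<em>${' . (2 * $i + 1) . '}</em>';
        if ($i < $charCount - 1) {
            // Even-numbered groups hold the text in between.
            $out .= '${' . (2 * $i + 2) . '}';
        }
    }
    return $out;
}

$data = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';
echo buildPattern('ou'), "\n";
// prints: /(o)(.*)(u)/
echo preg_replace(buildPattern('ou'), buildReplacement(2), $data), "\n";
// prints: b<em>o</em>au #fie diu1^^j dauijz16 abc123 wwx,<em>u</em>sq
```

Because `(.*)` is greedy, the match runs to the last u in the line, which is exactly the behaviour the limits are meant to rein in.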

The problem is that I want to be able to limit, for example, the number of spaces allowed inside the match. If I limit spaces to 3, the result should be:

b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq

Or, limited to 3 spaces and at most 1 ^:

b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq

Or, with no digits allowed:

b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq

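So I want to be able to enter a pattern to search for and specify a separate limit for certain characters, but I don't know how to do this. I think it has something to do with lookaheads, but I can't figure out how to use them.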

3 Answers:

Answer 0 (score: 1)

To limit the number of spaces, I would use:

(o)((?:\S*\s){0,3}\S*)(u)

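Here is a Perl script that uses it (note that the test string contains two more spaces than the data in the question):

use strict;
use warnings;
use feature 'say';    # say() is not enabled by default

my $re  = qr/(o)((?:\S*\s){0,3}\S*)(u)/;
my $str = 'boau #fie d iu1^^j dauij z16 abc123 wwx,usq';
$str =~ s!$re!<em>$1</em>$2<em>$3</em>!;    # replaces the first match only
say $str;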

Output:

b<em>o</em>au #fie d i<em>u</em>1^^j dauij z16 abc123 wwx,usq

Explanation:

The regular expression:

(?-imsx:(o)((?:\S*\s){0,3}\S*)(u))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    o                        'o'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (between 0 and
                             3 times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      \S*                      non-whitespace (all but \n, \r, \t,
                               \f, and " ") (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    ){0,3}                   end of grouping
----------------------------------------------------------------------
    \S*                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (0 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    u                        'u'
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
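Since the question is about PHP, the captures described in the table can also be inspected with `preg_match`. The snippet below is a small sketch that reuses the pattern and test string from the Perl script; the variable names are only illustrative.

```php
<?php
$re  = '/(o)((?:\S*\s){0,3}\S*)(u)/';
$str = 'boau #fie d iu1^^j dauij z16 abc123 wwx,usq';

// $m[1], $m[2] and $m[3] correspond to \1, \2 and \3 in the table above.
preg_match($re, $str, $m);

echo $m[1], "\n"; // prints: o
echo $m[2], "\n"; // prints: au #fie d i
echo $m[3], "\n"; // prints: u

// Count the whitespace characters that ended up between the two letters.
echo preg_match_all('/\s/', $m[2]), "\n"; // prints: 3
```

The second group never holds more than three whitespace characters, because each pass through `(?:\S*\s)` consumes exactly one and the trailing `\S*` cannot consume any.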

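Answer 1 (score: 0)

You are asking several questions here.

I will answer the one that looks the most complex, namely limiting the number of spaces to 3.

You can use this regex: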

$s = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';
// Single quotes keep $1, $2, $3 as backreferences rather than PHP variables.
$r = preg_replace('/(o)((?:[^ ]* ){0,3}[^ u]*)(u)/', '<em>$1</em>$2<em>$3</em>', $s);
echo $r;
//=> b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq

Explanation:

1st Capturing group (o)
    o matches the character o literally (case sensitive)
2nd Capturing group ((?:[^ ]* ){0,3}[^ u]*)
    (?:[^ ]* ){0,3} Non-capturing group
        Quantifier: between 0 and 3 times
        [^ ]* matches a single character not present in the list below
            Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
            the literal space character
        the trailing space matches the space character literally
    [^ u]* matches a single character not present in the list below
        Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        a single character in the list: space or u, literally (case sensitive)
3rd Capturing group (u)
    u matches the character u literally (case sensitive)

This output matches the expected result. I hope you can use the same approach to build regexes for the other parts of your question.
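As one possible reading of "the same approach", the construction can be parameterised over the limited character. The helper below is a hypothetical sketch (the name `highlightWithCharLimit` and its parameters are assumptions, not part of the answer). It uses `[^c]*` for the final segment rather than `[^ u]*`, so it highlights the last u of that segment instead of the first.

```php
<?php
// Hypothetical helper: allow at most $max occurrences of the single
// character $limited between $first and $last.
function highlightWithCharLimit(
    string $text,
    string $first,
    string $last,
    string $limited,
    int $max
): string {
    $c = preg_quote($limited, '/');
    $pattern = sprintf(
        '/(%s)((?:[^%s]*%s){0,%d}[^%s]*)(%s)/',
        preg_quote($first, '/'),
        $c,
        $c,
        $max,
        $c,
        preg_quote($last, '/')
    );
    // Replace the first match only.
    return preg_replace($pattern, '<em>$1</em>$2<em>$3</em>', $text, 1);
}

$data = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';

echo highlightWithCharLimit($data, 'o', 'u', ' ', 3), "\n";
// prints: b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq

echo highlightWithCharLimit($data, 'o', 'u', '^', 1), "\n";
// prints: b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
```

This handles one limited character at a time; combining several limits in a single pattern is a different problem.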

Answer 2 (score: 0)

You can use a negated character class:

(o)((?:[^ ]* ){0,3}[^ ]*)(u)

to limit the match to 3 spaces.

(o)(\D*)(u)

for no digits. \D matches any character that is not a digit. Note that it is equivalent to the negated class [^\d].
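A quick way to try the no-digits pattern in PHP, and to check that `\D` and `[^\d]` behave the same on the question's data, might look like this (a minimal sketch; the variable names are illustrative):

```php
<?php
$data        = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';
$replacement = '<em>$1</em>$2<em>$3</em>';

// \D* stops at the first digit, so the match cannot run past "diu1".
$withShorthand = preg_replace('/(o)(\D*)(u)/', $replacement, $data);
$withNegated   = preg_replace('/(o)([^\d]*)(u)/', $replacement, $data);

echo $withShorthand, "\n";
// prints: b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq

var_dump($withShorthand === $withNegated);
// prints: bool(true)
```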

The second requirement (at most 3 spaces and at most 1 ^) is much more complex than the ones above:

(o)([^ ^]*(?:(\^)|( ))?[^ ^]*(?(3) |(?:( )|(\^)))?[^ ^]*(?(6) |(?:( )|(\^)))?[^ ^]*(?(8) |\^)?[^ ^]*)(u)

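It tries to match either a ^ or a space and, depending on what was captured, decides whether a further space or caret may be matched. Note that (u) is capture group 9 in this pattern, so the replacement has to reference $9 rather than $3.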

This regex uses conditional groups, which not all regex engines support.
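PHP's PCRE engine does support conditional groups, so the pattern can be tried directly with `preg_replace`. The sketch below copies the pattern from the answer unchanged; only the replacement string is adjusted, since the closing (u) ends up as the ninth capturing group.

```php
<?php
// Pattern copied from the answer; groups 3 to 8 are the inner
// caret/space captures, so the final (u) is group 9.
$pattern = '/(o)([^ ^]*(?:(\^)|( ))?[^ ^]*(?(3) |(?:( )|(\^)))?'
         . '[^ ^]*(?(6) |(?:( )|(\^)))?[^ ^]*(?(8) |\^)?[^ ^]*)(u)/';

$data = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';

echo preg_replace($pattern, '<em>$1</em>$2<em>$9</em>', $data), "\n";
// prints: b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq

// Each conditional only looks at the caret group of the step before it,
// so a caret, a space and another caret are still accepted.
var_dump(preg_match($pattern, 'o^ ^u'));
// prints: int(1)
```

The second check suggests the pattern is best read as an illustration of the technique: it gives the expected result on the question's data, but it does not strictly enforce "at most one caret" for every input.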

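As you can see, a single limit is simple enough, but multiple limits quickly get out of hand. If you have several conditions, I would suggest a state machine instead, for example, in pseudocode: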

match first character "o"
substring = "o"

statecaret = 0    # carets seen so far
statespace = 0    # spaces seen so far

for (check next character)
    if character == "^"
        statecaret = statecaret + 1
    else if character == " "
        statespace = statespace + 1

    # stop at the 2nd caret or the 4th space (limits: 1 caret, 3 spaces)
    if (statecaret == 2 || statespace == 4)
        break and reject character
    else
        add character to substring

find last "u" in substring
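The pseudocode could be turned into PHP along the following lines. This is a sketch under a few assumptions: `highlightLimited()` and the shape of its `$limits` array are invented for illustration, the text is treated as single-byte, `$first` and `$last` are single characters, and only the first occurrence of `$first` is tried (a regex would also retry from later positions).

```php
<?php
// Hypothetical helper following the pseudocode. $limits maps a single
// character to the maximum number of times it may appear in the match.
function highlightLimited(string $text, string $first, string $last, array $limits): string
{
    $start = strpos($text, $first);
    if ($start === false) {
        return $text; // no starting character at all
    }

    $counts = array_fill_keys(array_keys($limits), 0);
    $length = strlen($text);

    // Scan forward until one of the limits would be exceeded.
    for ($end = $start + 1; $end < $length; $end++) {
        $ch = $text[$end];
        if (isset($limits[$ch]) && ++$counts[$ch] > $limits[$ch]) {
            break; // reject this character and everything after it
        }
    }

    // Accepted characters between $first and the cut-off point.
    $region = substr($text, $start + 1, $end - $start - 1);
    $pos = strrpos($region, $last);
    if ($pos === false) {
        return $text; // no closing character within the limits
    }

    return substr($text, 0, $start)
        . '<em>' . $first . '</em>'
        . substr($region, 0, $pos)
        . '<em>' . $last . '</em>'
        . substr($text, $start + 1 + $pos + 1);
}

$data = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';

echo highlightLimited($data, 'o', 'u', [' ' => 3]), "\n";
// prints: b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq

echo highlightLimited($data, 'o', 'u', [' ' => 3, '^' => 1]), "\n";
// prints: b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
```

Adding another restriction is then just another entry in `$limits`. A limit of 0 forbids a character entirely, so mapping each digit to 0 would cover the no-digits case.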