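I have PHP code that takes a search term, splits it into characters, and generates a regular expression to match (and highlight) the pattern. For example, if I enter ou, the following pattern is generated: (o)(.*)(u). Each match is then replaced with <em>$1</em>$2<em>$3</em>.
Applied to the following data: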
boau #fie diu1^^j dauijz16 abc123 wwx,usq
this produces:
b<em>o</em>au #fie diu1^^j dauijz16 abc123 wwx,<em>u</em>sq
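The generation step described above can be sketched as follows. This is a minimal illustration in Python rather than the asker's PHP, and the helper name build_highlighter is hypothetical; it assumes every character of the term becomes its own group with a greedy (.*) gap between neighbours.

```python
import re

def build_highlighter(term):
    """Return (pattern, replacement) for a search term such as "ou" (hypothetical helper)."""
    chars = [re.escape(c) for c in term]
    # Each character gets its own group; a greedy gap group sits between neighbours.
    pattern = "(.*)".join(f"({c})" for c in chars)
    # Odd-numbered groups are the term's characters, even-numbered groups are the gaps.
    n_groups = 2 * len(chars) - 1
    replacement = "".join(
        rf"<em>\g<{i}></em>" if i % 2 else rf"\g<{i}>"
        for i in range(1, n_groups + 1)
    )
    return pattern, replacement

data = "boau #fie diu1^^j dauijz16 abc123 wwx,usq"
pattern, replacement = build_highlighter("ou")
print(pattern)  # (o)(.*)(u)
print(re.sub(pattern, replacement, data, count=1))
```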
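The problem is that I want to be able to restrict the match, for example by limiting the number of spaces it may contain. If I limit the spaces to 3, the result should be:
b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq
Or, with a limit of 3 spaces and at most one ^:
b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
Or, with no digits allowed at all:
b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
So I want to be able to enter a pattern to search for and specify separate limits for certain characters, but I don't know how to do this. I think it has something to do with lookaheads, but I can't figure out how to use them.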
Answer 0 (score: 1)
To limit the number of spaces, I would use:
(o)((?:\S*\s){0,3}\S*)(u)
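Here is a Perl script that uses it (note that its test string has two more spaces than the question's data, in "d iu1^^j" and "dauij z16"):
use strict;
use warnings;
use feature 'say';    # say is not enabled by default
my $re = qr/(o)((?:\S*\s){0,3}\S*)(u)/;
my $str = 'boau #fie d iu1^^j dauij z16 abc123 wwx,usq';
# s/// without /g rewrites only the first (leftmost) match
$str =~ s!$re!<em>$1</em>$2<em>$3</em>!;
say $str;
Output: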
b<em>o</em>au #fie d i<em>u</em>1^^j dauij z16 abc123 wwx,usq
Explanation:
The regular expression:
(?-imsx:(o)((?:\S*\s){0,3}\S*)(u))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    o                        'o'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (between 0 and
                             3 times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      \S*                      non-whitespace (all but \n, \r, \t,
                               \f, and " ") (0 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    ){0,3}                   end of grouping
----------------------------------------------------------------------
    \S*                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (0 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    u                        'u'
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
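For comparison, the same pattern can be built with a configurable limit. The following is a rough Python sketch, not part of the original answer; the function name limit_whitespace and its parameters are made up for illustration, and it assumes the first and last search characters are single characters.

```python
import re

def limit_whitespace(first, last, max_ws):
    """Build the pattern above with a configurable whitespace limit (hypothetical helper)."""
    # (?:\S*\s){0,max_ws} consumes at most max_ws whitespace characters;
    # the trailing \S* covers the rest of the final word before `last`.
    return re.compile(
        rf"({re.escape(first)})((?:\S*\s){{0,{max_ws}}}\S*)({re.escape(last)})"
    )

REPLACEMENT = r"<em>\1</em>\2<em>\3</em>"
text = "boau #fie d iu1^^j dauij z16 abc123 wwx,usq"
# count=1 mirrors Perl's s/// without /g: only the first match is rewritten.
print(limit_whitespace("o", "u", 3).sub(REPLACEMENT, text, count=1))
```

Run against the question's original data with max_ws=3, the same function highlights the u in dauijz16, which is the result the question asks for.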
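Answer 1 (score: 0)
You are asking several questions here.
I'll answer the one that looks the most complex, namely limiting the spaces to 3.
You can use this regular expression:
$s = 'boau #fie diu1^^j dauijz16 abc123 wwx,usq';
// At most 3 spaces between the o and the u; [^ u]* stops before the first u of the last word.
$r = preg_replace('/(o)((?:[^ ]* ){0,3}[^ u]*)(u)/', '<em>$1</em>$2<em>$3</em>', $s);
//=> b<em>o</em>au #fie diu1^^j da<em>u</em>ijz16 abc123 wwx,usq
Explanation: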
1st Capturing group (o)
    o matches the character o literally (case sensitive)
2nd Capturing group ((?:[^ ]* ){0,3}[^ u]*)
    (?:[^ ]* ){0,3} Non-capturing group
        Quantifier: Between 0 and 3 times
        [^ ]* matches a single character not present in the list below
            Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
            " " the literal space character
        " " matches the space character literally
    [^ u]* matches a single character not present in the list below
        Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        " " the literal space character
        u a single character in the list u literally (case sensitive)
3rd Capturing group (u)
    u matches the character u literally (case sensitive)
This output matches the expected result. I hope you can use the same approach to build regular expressions for the other parts of your question.
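One detail worth checking is the trailing [^ u]*: because it excludes u, the match ends at the first u of the last allowed word, whereas a trailing [^ ]* ends at the last one. The Python sketch below illustrates the difference; the input string is a made-up example, not data from the question.

```python
import re

REPLACEMENT = r"<em>\1</em>\2<em>\3</em>"
sample = "o1 a b xuyu z"  # hypothetical input whose fourth word contains two u's

# Trailing [^ u]* cannot cross a u, so the first u of "xuyu" is highlighted.
first_u = re.sub(r"(o)((?:[^ ]* ){0,3}[^ u]*)(u)", REPLACEMENT, sample, count=1)
# Trailing [^ ]* is greedy within the word, so the last u of "xuyu" is highlighted.
last_u = re.sub(r"(o)((?:[^ ]* ){0,3}[^ ]*)(u)", REPLACEMENT, sample, count=1)

print(first_u)  # <em>o</em>1 a b x<em>u</em>yu z
print(last_u)   # <em>o</em>1 a b xuy<em>u</em> z
```

On the question's data both variants give the same result, since dauijz16 contains only one u.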
Answer 2 (score: 0)
You can use negated character classes:
(o)((?:[^ ]* ){0,3}[^ ]*)(u)
limits the match to 3 spaces, and
(o)(\D*)(u)
allows no digits. \D matches any character except a digit; note that it is equivalent to the negated class [^\d].
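A quick way to check the no-digits pattern and the \D / [^\d] equivalence against the question's data is the short Python sketch below (Python is used only for illustration; the patterns themselves are unchanged).

```python
import re

REPLACEMENT = r"<em>\1</em>\2<em>\3</em>"
data = "boau #fie diu1^^j dauijz16 abc123 wwx,usq"

# \D* stops at the first digit, so the gap cannot extend past the 1 in "diu1".
with_shorthand = re.sub(r"(o)(\D*)(u)", REPLACEMENT, data, count=1)
# The negated class [^\d] is the long-hand spelling of \D.
with_negated_class = re.sub(r"(o)([^\d]*)(u)", REPLACEMENT, data, count=1)

print(with_shorthand)  # b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
```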
The second requirement (3 spaces and at most one ^) is considerably more complex than the ones above:
(o)([^ ^]*(?:(\^)|( ))?[^ ^]*(?(3) |(?:( )|(\^)))?[^ ^]*(?(6) |(?:( )|(\^)))?[^ ^]*(?(8) |\^)?[^ ^]*)(u)
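It tries to match either a ^ or a space and, depending on which one was captured, decides whether a further space or caret may still be matched.
This regex uses conditional groups, which not all regex engines support. Note also that the extra capture groups shift the numbering: (u) is now group 9, so the replacement has to reference $9 instead of $3.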
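Python's re module is one engine that does support conditional groups, so the pattern can be tried there. The sketch below copies the pattern verbatim, split across two string literals only for line length; the constant name is arbitrary.

```python
import re

# Pattern copied from the answer; capture group 9 is the final (u).
CARET_AND_SPACE = re.compile(
    r"(o)([^ ^]*(?:(\^)|( ))?[^ ^]*(?(3) |(?:( )|(\^)))?"
    r"[^ ^]*(?(6) |(?:( )|(\^)))?[^ ^]*(?(8) |\^)?[^ ^]*)(u)"
)

data = "boau #fie diu1^^j dauijz16 abc123 wwx,usq"
# \g<9> refers to the (u) group; groups 3 to 8 are only bookkeeping.
result = CARET_AND_SPACE.sub(r"<em>\g<1></em>\g<2><em>\g<9></em>", data, count=1)
print(result)  # b<em>o</em>au #fie di<em>u</em>1^^j dauijz16 abc123 wwx,usq
```

The pattern does not seem to be airtight, though: the third and fourth conditionals consult only groups 6 and 8, never group 3, so a made-up input such as o^ ^u (two carets) still matches. That is one more argument for the state machine approach.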
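As you can see, a single limit is quite simple, but multiple limits quickly get out of hand. If you have several conditions, I would suggest a state machine instead, for example in pseudocode: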
match first character "o"
substring = "o"
statecaret = 0
statespace = 0
for (check next character)
    if character == "^"
        statecaret = statecaret + 1
    else if character == " "
        statespace = statespace + 1
    if (statecaret == 2 || statespace == 4)
        break and reject character
    else
        add character to substring
find last "u" in substring
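The pseudocode above could be turned into something like the following Python sketch. The function name highlight and the limits dictionary are assumptions made for illustration; the sketch also assumes single-character search terms and, unlike a regex, only tries the first occurrence of the opening character.

```python
def highlight(text, first, last, limits):
    """Wrap `first` and the last reachable `last` in <em> tags (hypothetical helper).

    `limits` maps a character to the maximum number of times it may occur
    between the two highlighted characters.
    """
    start = text.find(first)
    if start == -1:
        return text
    counts = dict.fromkeys(limits, 0)
    end = start + 1
    for ch in text[start + 1:]:
        if ch in limits:
            counts[ch] += 1
            if counts[ch] > limits[ch]:
                break  # reject this character and stop scanning
        end += 1
    # Find the last occurrence of `last` inside the accepted window.
    pos = text.rfind(last, start + 1, end)
    if pos == -1:
        return text
    return (text[:start] + f"<em>{first}</em>" + text[start + 1:pos]
            + f"<em>{last}</em>" + text[pos + 1:])

data = "boau #fie diu1^^j dauijz16 abc123 wwx,usq"
print(highlight(data, "o", "u", {" ": 3}))
print(highlight(data, "o", "u", {" ": 3, "^": 1}))
print(highlight(data, "o", "u", {d: 0 for d in "0123456789"}))
```

Adding another restriction is then just one more entry in limits, instead of another layer of conditional groups.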