Question

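I'm trying to sort out a problem for a client whose programmer is stuck on preg_match. I'm not especially good at these myself, and my solution apparently doesn't work. Here is his requirement:

Very simple job. Need a preg_match regular expression that matches all occurrences of a string that are not inside an HTML tag or part of a link's anchor text.

For example, if we have the string: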

Blah blah needle blah blah <div id='needle'>blah blah <a href='#'>needle</a> blah needle</div>

preg_match should find only 2 instances of needle.

Here is my solution, which doesn't meet their needs:

<?php
// The string
$string = "Blah blah needle blah blah <div id='needle'>blah blah <a href='#'>needle</a> blah needle</div>";

// You need everything outside of the tags, so let's get rid of the tags
// and everything in between.
$new_string = preg_replace("/<.*>.*<\/.*>/msU","",$string);

// Now let's match 'needle'
preg_match_all("/needle/msU",$new_string,$matches);

var_export($matches);
?>

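I was told it's no good because it "removes all the HTML before matching, so the result is unformatted HTML". I don't see why they can't just do $string2 = $string; and keep the HTML string somewhere else for later use. I also don't see why that would matter, since what they are after is only a preg_match, not a preg_replace. If someone could help me do this with a single preg_match_all line, I'd be very grateful.

Thanks ;]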

Answer 1

You can use this code:

$pattern = <<<'LOD'
~
  (?>  ### all that you want to skip ###

      <a\b [^>]*+ >             # opening "a" tag
      (?> [^<]++ | <(?!/a>) )*+ # possible content between "a" tags
      </a>                      # closing "a" tag
    |
      < [^>]++ >                # other tags
   ) (*SKIP)(*FAIL)  # forces the preceding subpattern to fail and
                     # prevents retrying inside the skipped substring
|
  needle
~x
LOD;

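// The sample string from the question
$string = "Blah blah needle blah blah <div id='needle'>blah blah <a href='#'>needle</a> blah needle</div>";

preg_match_all($pattern, $string, $matches);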

print_r($matches);
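
If you also need to know where each match sits in the original, unmodified HTML, something along these lines might help. It is only a sketch: the helper name find_outside_tags is made up for illustration, the pattern is the same one as above written without the x flag, and preg_quote() is added on the assumption that the search term could contain regex metacharacters.

```php
<?php
// Hypothetical helper (not part of the original answer): wraps the same
// skip-and-fail pattern and escapes the search term with preg_quote().
function find_outside_tags(string $needle, string $html): array
{
    // Escape the term, including the "~" delimiter used below.
    $word = preg_quote($needle, '~');

    // Same logic as the commented pattern above, in compact form:
    // consume a whole <a>...</a> element or any other tag, then discard it.
    $pattern = '~(?><a\b[^>]*+>(?>[^<]++|<(?!/a>))*+</a>|<[^>]++>)(*SKIP)(*FAIL)|' . $word . '~';

    // PREG_OFFSET_CAPTURE returns [matched text, byte offset] pairs,
    // so every hit can be located in the untouched original string.
    preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE);

    return $matches[0];
}

$string = "Blah blah needle blah blah <div id='needle'>blah blah <a href='#'>needle</a> blah needle</div>";
$hits = find_outside_tags('needle', $string);

echo count($hits), "\n";                 // prints 2
foreach ($hits as [$text, $offset]) {
    echo $offset, ': ', $text, "\n";     // prints "10: needle" then "82: needle"
}
```

Nothing is removed from $string here, so the HTML stays intact and the offsets can be used later if a replacement is needed after all.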
