PHP regex: replace strings with URLs inside <p> tags

Time: 2014-12-03 13:44:15

Tags: php regex

Heyho,

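I want to replace some words with links, but only inside the first 3 p tags ($limit_p = 3), and only the first occurrence within a p tag. The word list and the link list are in separate arrays. I have a preg_replace_callback function that does the replacement. It works, but I run into problems when one word is part of another word, and it replaces the word every time: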

$text = "<p>Lorem ipsum Hello World lorem ipsum.</p><p>Hello you</p>";
$arr1 = array('/ Hello World /','/ Hello /');
$arr2 = array(' <a href="link2">Hello World</a> ',' <a href="link1">Hello</a> ');
$limit_p = 3;
$limit_tag = 1;
$res = preg_replace_callback(
    '/(<p[^>]*>)(.+?)(<\/p>)/Ui', 
      function ($m) use (&$arr1, &$arr2, &$limit_tag) {
        list (, $s, $t, $e) = $m;
        $t = preg_replace($arr1, $arr2, $t, $limit_tag);
        //$t = str_replace($find, $repl, $t);
        return "$s$t$e";
      },
      $text, $limit_p
    );

What I get:

<p>Lorem ipsum <a href="link2"><a href="link1">Hello</a> World</a> lorem ipsum.</p><p><a href="link1">Hello</a> you</p>

What I want:

<p>Lorem ipsum <a href="link2">Hello World</a> lorem ipsum.</p><p><a href="link1">Hello</a> you</p>

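So I only want to replace a word if it is not already inside an a tag. Also, if the same word appears in 2 p tags it gets replaced twice, which I don't want. Only the first occurrence should be replaced.

Can you help me?

Thanks a lot!

With Niet's help I now have this solution: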

$dom = new DOMDocument();
// loadXml needs properly formatted documents, so it's better to use loadHtml, but it needs a hack to properly handle UTF-8 encoding
$previous_value = libxml_use_internal_errors(TRUE);
$dom->loadHtml(mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8"));
libxml_clear_errors();
libxml_use_internal_errors($previous_value);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//text()[not(ancestor::a) and (ancestor::p) and not(ancestor::strong)]') as $node)
{
    $replaced = preg_replace_callback(
        '/\b(?:('.implode(')|(',$arr1).'))\b/',
        function($m) use (&$arr1,&$arr2) {
            // find which pattern matched
            array_shift($m);
            $result = array_filter($m);
            $keys = array_keys($result);
            $matched = $keys[0];
            // apply match and remove from search list
            $result = @$arr2[$matched];
            unset($arr1[$matched], $arr2[$matched]);
            return $result;
        },
        $node->wholeText, -1
    );
    //$replaced = str_ireplace('match this text', 'MATCH', $node->wholeText);
    $newNode  = $dom->createDocumentFragment();
    if($replaced && $replaced != "")
        $newNode->appendXML($replaced);
    $node->parentNode->replaceChild($newNode, $node);
}
// get only the body tag with its contents, then trim the body tag itself to get only the original content
return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");

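It works, but the HTML has to be valid, and it sometimes crashes (I'm pretty sure my HTML is valid) with this error: Warning: DOMDocumentFragment::appendXML(): Entity: line 1: parser error : xmlParseEntityRef: no name in [...]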
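A likely cause of that warning, as far as I can tell: text read back from the DOM is already decoded, so an `&amp;` in the HTML comes back as a bare `&`, and appendXML() then rejects it because a bare `&` is not well-formed XML. The sketch below re-escapes the text with htmlspecialchars() before adding the link markup. It is only an illustration under assumptions: the input is plain ASCII, the str_replace() call is a stand-in for the real replacement step, and the sample HTML and variable names are made up for the demo.

```php
<?php
// Demo input (hypothetical): note the "&amp;" entity inside the paragraph.
$html = '<p>Tom &amp; Jerry say Hello</p>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//p//text()[not(ancestor::a)]') as $node) {
    // nodeValue is decoded text ("Tom & Jerry ..."), so escape &, < and >
    // again before the string is parsed as XML.
    $escaped = htmlspecialchars($node->nodeValue, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');

    // Stand-in for the real replacement step (assumption for this demo).
    $markup = str_replace('Hello', '<a href="link1">Hello</a>', $escaped);

    $fragment = $dom->createDocumentFragment();
    // Only swap the node if the fragment was parsed successfully.
    if ($markup !== '' && $fragment->appendXML($markup)) {
        $node->parentNode->replaceChild($fragment, $node);
    }
}

$result = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));
echo $result, "\n";
```

The replacement strings themselves also have to be well-formed XML for appendXML() to accept them, so an `&` inside a link URL would need to be written as `&amp;`.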

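2 answers:

Answer 0: (score: 1)

The problem is that you are performing the replacements sequentially.

Instead, try applying them all in one pass:

With $arr1 like this: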

$arr1 = array("Hello World","Hello");

In your innermost code:

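if ($arr1) {
    // Snapshot of the keys: capture group i+1 in the pattern belongs to
    // $arr1[$keys1[i]], even after earlier calls have removed entries.
    $keys1 = array_keys($arr1);
    $t = preg_replace_callback(
        '/\b(?:('.implode(')|(',$arr1).'))\b/',
        function($m) use (&$arr1,&$arr2,$keys1) {
            // find which pattern matched
            array_shift($m);
            $groups = array_keys(array_filter($m, 'strlen'));
            $matched = $keys1[$groups[0]];

            // word was already linked earlier in this text: leave it unchanged
            if (!isset($arr2[$matched])) {
                return $m[$groups[0]];
            }

            // apply match and remove from search list
            $result = $arr2[$matched];
            unset($arr1[$matched], $arr2[$matched]);
            return $result;
        },
        $t
    );
}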

Assuming I didn't mess it up, this should work pretty well.
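To try the single-pass idea end to end, here is a self-contained sketch. It is not the code from this answer: the function name linkFirstOccurrences() and the $used array are hypothetical, and it assumes the paragraphs contain plain text (no nested tags whose attributes might contain one of the words). Instead of removing entries from the word list, it records which words were already linked, so capture group numbers always line up with array keys. Words are sorted longest first and passed through preg_quote().

```php
<?php
// Hypothetical helper: link the first occurrence of each word,
// looking only inside the first $limitP <p> elements.
function linkFirstOccurrences(string $html, array $words, array $links, int $limitP = 3): string
{
    // Longest words first, so "Hello World" is tried before "Hello".
    $order = array_keys($words);
    usort($order, function ($a, $b) use ($words) {
        return strlen($words[$b]) <=> strlen($words[$a]);
    });

    // One capture group per word; group i+1 belongs to $words[$order[i]].
    $groups = [];
    foreach ($order as $key) {
        $groups[] = '(' . preg_quote($words[$key], '/') . ')';
    }
    if (!$groups) {
        return $html;
    }
    $pattern = '/\b(?:' . implode('|', $groups) . ')\b/';

    $used = []; // keys of words that have already been linked

    $result = preg_replace_callback(
        '/(<p\b[^>]*>)(.*?)(<\/p>)/is',
        function ($m) use ($pattern, $order, $links, &$used) {
            $inner = preg_replace_callback(
                $pattern,
                function ($w) use ($order, $links, &$used) {
                    // Find the capture group that actually matched.
                    for ($i = 1; $i < count($w); $i++) {
                        if ($w[$i] !== '') {
                            $key = $order[$i - 1];
                            if (isset($used[$key])) {
                                return $w[0]; // linked before: keep as is
                            }
                            $used[$key] = true;
                            return $links[$key];
                        }
                    }
                    return $w[0];
                },
                $m[2]
            );
            return $m[1] . $inner . $m[3];
        },
        $html,
        $limitP
    );

    return $result ?? $html;
}

$text  = "<p>Lorem ipsum Hello World lorem ipsum.</p><p>Hello you</p>";
$words = ['Hello', 'Hello World'];
$links = ['<a href="link1">Hello</a>', '<a href="link2">Hello World</a>'];
$out   = linkFirstOccurrences($text, $words, $links);
echo $out, "\n";
```

With the sample text from the question this should give the output the asker wanted. A word that shows up again in a later paragraph is left untouched.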

Answer 1: (score: 1)

How about

$text = "<p>Lorem ipsum Hello World lorem ipsum.</p><p>Hello you</p>";
$arr1 = array('Hello World','Hello');
$arr2 = array('<a href="link2">Hello World</a>','<a href="link1">Hello</a>');

print strtr($text, array_combine($arr1, $arr2));

// <p>Lorem ipsum <a href="link2">Hello World</a> lorem ipsum.</p><p><a href="link1">Hello</a> you</p>
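Why this avoids the nested links: strtr() with an array tries the longest key first and does not rescan text it has already replaced, whereas str_replace() with arrays works through the pairs one after another. The short comparison below uses the same sample data; the variable names are mine, chosen only for the demo.

```php
<?php
$text  = "<p>Lorem ipsum Hello World lorem ipsum.</p><p>Hello you</p>";
$pairs = [
    'Hello World' => '<a href="link2">Hello World</a>',
    'Hello'       => '<a href="link1">Hello</a>',
];

// Single pass: longest key wins, replaced text is not searched again.
$viaStrtr = strtr($text, $pairs);

// Sequential: the second pair is applied to the output of the first,
// so "Hello" inside the freshly inserted link gets wrapped again.
$viaStrReplace = str_replace(array_keys($pairs), array_values($pairs), $text);

echo $viaStrtr, "\n";
echo $viaStrReplace, "\n";
```

Keep in mind that strtr() replaces every occurrence anywhere in the string, including inside tag attributes, and knows nothing about the 3-paragraph limit or the first-occurrence rule from the question, so it only covers the overlapping-words part of the problem.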