empresarios

Question

我正在研究这个PHP函数。这个想法是将字符串中出现的某些单词包装成某些标记（数组中给出的单词和标签）。它工作正常！但是当这些单词出现在链接文本或其'src'属性中时，当然链接被破坏并填充标记，或者生成不应该在链接内的标记。这就是我现在所拥有的：

function replace() {
  $terminos = array (
  "beneficios" => "h3",
  "valoracion" => "h2",
  "empresarios" => "h2",
  "tecnologias" => "h2",
  "...and so on..." => "etc",
  );

  foreach ($terminos as $key => $value)
  {
  $body = "string where the word empresarios should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute.";
  $tagged = "<".$value.">".$key."</".$value.">";
  $result = str_replace($key, $tagged, $body);
  }
}

在此示例中，该函数应返回"string where the word <h2>empresarios</h2> should be replaced; but the word <a href='http://www.empresarios.com'>empresarios</a> should not be replaced inside <a> tags nor in the URL of their 'src' attribute."

我希望这个替换函数可以在字符串中完成所有操作，但不能在其属性中使用标记或！

（我想做以下主题中提到的内容，只是它不是我需要的javascript，而是在PHP中：/questions/1666790/how-to-replace-text-not-within-a-specific-tag-in-javascript）

Answer 1

使用DOM并仅修改文本节点：

$s = "foo <a href='http://test.com'>foo</a> lorem bar ipsum foo. <a>bar</a> not a test";
echo htmlentities($s) . '<hr>';

$d = new DOMDocument;
$d->loadHTML($s);

$x = new DOMXPath($d);
$t = $x->evaluate("//text()");

$wrap = array(
    'foo' => 'h1',
    'bar' => 'h2'
);

$preg_find = '/\b(' . implode('|', array_keys($wrap)) . ')\b/';

foreach($t as $textNode) {
    if( $textNode->parentNode->tagName == "a" ) {
        continue;
    }

    $sections = preg_split( $preg_find, $textNode->nodeValue, null, PREG_SPLIT_DELIM_CAPTURE);

    $parentNode = $textNode->parentNode;

    foreach($sections as $section) {  
        if( !isset($wrap[$section]) ) {
            $parentNode->insertBefore( $d->createTextNode($section), $textNode );
            continue;
        }

        $tagName = $wrap[$section];
        $parentNode->insertBefore( $d->createElement( $tagName, $section ), $textNode );
    }

    $parentNode->removeChild( $textNode );
}

echo htmlentities($d->saveHTML());

根据需要编辑用DOMText和DOMElement替换DOMText。

Answer 2

你指出的答案，在JS中，它基本相同。你只需要指定它是一个字符串。

$regexp = "/(<pre>(?:[^<](?!\/pre))*<\/pre>)|(\:\-\))/gi";

另请注意，您可能需要另一个preg_replace函数来替换单词'empresarios'以防它大写（Empresarios）或类似奇怪的东西（EmPreSAriOS）。

还要处理你的HTML。 <h2>是块元素，可以这样解释：

字符串所在的单词empresarios 应该被替换;

并替换

字符串

empresarios

应该被替换;

您可能需要使用的是<big>标记。

Answer 3

在尝试使用正则表达式模式进行替换时，一定要使用dom解析器隔离合格的文本节点，该正则表达式模式应遵守：单词边界，不区分大小写和unicode字符。如果您打算专门针对具有Unicode字符的单词，则需要在某些字符串函数中添加mb_。

利用以下见解，我为您的情况量身定制了一个解决方案。

代码：（Demo）

$html = <<<HTML
foo <a href='http://test.com'>fóo</a> lórem
bár ipsum bar food foo bark. <a>bar</a> not á test
HTML;

$lookup = [
    'foo' => 'h3',
    'bar' => 'h2'
];

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

$regexNeedles = [];
foreach ($lookup as $word => $tagName) {
    $regexNeedles[] = preg_quote($word, '~');
}
$pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~iu' ;

foreach($xpath->query('//*[not(self::a)]/text()') as $textNode) {
    $newNodes = [];
    $hasReplacement = false;
    foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
        $fragmentLower = strtolower($fragment);
        if (isset($lookup[$fragmentLower])) {
            $hasReplacement = true;
            $a = $dom->createElement($lookup[$fragmentLower]);
            $a->nodeValue = $fragment;
            $newNodes[] = $a;
        } else {
            $newNodes[] = $dom->createTextNode($fragment);
        }
    }
    if ($hasReplacement) {
        $newFragment = $dom->createDocumentFragment();
        foreach ($newNodes as $newNode) {
            $newFragment->appendChild($newNode);
        }
        $textNode->parentNode->replaceChild($newFragment, $textNode);
    }
}
echo substr(trim(utf8_decode($dom->saveHTML($dom->documentElement))), 3, -4);

输出：

<h3>foo</h3> <a href="http://test.com">fóo</a> lórem
bár ipsum <h2>bar</h2> food <h3>foo</h3> bark. <a>bar</a> not á test

如何替换链接标记内的字符串？

3 个答案:

empresarios