I want to run str_replace or preg_replace to look for certain words (listed in $glossary_terms) in my $content and replace them with links (like <a href="/glossary/initial/term">term</a>).
However, $content is full HTML, so my existing links and images get affected too, which is not what I'm after.
An example of $content is:

    <div id="attachment_542" class="wp-caption alignleft" style="width: 135px"><a href="http://www.seriouslyfish.com/dev/wp-content/uploads/2011/12/Amazonas-English-1.jpg"><img class="size-thumbnail wp-image-542" title="Amazonas English" src="http://www.seriouslyfish.com/dev/wp-content/uploads/2011/12/Amazonas-English-1-288x381.jpg" alt="Amazonas English" width="125" height="165" /></a><p class="wp-caption-text">Amazonas Magazine - now in English!</p></div>
    <p>Edited by Hans-Georg Evers, the magazine ‘Amazonas’ has been widely-regarded as among the finest regular publications in the hobby since its launch in 2005, an impressive achievment considering it’s only been published in German to date. The long-awaited English version is just about to launch, and we think a subscription should be top of any serious fishkeeper’s Xmas list…</p>
    <p>The magazine is published in a bi-monthly basis and the English version launches with the January/February 2012 issue with distributors already organised in the United States, Canada, the United Kingdom, South Africa, Australia, and New Zealand. There are also mobile apps availablen which allow digital subscribers to read on portable devices.</p>
    <p>It’s fair to say that there currently exists no better publication for dedicated hobbyists with each issue featuring cutting-edge articles on fishes, invertebrates, aquatic plants, field trips to tropical destinations plus the latest in husbandry and breeding breakthroughs by expert aquarists, all accompanied by excellent photography throughout.</p>
    <p>U.S. residents can subscribe to the printed edition for just $29 USD per year, which also includes a free digital subscription, with the same offer available to Canadian readers for $41 USD or overseas subscribers for $49 USD. Please see the <a href="http://www.amazonasmagazine.com/">Amazonas website</a> for further information and a sample digital issue!</p>
    <p>Alternatively, subscribe directly to the print version <a href="https://www.amazonascustomerservice.com/subscribe/index2.php">here</a> or digital version <a href="https://www.amazonascustomerservice.com/subscribe/digital.php">here</a>. Just gonna add this to the end of the post so I can do some testing.</p>

I came across this link, but I'm not sure whether that approach would work with nested HTML.
Is it possible to str_replace or preg_replace only the content of the <p> tags, excluding any nested <a> or <img> code?
Thanks in advance.
Answer 0 (score: 1)
The "by the book" solution would look something like this:

    <?php
    $html = "<your HTML string>";
    $glossary_terms = array('fishes', 'invertebrates', 'aquatic plants');

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    dom_link_glossary($dom, $glossary_terms);
    echo $dom->saveHTML();

    // wraps all occurrences of the glossary terms in links
    function dom_link_glossary(&$document, &$glossary) {
        $xpath   = new DOMXPath($document);
        $urls    = array();
        $pattern = array();

        // build a normalized lookup (case-insensitive, whitespace-agnostic)
        foreach ($glossary as $term) {
            $term_norm = preg_replace('/\s+/', ' ', strtoupper(trim($term)));
            // pass the delimiter so that a "/" inside a term gets escaped too
            $pattern[] = preg_replace('/ /', '\\s+', preg_quote($term_norm, '/'));
            $urls[$term_norm] = '/glossary/initial/' . rawurlencode($term);
        }
        $pattern = '/\b(' . implode('|', $pattern) . ')\b/i';

        // all text nodes that are not already inside a link
        $text_nodes = $xpath->query('//text()[not(ancestor::a)]');

        foreach ($text_nodes as $original_node) {
            $text = $original_node->nodeValue;
            $hitcount = preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
            if ($hitcount == 0) continue;

            // remember where the node was, then take it out of the document
            $offset  = 0;
            $parent  = $original_node->parentNode;
            $refnode = $original_node->nextSibling;
            $parent->removeChild($original_node);

            foreach ($matches[0] as $i => $match) {
                $term_txt  = $match[0];
                $term_pos  = $match[1];
                $term_norm = preg_replace('/\s+/', ' ', strtoupper($term_txt));

                // insert any text before the term instance
                $prefix = substr($text, $offset, $term_pos - $offset);
                $parent->insertBefore($document->createTextNode($prefix), $refnode);

                // insert the actual term instance as a link
                // (a text node child is used so that "&" in the text is escaped)
                $link = $document->createElement("a");
                $link->appendChild($document->createTextNode($term_txt));
                $link->setAttribute("href", $urls[$term_norm]);
                $parent->insertBefore($link, $refnode);

                $offset = $term_pos + strlen($term_txt);

                if ($i == $hitcount - 1) { // last match, append remaining text
                    $suffix = substr($text, $offset);
                    $parent->insertBefore($document->createTextNode($suffix), $refnode);
                }
            }
        }
    }
    ?>

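Here is how dom_link_glossary() works:
- It builds a normalized lookup of the glossary terms (trimmed, upper-cased, whitespace collapsed) and a single regex that matches any of them. The regex uses \b to prevent partial matches.
- It uses XPath to select all text nodes that are not already inside a link.
- For every text node that contains a match, it removes the original node ($parent->removeChild()) and inserts replacements in its place: new text nodes for the text between matches, and new link nodes (<a>) for the actual glossary terms.
The solution preserves the original case and whitespace of the matched text, so
term will become <a href="/glossary/initial/term">term</a>
Term will become <a href="/glossary/initial/term">Term</a>
Foo Bar will become <a href="/glossary/initial/foo%20bar">Foo Bar</a>
(the URL is built from the glossary entry as written, here term and foo bar, not from the matched text).
Stray whitespace or line breaks in the HTML won't break the mechanism. Note that using regex on plain text node values is perfectly fine. Using regex on full HTML is a no-go.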
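To make the pattern-building step a bit more concrete, here is a minimal standalone sketch. The helper name build_glossary_pattern() and the sample strings are made up for this example; the normalization mirrors what the function above does, nothing more.

```php
<?php
// Hypothetical helper (not part of the answer's function): builds the same
// kind of pattern that dom_link_glossary() assembles internally.
function build_glossary_pattern(array $glossary): string {
    $parts = array();
    foreach ($glossary as $term) {
        // collapse whitespace runs and upper-case the term
        $norm = preg_replace('/\s+/', ' ', strtoupper(trim($term)));
        // quote regex metacharacters, then let each space match any whitespace run
        $parts[] = str_replace(' ', '\s+', preg_quote($norm, '/'));
    }
    return '/\b(' . implode('|', $parts) . ')\b/i';
}

$pattern = build_glossary_pattern(array('fishes', 'aquatic plants'));

// A line break inside a two-word term still counts as a match.
$hits_multiline = preg_match_all($pattern, "articles on aquatic\n   plants and fishes", $m);

// \b keeps "fishes" from matching inside a longer word.
$hits_partial = preg_match($pattern, 'crayfishes');

var_dump($pattern, $hits_multiline, $hits_partial);
```

The first test string contains two hits (the term split over a line break, plus "fishes"), while "crayfishes" yields none, which is the partial-match protection \b is there for.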
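I would recommend pairing the glossary terms with their respective URLs in an array instead of calculating the URL inside the function. That way you could point multiple terms at the same URL.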
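One way that pairing might look, as a sketch only: the URLs in the map and the helper names build_url_lookup() and url_for_match() are invented for illustration and are not part of the code above.

```php
<?php
// Assumed data: each glossary term mapped to its URL. The URLs are made up
// for the example; two terms deliberately share one target.
$glossary_urls = array(
    'fishes'         => '/glossary/f/fish',
    'fish'           => '/glossary/f/fish',
    'aquatic plants' => '/glossary/a/aquatic-plants',
);

// Hypothetical helper: builds a normalized lookup from the term => URL pairs,
// using the same normalization as dom_link_glossary().
function build_url_lookup(array $glossary_urls): array {
    $lookup = array();
    foreach ($glossary_urls as $term => $url) {
        $norm = preg_replace('/\s+/', ' ', strtoupper(trim($term)));
        $lookup[$norm] = $url;
    }
    return $lookup;
}

// Hypothetical helper: resolves matched text (any case, any whitespace)
// to its URL, or null when the text is not a glossary term.
function url_for_match(array $lookup, string $matched_text): ?string {
    $norm = preg_replace('/\s+/', ' ', strtoupper(trim($matched_text)));
    return isset($lookup[$norm]) ? $lookup[$norm] : null;
}

$lookup = build_url_lookup($glossary_urls);
echo url_for_match($lookup, "Aquatic\n plants"), "\n";
echo url_for_match($lookup, 'Fish'), "\n";
```

Inside dom_link_glossary(), the loop over $glossary would then iterate over term => URL pairs and store the given URL in $urls[$term_norm] instead of calling rawurlencode().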
Answer 1 (score: 0)
You could try this (it strips the wp-caption-text paragraph out of $content):

    $content = preg_replace('/(<p\sclass=\"wp\-caption\-text\">)[^<]+(<\/p>)/i', '', $content);
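To see what this pattern actually does, here is a small check on a trimmed-down snippet (the sample markup is shortened from the question and is only an assumption about typical input). Note that it deletes the caption paragraph and leaves everything else alone; it does not add any glossary links by itself.

```php
<?php
// Trimmed-down sample, shortened from the markup in the question.
$content = '<div class="wp-caption alignleft">'
    . '<p class="wp-caption-text">Amazonas Magazine - now in English!</p></div>'
    . '<p>Articles on fishes and aquatic plants.</p>';

// Same pattern as above: the whole caption paragraph is replaced by nothing.
$result = preg_replace('/(<p\sclass=\"wp\-caption\-text\">)[^<]+(<\/p>)/i', '', $content);

echo $result, "\n";
```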