PHP - Remove specific <a> or <img> tags if href or src does not start with http, https or www

Date: 2014-06-24 15:44:18

Tags: php

I want to remove specific <a> or <img> tags from $string_1 if their href or src does not start with www, http or https.

For example, convert $string_1 into $string_2 by removing:

<img src="/wp-content/uploads/2014/06/photography-business-2.jpg" alt="photography business growth 1 650x430 6 Simple Ways To Help Grow Your Photography Business" width="650" height="430" class="alignnone size-large wp-image-609513" title="6 Simple Ways To Help Grow Your Photography Business"/>

and

<a href="/photography-business-growth/" rel="nofollow">Read more about Photography Business Growth &gt;</a>

because their src or href does not start with http, https or www.

Can you help me with this? Thanks.
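The rule in the question boils down to a prefix test on the attribute value. As an illustration only, here is a minimal sketch of that test as a hypothetical helper, hasAllowedPrefix (it is not part of the question or the answers). Making the test case-insensitive is an assumption; drop the /i flag if "HTTP://..." should be treated as relative.

```php
<?php
// Hypothetical helper: true when a URL starts with "http", "https" or "www",
// i.e. the prefixes the question wants to keep. Case-insensitive by choice.
function hasAllowedPrefix(string $url): bool
{
    // "https" already begins with "http", so two alternatives are enough.
    return preg_match('/^(?:http|www)/i', $url) === 1;
}

var_dump(hasAllowedPrefix('https://example.com/a.jpg')); // bool(true)
var_dump(hasAllowedPrefix('www.example.com'));           // bool(true)
var_dump(hasAllowedPrefix('/wp-content/uploads/x.jpg')); // bool(false)
```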

3 Answers:

Answer 0 (score: 2)

Here is a first approach in PHP. It works for your sample data. (Note that the trailing "<p></p>" is missing from your $string_2.)

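$string_3 = $string_1;
// Matches the first character(s) of a value that does NOT start with "www" or "http"
$pattern = "([^wh]|w[^w]|ww[^w]|h[^t]|ht[^t]|htt[^p])";
// Remove <img> tags whose src (assumed to be the first attribute) fails the test
$string_3 = preg_replace("/<img src=\"".$pattern."[^>]*>/","",$string_3);
// Remove whole <a>...</a> elements (plain-text content only) whose href fails the test
$string_3 = preg_replace("/<a href=\"".$pattern."[^>]*>[^<]*<\/a>/","",$string_3);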
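The hand-enumerated $pattern above spells out every way a value can fail to begin with "www" or "http". A minimal sketch of the same check using a PCRE negative lookahead, (?!http|www), under the same assumptions as this answer (src/href is the first attribute, and the link contains only plain text). The function name stripRelativeTags and the sample input are hypothetical.

```php
<?php
// Same idea as $pattern above, written as a negative lookahead.
// Assumes src/href is the first attribute and <a> holds only text.
function stripRelativeTags(string $html): string
{
    // <img src="..."> whose value does not start with "http" or "www"
    $html = preg_replace('/<img src="(?!http|www)[^>]*>/', '', $html);
    // <a href="...">text</a> whose value does not start with "http" or "www"
    return preg_replace('/<a href="(?!http|www)[^>]*>[^<]*<\/a>/', '', $html);
}

$in = '<p>Keep <a href="http://example.com">this</a>'
    . '<img src="/wp-content/x.jpg" alt="x"/>'
    . '<a href="/photography-business-growth/">drop</a></p>';
echo stripRelativeTags($in), "\n";
// prints: <p>Keep <a href="http://example.com">this</a></p>
```

Because "https" begins with "http", the lookahead covers all three prefixes. When the attribute order or link content varies, a parser-based approach such as the next answer's is more robust.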

Answer 1 (score: 2)

I would use a DOM parser. Once you have a DOM document, you can use XPath to select the elements you want.

# Parse the HTML snippet into a DOM document
# (suppress libxml warnings about HTML5 or malformed markup)
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($string_1);

# Create an XPath selector
$selector = new DOMXPath($doc);

# Define the XPath query
$query = <<<EOF
  //a[not(starts-with(@href, "http"))
  and not(starts-with(@href, "www"))]
| //img[not(starts-with(@src, "http"))
  and not(starts-with(@src, "www"))]
EOF;

# Issue the XPath query and remove every resulting node
foreach($selector->query($query) as $node) {
    $node->parentNode->removeChild($node);
}

# Write back the modified `<div>` element into a string
echo $doc->saveHTML(
    $selector->query('//div[@class="mainpost"]')->item(0)
);
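The snippet above assumes the input contains a <div class="mainpost"> to serialize. A minimal sketch for a bare fragment instead, assuming you want the cleaned fragment back as a string: it wraps the input in a temporary <div> (the id "wrap" and the function name removeRelativeLinks are hypothetical), runs the same XPath query, and serializes only the wrapper's children.

```php
<?php
// Sketch: remove <a>/<img> elements whose href/src does not start with
// "http" or "www" from an HTML fragment, returning the cleaned fragment.
function removeRelativeLinks(string $fragment): string
{
    $doc = new DOMDocument();
    // Silence libxml warnings about HTML5 or malformed markup.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so there is one known root element to read back.
    $doc->loadHTML('<div id="wrap">' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//a[not(starts-with(@href, "http")) and not(starts-with(@href, "www"))]'
           . ' | //img[not(starts-with(@src, "http")) and not(starts-with(@src, "www"))]';
    // query() returns a snapshot list, so removing nodes while looping is safe.
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the wrapper's children, i.e. the cleaned fragment.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeRelativeLinks('<p>Keep <a href="http://example.com">this</a>'
    . '<img src="/wp-content/x.jpg" alt="x"></p>'), "\n";
```

Note that //a[...] also matches <a> elements with no href at all (for example named anchors), because starts-with() on a missing attribute is false; add @href to the predicate if those should stay. Non-ASCII input may also need a charset declaration, since loadHTML() otherwise assumes ISO-8859-1.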

Answer 2 (score: 1)

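One solution is to do this on the front end with JavaScript. If that is not an option, you could look at a PHP library for parsing and traversing the DOM, such as http://simplehtmldom.sourceforge.net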