I want to remove specific <img> and <a> tags from $string_1 when their src or href value does not start with www, http, or https.
For example, $string_1 should become $string_2 by removing:
<img src="/wp-content/uploads/2014/06/photography-business-2.jpg" alt="photography business growth 1 650x430 6 Simple Ways To Help Grow Your Photography Business" width="650" height="430" class="alignnone size-large wp-image-609513" title="6 Simple Ways To Help Grow Your Photography Business"/>
and
<a href="/photography-business-growth/" rel="nofollow">Read more about Photography Business Growth ></a>
because their src and href values do not start with http, https, or www.
Can you help me solve this? Thanks.
Answer 0 (score: 2)
Here is a first approach in PHP. It works with your sample data. Note that the trailing "<p></p>" is missing from $string_2.
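// Work on a copy so that $string_1 stays untouched
$string_3 = $string_1;
// Matches the first characters of any value that does not begin with "www" or "http"
$pattern = "([^wh]|w[^w]|ww[^w]|h[^t]|ht[^t]|htt[^p])";
// Remove <img> tags whose src is the first attribute and fails the prefix test
$string_3 = preg_replace("/<img src=\"".$pattern."[^>]*>/","",$string_3);
// Remove <a>...</a> elements whose href is the first attribute and fails the prefix test
$string_3 = preg_replace("/<a href=\"".$pattern."[^>]*>[^<]*<\/a>/","",$string_3);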
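The same idea can be written with a negative lookahead, which avoids spelling out every non-matching prefix. The following is only a sketch: the function name strip_relative_tags is hypothetical, and it assumes double-quoted attribute values and anchors that are not nested inside each other. Unlike the patterns above, it lets src or href appear anywhere among the attributes.

```php
<?php
// Hypothetical helper (not part of the original answer): removes <img> and <a>
// tags whose src/href value does not start with "http" or "www".
function strip_relative_tags(string $html): string
{
    // Negative lookahead: the match fails if the value starts with an allowed prefix
    $prefix = '(?!http|www)';

    // [^>]*?\s allows other attributes before src; [^>]* consumes the rest of the tag
    $html = preg_replace('/<img\b[^>]*?\ssrc="' . $prefix . '[^"]*"[^>]*>/i', '', $html);

    // .*? (with the s flag) spans the link text, including a literal ">" inside it
    $html = preg_replace('/<a\b[^>]*?\shref="' . $prefix . '[^"]*"[^>]*>.*?<\/a>/is', '', $html);

    return $html;
}
```

Calling strip_relative_tags($string_1) would then play the role of the two preg_replace calls above. Regular expressions remain fragile on arbitrary HTML (single-quoted attributes, for instance, are not handled here), so treat this as suitable only for markup you control.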
Answer 1 (score: 2)
I would use a DOM parser. Once you have the DOM document, you can use XPath to select the elements you want.
# Parse the HTML snippet into a DOM document
$doc = new DOMDocument();
$doc->loadHTML($string_1);
# Create an XPath selector
$selector = new DOMXPath($doc);
# Define the XPath query
# The syntax highlighter messed this up. Take it as it is!
$query = <<<EOF
//a[not(starts-with(@href, "http"))
and not(starts-with(@href, "www"))]
| //img[not(starts-with(@src, "http"))
and not(starts-with(@src, "www"))]
EOF;
# Issue the XPath query and remove every resulting node
foreach ($selector->query($query) as $node) {
    $node->parentNode->removeChild($node);
}
# Write back the modified `<div>` element into a string
echo $doc->saveHTML(
    $selector->query('//div[@class="mainpost"]')->item(0)
);
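For input that is not wrapped in a div with class "mainpost", the same approach can be packaged as a function that adds its own wrapper. This is a sketch under assumptions: the function name remove_relative_nodes and the wrapper id "wrapper" are hypothetical, and the input is assumed to be ASCII or otherwise safe for loadHTML's default encoding handling. As with the query above, an <a> element that has no href attribute at all is also removed, because a missing attribute compares as an empty string.

```php
<?php
// Hypothetical wrapper around the DOM/XPath approach shown above.
function remove_relative_nodes(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them for loose markup
    libxml_use_internal_errors(true);
    // Wrap the fragment so the result can be located again after parsing
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = '//a[not(starts-with(@href, "http")) and not(starts-with(@href, "www"))]'
           . ' | //img[not(starts-with(@src, "http")) and not(starts-with(@src, "www"))]';

    // Copy the matches into an array first, then detach each node from its parent
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself
    $out = '';
    foreach ($xpath->query('//div[@id="wrapper"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser re-serializes the markup, the output may differ cosmetically from the input (for example in how void tags are closed), even for elements that are kept.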
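Answer 2 (score: 1)
One solution is to do this on the front end with JavaScript. If that is not an option, you could look at a PHP library for parsing and traversing the DOM, such as http://simplehtmldom.sourceforge.net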