Question

I want to remove specific <a> or <img> tags from $string_1 if the href of the <a> or the src of the <img> does not start with www, http or https.

For example, $string_1 is converted to $string_2 by removing:

<img src="/wp-content/uploads/2014/06/photography-business-2.jpg" alt="photography business growth 1 650x430 6 Simple Ways To Help Grow Your Photography Business" width="650" height="430" class="alignnone size-large wp-image-609513" title="6 Simple Ways To Help Grow Your Photography Business"/>

and

<a href="/photography-business-growth/" rel="nofollow">Read more about Photography Business Growth ></a>

because their src and href values do not start with http, https or www.

Can you help me solve this? Thanks.

Answer 1

Here is a first approach in PHP. It works for your example data. In $string_2 the trailing "<p></p>" is missing.

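$string_3 = $string_1;
# Matches the start of any value that does not begin with "www" or "http"
$pattern = "([^wh]|w[^w]|ww[^w]|h[^t]|ht[^t]|htt[^p])";
# Remove <img> tags whose src (first attribute, double-quoted) matches the pattern
$string_3 = preg_replace("/<img src=\"".$pattern."[^>]*>/","",$string_3);
# Remove <a>...</a> elements whose href matches the pattern
$string_3 = preg_replace("/<a href=\"".$pattern."[^>]*>[^<]*<\/a>/","",$string_3);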
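
The same idea can probably be written more readably with a negative lookahead instead of the character-class alternation. The following is only a sketch under the same assumptions as the snippet above (src/href is the first attribute and is double-quoted); the function name `strip_relative_tags` and the sample markup are hypothetical and not taken from the question.

```php
<?php
// Hypothetical helper: removes <img> and <a> elements whose src/href
// does not begin with "http" (which also covers "https") or "www".
// Assumes src/href is the first attribute and is double-quoted.
function strip_relative_tags(string $html): string
{
    // (?!http|www) is a negative lookahead: the match only proceeds
    // when the attribute value does NOT start with one of these prefixes.
    $html = preg_replace('/<img src="(?!http|www)[^>]*>/', '', $html);
    $html = preg_replace('/<a href="(?!http|www)[^>]*>[^<]*<\/a>/', '', $html);
    return $html;
}

// Made-up sample input, for illustration only.
$sample = '<p><img src="/wp-content/uploads/a.jpg" alt="x"/>'
        . '<img src="http://example.com/b.jpg"/>'
        . '<a href="/photography-business-growth/" rel="nofollow">Read more ></a>'
        . '<a href="www.example.com">Keep</a></p>';

echo strip_relative_tags($sample), "\n";
// prints: <p><img src="http://example.com/b.jpg"/><a href="www.example.com">Keep</a></p>
```

Like the original, this breaks as soon as the attribute order changes or the link text contains nested tags, so it is only suitable for markup with a known, fixed shape.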

Answer 2

I would use a DOM parser. Once you have the DOM document, you can use XPath to select the desired elements.

# Parse the HTML snippet into a DOM document
$doc = new DOMDocument();
$doc->loadHTML($string_1);

# Create an XPath selector
$selector = new DOMXPath($doc);

# Define the XPath query
# The syntax highlighter messed this up. Take it as it is!
$query = <<<EOF
  //a[not(starts-with(@href, "http"))
  and not(starts-with(@href, "www"))]
| //img[not(starts-with(@src, "http"))
  and not(starts-with(@src, "www"))]
EOF;

# Issue the XPath query and remove every resulting node
foreach($selector->query($query) as $node) {
    $node->parentNode->removeChild($node);
}

# Write back the modified `<div>` element into a string
echo $doc->saveHTML(
    $selector->query('//div[@class="mainpost"]')->item(0)
);
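
The snippet above writes back `div.mainpost`, which presumably comes from the asker's $string_1. As a rough sketch that does not depend on that markup, the same DOM/XPath steps could be wrapped in a function that takes an arbitrary HTML fragment. The function name `remove_relative_links`, the wrapper `id`, and the sample input are assumptions made for illustration only.

```php
<?php
// Hypothetical wrapper around the DOM/XPath approach shown above.
function remove_relative_links(string $html): string
{
    // Silence libxml warnings about imperfect markup while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so it can be located again after parsing.
    $doc->loadHTML('<div id="fragment-wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//a[not(starts-with(@href, "http")) and not(starts-with(@href, "www"))]'
           . ' | //img[not(starts-with(@src, "http")) and not(starts-with(@src, "www"))]';

    // Remove every node the query selected.
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper, not the whole document.
    $wrapper = $xpath->query('//div[@id="fragment-wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Made-up sample input, for illustration only.
echo remove_relative_links(
    '<p><img src="/a.jpg" alt="x"><a href="http://example.com/">Keep</a></p>'
), "\n";
```

Note that the XPath predicates also select `<a>` elements with no href attribute at all, because `starts-with` on a missing attribute is false. If such anchors should be kept, adding `@href and` (and `@src and`) to the predicates would be one way to exclude them.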

Answer 3

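One solution is to do this on the front end with JavaScript. If that is not an option, you could look at a PHP library for parsing and traversing the DOM, such as http://simplehtmldom.sourceforge.net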

PHP - Remove specific a tag or img tag if href or src does not start with http, https or www
