Question

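Does anyone know of a regular expression function in PHP that strips anchors down to their content, but only when the anchor's href attribute contains specific text?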

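For example, I have an HTML page with links throughout, but I only want to strip the anchors whose URL contains "yahoo". So <a href="http://pages.yahoo.com/page1">Example page</a> would become just: Example page. Other anchors in the HTML that don't contain "yahoo" would remain unchanged.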

Answer 1

First of all, this is not a regex problem (or at least it shouldn't be). PHP ships with an HTML parser, so I strongly recommend using it.

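With the parser, you just iterate over all anchor tags, check the href attribute, modify the anchor where necessary, and then save the result back to HTML. For example: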

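$dom = new DOMDocument;
$dom->loadHTML($html); // $html as a string
// Copy the live NodeList into an array: changing the tree while walking
// a live NodeList by index skips elements
$anchors = iterator_to_array($dom->getElementsByTagName('a'));
foreach ($anchors as $item) {
  $href = $item->getAttribute('href');
  // parse_url() returns null for relative URLs, so cast to string
  $host = (string) parse_url($href, PHP_URL_HOST);
  if (stripos($host, 'yahoo') !== false) {
    // Move the anchor's content out in front of it, then remove the empty anchor
    while ($item->firstChild) {
      $item->parentNode->insertBefore($item->firstChild, $item);
    }
    $item->parentNode->removeChild($item);
  }
}
$html = $dom->saveHTML();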
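Note that saveHTML() returns a whole document, because loadHTML() wraps a fragment in <html><body>. The following is a minimal sketch, assuming you want a fragment back: it packages the same steps into a reusable function. The name unwrapAnchors and its parameters are hypothetical, not part of the answer.

```php
<?php
// Hypothetical helper (not part of the answer): unwraps every <a> whose
// href host contains $needle and returns the edited fragment.
function unwrapAnchors(string $html, string $needle): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings (e.g. about HTML5 tags) instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live NodeList before modifying the tree
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
        $host = (string) parse_url($a->getAttribute('href'), PHP_URL_HOST);
        if (stripos($host, $needle) === false) {
            continue; // leave non-matching anchors untouched
        }
        // Move text and inline markup out of the anchor, then drop the empty tag
        while ($a->firstChild !== null) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // loadHTML() wraps fragments in <html><body>; serialize only body's children
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return $out;
}

echo unwrapAnchors(
    '<div>See <a href="http://pages.yahoo.com/page1">Example page</a> or '
    . '<a href="http://example.com/">Other</a></div>',
    'yahoo'
), "\n";
```

Moving the child nodes, rather than inserting a text node, keeps any inline markup such as <b> that was inside the link.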

Using parse_url() is optional. You could simply check whether the attribute value contains "yahoo" anywhere, rather than extracting just the host name.
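The two checks can disagree. As a small illustration, here are two hypothetical helper functions (not from the answer): a host-only match ignores "yahoo" in a query string, while a whole-attribute match catches it.

```php
<?php
// Hypothetical helpers contrasting the two checks.

// Match only against the host name extracted by parse_url().
function hostContains(string $href, string $needle): bool
{
    // parse_url() returns null when there is no host (e.g. a relative URL)
    $host = (string) parse_url($href, PHP_URL_HOST);
    return stripos($host, $needle) !== false;
}

// Match against the raw attribute value, anywhere in the string.
function hrefContains(string $href, string $needle): bool
{
    return stripos($href, $needle) !== false;
}

$href = 'http://example.com/search?q=yahoo';
var_dump(hostContains($href, 'yahoo')); // bool(false)
var_dump(hrefContains($href, 'yahoo')); // bool(true)
```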

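For this problem, that is significantly better and more robust than any regex-based solution, because the parser takes care of attribute order, quoting style, and nested markup for you.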
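To illustrate the robustness point, here is a hypothetical naive pattern of the kind the question asks for. It handles the exact markup from the question but silently skips an equivalent anchor written with single quotes and an extra attribute, which a DOM parser would treat identically.

```php
<?php
// Hypothetical "naive" pattern. It assumes href is the first attribute
// and is double-quoted.
$pattern = '~<a href="[^"]*yahoo[^"]*">(.*?)</a>~i';

$double = '<a href="http://pages.yahoo.com/page1">Example page</a>';
$single = "<a class='ext' href='http://pages.yahoo.com/page1'>Example page</a>";

echo preg_replace($pattern, '$1', $double), "\n"; // Example page
echo preg_replace($pattern, '$1', $single), "\n"; // printed unchanged: no match
```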

Answer 2

Try this function.

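function stripAnchorTags($html, $ignore_host = false, $charset = "UTF-8") {
    $dom = new DOMDocument;
    // The XML declaration prefix tells libxml which charset $html is in
    $dom->loadHTML('<?xml version="1.0" encoding="'.$charset.'"?>'.$html); // $html as a string
    // Snapshot the live NodeList. Re-reading item(0) on each pass only works
    // while every anchor is replaced; an ignored anchor would be revisited
    // on every remaining pass and later anchors never checked.
    $anchors = iterator_to_array($dom->getElementsByTagName('a'));
    foreach ($anchors as $item) {
        $href = $item->getAttribute('href');
        // parse_url() returns null for relative URLs, so cast to string
        $host = (string) parse_url($href, PHP_URL_HOST);
        // Keep anchors whose host contains $ignore_host; replace all others
        if (!$ignore_host || stripos($host, $ignore_host) === false) {
            // Replace the whole anchor with a text node containing its URL
            $item->parentNode->replaceChild($dom->createTextNode($href), $item);
        }
    }
    // Serialize the <html> element and strip the wrapper tags loadHTML() added
    return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace(array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveXML($dom->documentElement)));
}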

You can use it like this: stripAnchorTags($html);

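If you want to leave Yahoo links alone, call it like this: stripAnchorTags($html, "yahoo"); anchors whose host contains "yahoo" are then kept, and every other anchor is replaced by its URL.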
