How to get links from a website with PHP and regular expressions

Date: 2011-06-24 20:00:48

Tags: php regex hyperlink nofollow

I want to add rel="nofollow" to every link on my website that points to another site.

For example,

$str = "<a href='www.linktoothersite.com'>I swear this isn't spam!</a><br><a href='www.mywebsite.com'>Hello World</a>";

The output should be

$str = "<a href='www.linktoothersite.com' rel=\"nofollow\">I swear this isn't spam!</a><br><a href='www.mywebsite.com'>Hello World</a>";

I really want a regular expression rather than DOMDocument, because whenever I use DOMDocument I get the error "Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity".
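For reference, a regex-only version of what is being asked for might look like the minimal sketch below, built on `preg_replace_callback()` and `parse_url()`. It is more fragile than a real parser: the function name `addNofollowRegex` and the parameter `$myHost` are hypothetical, and it assumes the `href` value is quoted with `'` or `"` and that external links are absolute URLs. Scheme-less values such as `www.linktoothersite.com` in the example above are resolved by browsers as relative paths, so this sketch leaves them alone.

```php
<?php
// Hypothetical helper (regex sketch): add rel="nofollow" to <a> tags whose
// href host differs from $myHost. Assumes simple markup with a quoted href.
function addNofollowRegex(string $html, string $myHost): string
{
    return preg_replace_callback(
        // Matches an opening <a ...> tag and captures the quoted href value in group 2
        '/<a\s[^>]*\bhref\s*=\s*([\'"])(.*?)\1[^>]*>/i',
        function (array $m) use ($myHost): string {
            $tag = $m[0];
            // Leave tags that already carry a rel attribute untouched
            if (preg_match('/\srel\s*=/i', $tag)) {
                return $tag;
            }
            $host = parse_url($m[2], PHP_URL_HOST);
            // No host (relative link) or your own host: not external
            if (!is_string($host) || strcasecmp($host, $myHost) === 0) {
                return $tag;
            }
            // Insert the attribute just before the closing '>'
            return substr($tag, 0, -1) . ' rel="nofollow">';
        },
        $html
    );
}
```

With `http://` added to both links of the example, `addNofollowRegex($str, 'www.mywebsite.com')` marks only the first link. Any markup the pattern does not anticipate (an unquoted `href`, a `>` inside an attribute value) will slip through, which is why the answer recommends a DOM parser instead.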

1 answer:

Answer 0 (score: 4)

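Use a DOM parser and iterate over all the links, checking each link's href attribute to find the ones that point to other sites. This is untested and may need some tweaking.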

// assuming your html is in $HTMLstring
$dom = new DOMDocument();
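// loadHTML() emits warnings such as "htmlParseEntityRef: expecting ';'" when the
// HTML isn't fully valid; collect them internally instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($HTMLstring);
libxml_clear_errors();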

// Get all the links
$links = $dom->getElementsByTagName("a");
foreach($links as $link) {
  $href = $link->getAttribute("href");

  // Find out if the link points to a domain other than yours
  // If your internal links are relative, you'll have to do something fancier to check
  // their destinations than this simple strpos()
  // strpos() takes (haystack, needle) and returns false, not -1, when there is no match
  if (strpos($href, "yourdomain.example.com") === false) {
     // Add the attribute
     $link->setAttribute("rel", "nofollow");
  }
}

// Save the html
$output = $dom->saveHTML();
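As the comments in the code note, a plain `strpos()` check cannot tell relative internal links apart from external ones. A minimal sketch that compares hosts with `parse_url()` instead is shown below; the function name `addNofollowDom` and the parameter `$myHost` are hypothetical. It assumes links to other sites are absolute URLs (`http://...` or `//...`), so relative links, including scheme-less values like `www.linktoothersite.com`, count as internal.

```php
<?php
// Hypothetical helper (DOM sketch): add rel="nofollow" to every <a> whose
// href host differs from $myHost, comparing hosts instead of using strpos().
function addNofollowDom(string $html, string $myHost): string
{
    $dom = new DOMDocument();

    // Collect libxml parse errors internally instead of emitting PHP warnings,
    // then restore the previous setting
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $link) {
        $host = parse_url($link->getAttribute('href'), PHP_URL_HOST);
        // No host means a relative link, which is treated as internal
        if (is_string($host) && strcasecmp($host, $myHost) !== 0) {
            // Note: setAttribute() replaces any existing rel value
            $link->setAttribute('rel', 'nofollow');
        }
    }

    return $dom->saveHTML();
}
```

Keep in mind that `loadHTML()` wraps a fragment in a doctype and `<html><body>` elements, so `saveHTML()` returns a full document rather than just the original snippet.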