Question

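I have a function that returns all the img links on any web page, but I want to pick the image that best represents a news article. I know this is a somewhat hard problem, but every news article has a main image at the top of the article, and I need to select it from among all the other images. Facebook and sites like reddit can do this. Do you have any ideas? When a member of my site shares a link, there should be a picture with it. Right now I can get the URLs of all the images on a web page; I need to find the main image. :)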

function get_links($url) {

    $xml = new DOMDocument();

    // Collect libxml parse errors instead of emitting PHP warnings
    libxml_use_internal_errors(true);

    $html = file_get_contents($url);
    if ($html === false || $html === '') {
        print "Could not fetch $url";
        return array();
    }

    if (!$xml->loadHTML($html)) {
        $errors = "";
        foreach (libxml_get_errors() as $error) {
            $errors .= $error->message . "<br/>";
        }
        libxml_clear_errors();
        print "libxml errors:<br>$errors";
        return array();
    }

    // Empty array to hold all links to return
    $links = array();

    // Loop through each <img> tag in the DOM and add its src to the link array
    foreach ($xml->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if (!empty($src)) {
            $links[] = $src;
        }
    }

    // Return the links
    return $links;
}

Answer 1

You can keep improving your existing function, but if you want to prioritize Open Graph data when it is present, add this before the getElementsByTagName('img') logic:

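$xpath = new DOMXPath( $xml );
$xpathNodeList = $xpath->query('//meta[@property="og:image" and @content]');
// query() returns a DOMNodeList even when nothing matches, so check its length
if( $xpathNodeList !== false && $xpathNodeList->length > 0 )
{
  return array( $xpathNodeList->item(0)->getAttribute('content') );
}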

Or add it to your array:

// Empty array to hold all links to return
$links = array();

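$xpath = new DOMXPath( $xml );
$xpathNodeList = $xpath->query('//meta[@property="og:image" and @content]');
// query() returns a DOMNodeList even when nothing matches, so check its length
if( $xpathNodeList !== false && $xpathNodeList->length > 0 )
{
  $links[] = $xpathNodeList->item(0)->getAttribute('content');
}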
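
Putting the two pieces together, the idea might look roughly like the sketch below. It is only an illustration under a few assumptions: the function name get_main_image_candidates is hypothetical, it takes an HTML string rather than a URL so it can be tried without a network request, and the sample markup and example.com URLs are made up. It puts the og:image value first when the page declares one and then appends the img sources as fallbacks.

```php
<?php
// Hypothetical helper (not from the original post): collect candidate images
// from an HTML string, placing the Open Graph image first when present.
function get_main_image_candidates(string $html): array
{
    $doc = new DOMDocument();
    // Collect parse errors internally so sloppy markup does not emit warnings
    $previous = libxml_use_internal_errors(true);
    $loaded = $html !== '' && $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return array();
    }

    $links = array();

    // 1. Prefer the Open Graph image if the page declares one
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@property="og:image" and @content]');
    if ($nodes !== false && $nodes->length > 0) {
        $links[] = $nodes->item(0)->getAttribute('content');
    }

    // 2. Append every <img src>, skipping empty values and duplicates
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '' && !in_array($src, $links, true)) {
            $links[] = $src;
        }
    }

    return $links;
}

// Made-up sample page for illustration
$html = '<html><head><meta property="og:image" content="https://example.com/lead.jpg"></head>'
      . '<body><img src="/logo.png"><img src="https://example.com/lead.jpg"></body></html>';
$candidates = get_main_image_candidates($html);
```

With this ordering the first element of the returned array can be treated as the main image. Relative values such as /logo.png would still need to be resolved against the page URL before use, which this sketch does not attempt.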
