Question

Here is the code I am currently using:

$file = array_rand($files);
$filename = "http://example.com/".$files[$file];
echo $filename;
libxml_use_internal_errors(true);
$c = file_get_contents($filename);
$d = new DomDocument();
$d->loadHTML($c);
$xp = new domxpath($d);
foreach ($xp->query("//meta[@name='og:title']") as $el) {
echo $el->getAttribute("content");
}
foreach ($xp->query("//meta[@name='og:image']") as $el) {
echo $el->getAttribute("content");
}

$filename has the correct URL value, but the code does not echo the contents of og:image and og:title. Why?

Edit

This is the typical structure of my web pages:

<?php require_once("headertop.php")?>
<meta property="og:image" content="url" />
<meta property="og:title" content="content here." />
<meta property="og:description" content="description here." />
<title>Page title</title>
<?php require_once("headerbottom.php")?>

Edit 2

From one of the answers, I understand that I have to use

$rootNamespace = $d->lookupNamespaceUri($d->namespaceURI);
$xpath->registerNamespace('og', $rootNamespace);

and then use

<meta property="og:image" content="url" />

Am I right?

Answer 1

'og' is a namespace, so it will not be pulled out that way. You need to define that namespace for the DOMXPath object:

http://php.net/manual/en/domxpath.registernamespace.php

Edit: Here is an example I put together using the VICE homepage. I took the Facebook OpenGraph XML namespace from their developer site.

<?php
error_reporting(E_ERROR);
$html = file_get_contents("http://www.vice.com/");
$doc = new DomDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
$xp->registerNamespace('og', 'http://ogp.me/ns#');
print_r($xp->query("//meta[@name='og:title']")->item(0)->getAttribute('content'));
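Going by the sample markup in the question, the tags carry a `property` attribute rather than `name`, so a query on `@name` finds nothing there. The following is a minimal sketch of querying `@property` instead, assuming PHP 7.1 or later. The inline markup and the `ogContent` helper are hypothetical stand-ins, not part of the original code. The XPath comparison here is against the attribute's string value, so this particular query does not depend on a registered prefix.

```php
<?php
// Hypothetical helper: return the content of the first <meta property="..."> match, or null.
// Assumes $property contains no single quote, since it is placed inside an XPath string literal.
function ogContent(DOMXPath $xp, string $property): ?string
{
    $nodes = $xp->query("//meta[@property='" . $property . "']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('content');
}

// Hypothetical inline markup standing in for a fetched page (mirrors the question's sample).
$html = <<<'HTML'
<html><head>
<meta property="og:image" content="http://example.com/image.png" />
<meta property="og:title" content="content here." />
<title>Page title</title>
</head><body></body></html>
HTML;

libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($doc);

echo ogContent($xp, 'og:title'), "\n";
echo ogContent($xp, 'og:image'), "\n";
```

Returning null for a missing tag avoids calling getAttribute() on a null item when a page lacks the tag.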

Answer 2

This should work fine:

<?php
$html = new DOMDocument();
@$html->loadHTML(file_get_contents('http://www.imdb.com/title/tt0117500/'));

foreach($html->getElementsByTagName('meta') as $meta) {
    if(strpos($meta->getAttribute('property'), 'og') !==false) {
        echo $meta->getAttribute('content') . '<br/>';
    }
}
?>
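One caveat with the loop above: a substring test for 'og' also matches unrelated properties such as "blog:id". The following is a hedged variant that keeps only properties starting with "og:" and returns them as an array. The `collectOgTags` name and the sample markup are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: map every og:* property in an HTML string to its content value.
function collectOgTags(string $html): array
{
    // Silence HTML parse warnings for this call only, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        // Require the "og:" prefix at position 0, not just "og" anywhere in the value.
        if (strpos($property, 'og:') === 0) {
            $tags[$property] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// Hypothetical sample: one og tag, one look-alike property, one name-based tag.
$sample = '<html><head>'
    . '<meta property="og:title" content="content here." />'
    . '<meta property="blog:id" content="42" />'
    . '<meta name="description" content="ignored" />'
    . '</head><body></body></html>';

print_r(collectOgTags($sample));
```

For a live page, the string passed in could come from file_get_contents(), as in the answer above.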

Unable to extract og tags from a web page?

2 answers: