Question

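I'm trying to extract a web page's metadata (Open Graph tags). The code below works for every http page, but it fails as soon as I give it certain https links.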

e.g.: https://www.facebook.com/hellocad111

I checked my server, and it supports OpenSSL, as shown below:

$w = stream_get_wrappers();
echo 'openssl: ', extension_loaded('openssl') ? 'yes' : 'no', "\n";
echo 'http wrapper: ', in_array('http', $w) ? 'yes' : 'no', "\n";
echo 'https wrapper: ', in_array('https', $w) ? 'yes' : 'no', "\n";
echo 'wrappers: ';
var_dump($w); // var_dump() prints directly and returns nothing, so call it on its own

and the output is:

openssl: yes 
http wrapper: yes
https wrapper: yes 
wrappers: array(10) { [0]=> string(5) "https" [1]=> string(4) "ftps" [2]=> string(13) "compress.zlib" [3]=> string(14) "compress.bzip2" [4]=> string(3) "php" [5]=> string(4) "file" [6]=> string(4) "data" [7]=> string(4) "http" [8]=> string(3) "ftp" [9]=> string(3) "zip" }
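These checks only confirm that the https wrapper is registered; they don't show what the server actually sent back for the https URL. A minimal diagnostic sketch, assuming the problem lies in the HTTP response rather than in OpenSSL: open the URL with `ignore_errors` so 4xx/5xx bodies are still returned, then read the status line from the stream metadata. `lastStatusCode()` and `probe()` are hypothetical helper names, not part of the original code.

```php
<?php

// Hypothetical helper: pull the numeric status code out of the response
// header lines reported by the http/https wrapper (e.g. "HTTP/1.1 302 Found").
// When redirects are followed, several status lines appear; the last one wins.
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical diagnostic: open the URL and report the final status code and
// body size, so you can see what the server returned over https.
function probe(string $url): void
{
    // 'ignore_errors' makes the wrapper return the body even for 4xx/5xx responses
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $fp = @fopen($url, 'r', false, $context);
    if ($fp === false) {
        echo 'request failed: ', error_get_last()['message'] ?? 'unknown error', "\n";
        return;
    }
    // for the http/https wrapper, 'wrapper_data' holds the raw response header lines
    $headers = stream_get_meta_data($fp)['wrapper_data'] ?? [];
    $body = (string) stream_get_contents($fp);
    fclose($fp);
    echo 'status: ', lastStatusCode($headers) ?? 'unknown', "\n";
    echo 'bytes: ', strlen($body), "\n";
}
```

Calling `probe('https://www.facebook.com/hellocad111');` would show, for example, whether the request ends in a redirect or an error status instead of the page a browser sees.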

Here is my code:

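function show($link)
{
    $rmetas = array(); // initialize so an empty array (not null) is returned when nothing matches
    $html = file_get_contents($link);
    if ($html === false || $html === '') {
        return $rmetas; // the request failed or returned an empty body
    }
    libxml_use_internal_errors(true); // suppress warnings from malformed real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // select every <meta> whose property attribute starts with "og:"
    $query = '//*/meta[starts-with(@property, \'og:\')]';
    $metas = $xpath->query($query);

    foreach ($metas as $meta)
    {
        $property = $meta->getAttribute('property');
        $content = $meta->getAttribute('content');
        $rmetas[$property] = $content;
    }
    return $rmetas;
}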
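One way to narrow this down is to separate fetching from parsing, so you can tell whether the XPath query or the https request is at fault. The sketch below assumes the failure comes from the request: some sites may serve different markup (or a login/browser-check page) to clients that send no browser-like User-Agent, though the question does not confirm this is what happens with Facebook. `fetchHtml()` and `parseOgTags()` are hypothetical names, and the User-Agent string is an arbitrary placeholder.

```php
<?php

// Hypothetical helper: fetch a URL over http/https with an explicit User-Agent.
// Returns the response body, or null if the request failed outright.
function fetchHtml(string $url): ?string
{
    $context = stream_context_create([
        'http' => [                       // 'http' context options also apply to https:// URLs
            'user_agent'      => 'Mozilla/5.0 (compatible; og-fetcher)', // placeholder UA
            'follow_location' => 1,       // follow redirects
            'timeout'         => 10,      // seconds
            'ignore_errors'   => true,    // return the body even for 4xx/5xx responses
        ],
    ]);
    $html = @file_get_contents($url, false, $context);
    return ($html === false) ? null : $html;
}

// Hypothetical helper: extract og:* meta tags from an HTML string.
function parseOgTags(string $html): array
{
    $tags = [];
    if ($html === '') {
        return $tags;                     // DOMDocument::loadHTML() rejects empty input
    }
    libxml_use_internal_errors(true);     // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $meta) {
        $tags[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }
    return $tags;
}
```

Usage would be `parseOgTags(fetchHtml($link) ?? '')`. If `parseOgTags()` works on a saved copy of the page but returns nothing for the live fetch, that points at the request (what the server sends back), not at the XPath query.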

I don't get anything back.

Answer 1

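Hah, the terms of service...
All that really means is that they won't support you unless you use the API; you're "allowed" to do whatever you want. But on this point I tend to agree with @Jens: just use the API. I kept running into similar problems trying to scrape sites without an API, and it simply doesn't work at scale. Facebook, however, has been kind enough to provide a friendly API, so don't look a gift horse in the mouth. (Whatever that means...)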

XPath query not working on https pages?
