Get the URL from an Instagram bio without logging in

Time: 2016-08-23 08:17:36

Tags: php html web-scraping

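I want to use PHP to get the URL from a bio. Example profile: https://www.instagram.com/sukhcha.in/ (it could be anyone's profile).
I tried simple_html_dom, but it always reports an HTTPS error when fetching the HTML from the URL.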

2 answers:

Answer 0 (score: 0)

You can use cURL to fetch the data.

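    // Example URL; replace it with the profile page you want to fetch.
    $url = 'https://weather.com/weather/tenday/l/USMO0460:1:US';
    $curl = curl_init($url);
    // Return the response as a string instead of printing it directly.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Optional for a plain GET request, which has no body to describe.
    curl_setopt($curl, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
    // curl_exec() returns false on failure.
    $curl_response = curl_exec($curl);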

Print the response for debugging:

    echo '<pre>';
    print_r($curl_response);
    echo '</pre>';

Close the cURL handle:

    curl_close($curl);
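
Since the question mentions an HTTPS error, it may help to see what cURL itself reports when a request fails. The following is only a minimal sketch: `fetchUrl()` is a hypothetical helper name that does not appear in the answer above, and the 10-second timeout is an arbitrary assumption.

```php
<?php
/**
 * Fetch a URL and return the response body, or throw an exception
 * carrying cURL's own error code and message.
 * fetchUrl() is a hypothetical helper, not part of the original answer.
 */
function fetchUrl(string $url, int $timeoutSeconds = 10): string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);   // return body as a string
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($curl);
    if ($body === false) {
        // Read the error details before closing the handle.
        $code = curl_errno($curl);
        $message = curl_error($curl);
        curl_close($curl);
        throw new RuntimeException("cURL error $code: $message");
    }

    curl_close($curl);
    return $body;
}

try {
    $html = fetchUrl('https://www.instagram.com/sukhcha.in/');
    echo strlen($html), " bytes received\n";
} catch (RuntimeException $e) {
    // For example, a certificate problem would be reported here.
    echo $e->getMessage(), "\n";
}
```

If the message points to a certificate problem, that usually says more about the local CA configuration than about the target site.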

Answer 1 (score: 0)

As I suggested in my comment, you should use cURL, since it supports the HTTPS protocol:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 0); // Timeout in seconds (0: no timeout)
curl_setopt($ch, CURLOPT_HEADER, false); // Do not include response headers in the output
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'); // Send a browser-like User-Agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follow redirects
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Skip verification of the server's TLS certificate (insecure, but very important here: if you set it to true, it probably won't work)
curl_setopt($ch, CURLOPT_URL, 'https://www.instagram.com/sukhcha.in/');
$content = curl_exec($ch);
?>

You then need to run XPath on the $content variable to extract the part you need.
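
A minimal sketch of that XPath step, using PHP's built-in DOMDocument and DOMXPath classes. The sample markup, the `bio` class name, the query, and the helper name `extractLinks()` are all assumptions made for illustration. The real structure of an Instagram profile page is not shown in this thread, so the query would have to be adapted after inspecting the actual HTML.

```php
<?php
// Hypothetical sample markup standing in for $content; the real page
// structure is not known from this thread and will likely differ.
$content = <<<HTML
<html><body>
  <div class="bio">
    <span>Developer</span>
    <a class="external" href="http://example.com/">example.com</a>
  </div>
</body></html>
HTML;

/**
 * Return the href attribute of every element matched by $query.
 * extractLinks() is a hypothetical helper name.
 */
function extractLinks(string $html, string $query): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // malformed XPath expression
    }

    $links = [];
    foreach ($nodes as $node) {
        $links[] = $node->getAttribute('href');
    }
    return $links;
}

// Assumed query: every link inside the element with class "bio".
$links = extractLinks($content, '//div[@class="bio"]//a[@href]');
print_r($links);
```

Switching libxml to internal error handling keeps malformed markup from flooding the output with warnings, which is common when parsing pages fetched from the web.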