Getting data from a website

Time: 2015-07-01 04:13:17

Tags: php html curl

I have my own external website, and I want to get some data from it. I use cURL to fetch the site's content, but I only want certain parts of it:

Edit: To be frank, I want to get the timestamp of a Facebook page. If you use Inspect Element on the page, you will see code like this:

<span class="fsm fwn fcg"><a class="_5pcq">
<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a>
<span class="fsm fwn fcg"><a class="_5pcq">
<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a>
<span class="fsm fwn fcg"><a class="_5pcq">
<abbr title="Tuesday, June 30, 2015 at 5:02pm" data-utime="1435663828" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a>
<span class="fsm fwn fcg"><a class="_5pcq">
<abbr title="Tuesday, June 30, 2015 at 5:03pm" data-utime="1435663829" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a>
<span class="fsm fwn fcg"><a class="_5pcq">
<abbr title="Tuesday, June 30, 2015 at 5:04pm" data-utime="1435663830" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a>
</span>

I only want to display the value of "data-utime", i.e. 1435663826. Here is my code, which fetches the content. What should I use after this?

    // The cookie options expect a file path, so use tempnam() rather than
    // tmpfile(), which returns a stream resource.
    $cookie = tempnam(sys_get_temp_dir(), 'cookie');
    $userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';

    $ch = curl_init("https://www.mywebsite.com");

    $options = array(
        CURLOPT_CONNECTTIMEOUT => 20,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_COOKIEFILE     => $cookie,
        CURLOPT_COOKIEJAR      => $cookie,
        // Note: these two settings disable TLS certificate verification.
        CURLOPT_SSL_VERIFYPEER => 0,
        CURLOPT_SSL_VERIFYHOST => 0
    );

    curl_setopt_array($ch, $options);
    $kl = curl_exec($ch);  // false on failure
    curl_close($ch);

    echo $kl; // Final output after fetching

2 Answers:

Answer 0 (score: 1)

You can use PHP's DOM extension to load and parse the HTML document, and then use an instance of DOMXPath to query for specific elements.
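A minimal sketch of that approach, assuming the fetched HTML really contains `<abbr>` elements with a `data-utime` attribute like the ones in the question. The function name `extractUtimes` and the `$sample` markup are hypothetical, introduced only for illustration; in practice you would pass in the string returned by `curl_exec` (`$kl` in the question).

```php
<?php
// Hypothetical helper (not from the original post): returns every data-utime
// value found on <abbr> elements in the given HTML string.
function extractUtimes(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $utimes = [];
    // Select the data-utime attribute node of every <abbr> that has one.
    foreach ($xpath->query('//abbr[@data-utime]/@data-utime') as $attr) {
        $utimes[] = $attr->nodeValue;
    }
    return $utimes;
}

// Sample markup modelled on the question; replace with the cURL result.
$sample = '<span class="fsm fwn fcg"><a class="_5pcq">'
    . '<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826"'
    . ' class="_5ptz timestamp livetimestamp">5 hrs</abbr></a></span>'
    . '<span class="fsm fwn fcg"><a class="_5pcq">'
    . '<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827"'
    . ' class="_5ptz timestamp livetimestamp">5 hrs</abbr></a></span>';

$utimes = extractUtimes($sample);
echo $utimes[0] ?? 'not found', PHP_EOL; // prints 1435663826
```

To get only the first timestamp, take `$utimes[0]`; to list them all, loop over the returned array. If the array comes back empty, the markup returned to cURL probably differs from what the browser's inspector shows.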

Answer 1 (score: 0)

If you have already obtained the HTML markup, then try this:

<div class="cssmenu">Not centered and not bold</div>
<div class="cssmenu aligncenter">Text centered</div>
<div class="cssmenu">
  <div class="hassub">It contains subsections, so I am Italic</div>
</div>