Question

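I'm trying to get the headlines and scores from the reddit front page (www.reddit.com) and put them into an array. Right now it only retrieves one title, and I can't figure out how to retrieve all of the titles and ratings from the page.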

This is the code I have so far:

<?php
    $url = "http://www.reddit.com/";
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $var= curl_exec($ch);
    curl_close($ch);

    $third= stripos($var,'<p class="title"><a class="title " ',0);
    $fourth= stripos($var,'</span></p>',0);

    //echo substr($var,$first,$second-$first);
    echo substr($var,$third,$fourth-$third);
?>

Thanks in advance.

Answer 1

If you really want to use a regular expression pattern, try this:

<?php
    $url = "http://www.reddit.com/";
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $var= curl_exec($ch);
    curl_close($ch);
    preg_match_all('/<a class="title " href="(.{0,255})" tabindex="1"(?:([\sa-z]+)="([a-z]+)")? >(.{0,255})<\/a>&#32;/', $var, $matches);
    print_r($matches[4]);
?>
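To see how preg_match_all fills its result array, here is a minimal sketch that runs against a hard-coded string instead of the live page. The sample markup and the simplified pattern are my own assumptions, not the exact ones above; I used [^"]* and a lazy .*? so that two links sitting close together cannot be swallowed by a single greedy match.

```php
<?php
// Hypothetical sample markup, not fetched from reddit.
$html = <<<HTML
<p class="title"><a class="title " href="http://example.com/a" tabindex="1" >First headline</a>&#32;</p>
<p class="title"><a class="title " href="http://example.com/b" tabindex="1" rel="nofollow" >Second headline</a>&#32;</p>
HTML;

// Simplified pattern (an assumption, not the one from the answer):
// group 1 = href value, group 2 = link text.
$pattern = '/<a class="title\s*" href="([^"]*)"[^>]*>(.*?)<\/a>/';

$titles = array();
// PREG_SET_ORDER gives one array per match, which is easier to loop over.
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $titles[] = array('href' => $m[1], 'title' => $m[2]);
    }
}
print_r($titles);
```

Without PREG_SET_ORDER the results are grouped per capture group instead, which is why the answer prints $matches[4] to get the link text.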

Answer 2

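When we use cURL to fetch data from another site, the response comes back as an HTML string. So we have to use DOMDocument to get at the values of the HTML tags. I was able to get the heading text successfully this way; please see the code below: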

<?php
$url = "http://www.reddit.com";

// cURL call to fetch the page HTML
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);          // do not include response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body as a string instead of printing it
$responseOfCurl = curl_exec($ch);
if ($responseOfCurl === false)
{
 // curl_error() needs the handle, so read the message before closing it
 $error = curl_error($ch);
 curl_close($ch);
 die("<br> CURL ERROR: ".$error);
}
curl_close($ch);
//print_r($responseOfCurl);

// Parse the HTML response and query the title nodes.
$dom = new DOMDocument();
@$dom->loadHTML($responseOfCurl);  // @ suppresses warnings about malformed HTML
$xpath = new DOMXPath($dom);
$tags = $xpath->query("//p[@class='title']/a/@href|//p[@class='title']");
$headingArray = array();
// query() returns false on an invalid expression, otherwise a DOMNodeList
if ($tags !== false && $tags->length > 0)
{
 foreach ($tags as $tag)
 {
  // append with [] = ; the compound form [] .= is not valid for appending
  $headingArray[] = trim($tag->nodeValue);
 }
 print_r($headingArray);
}
?>

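Here is how I built the query for the heading text. If you view the source of the reddit.com page, you will see that the heading text is in the following format: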

<p class='title'>
 <a class='title' href='abc.com'>heading text</a>
</p>

So I wrote the following query using the class name of the <p> tag and the <a> tag inside it: "//p[@class='title']/a/@href|//p[@class='title']".

$headingArray: this array will contain all of the headings from reddit.com. To cross-check, take a heading from reddit.com and search for it in this array.
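Note that the union query selects both href attribute nodes and <p> elements, so the array ends up holding link URLs as well as heading text. As a rough sketch (the sample markup is made up, in the format shown above), you can tell the two apart by node type:

```php
<?php
// Hypothetical sample markup in the format shown above.
$html = '<html><body>'
      . '<p class="title"><a class="title" href="http://example.com/a">First headline</a></p>'
      . '<p class="title"><a class="title" href="http://example.com/b">Second headline</a></p>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tags = $xpath->query("//p[@class='title']/a/@href|//p[@class='title']");

$links = array();
$headings = array();
foreach ($tags as $tag) {
    if ($tag instanceof DOMAttr) {
        // the @href part of the union yields attribute nodes
        $links[] = $tag->nodeValue;
    } else {
        // the //p part of the union yields element nodes
        $headings[] = trim($tag->nodeValue);
    }
}
print_r($headings);
print_r($links);
```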

So you will have to run another query to get the rating text from the HTML tags.
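A second query for the ratings could look something like the sketch below. I am assuming here that each score sits in a <div> whose class list contains "score"; that class name and the sample markup are hypothetical, so check the actual page source and adjust the expression.

```php
<?php
// Hypothetical markup: the class name "score" is an assumption, not taken from reddit.
$html = '<html><body>'
      . '<div class="thing"><div class="score unvoted">120</div>'
      . '<p class="title"><a class="title" href="http://example.com/a">First headline</a></p></div>'
      . '<div class="thing"><div class="score unvoted">45</div>'
      . '<p class="title"><a class="title" href="http://example.com/b">Second headline</a></p></div>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "score" as one whole word inside a multi-valued class attribute.
$scoreNodes = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' score ')]"
);

$scoreArray = array();
foreach ($scoreNodes as $node) {
    $scoreArray[] = trim($node->nodeValue);
}
print_r($scoreArray);
```

The concat/normalize-space trick is there because @class='score' would only match when "score" is the entire attribute value.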

Answer 3

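Yes, you create an array and store the corresponding titles and ratings in it, or you have to build a combination of such queries to get both the score and the title text, and then store them in an array.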
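One way to do that combination, as a rough sketch: loop over each post container and run the title and score queries relative to it, so each title stays paired with its own score. The function name extractPosts, the "thing" container class, and the "score" class are all assumptions for illustration; adapt them to whatever the real markup uses.

```php
<?php
// extractPosts() is a hypothetical helper; the class names are assumptions.
function extractPosts($html)
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // silence malformed-HTML warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $posts = array();
    foreach ($xpath->query("//div[@class='thing']") as $entry) {
        // The leading "." makes each query relative to the current container.
        $link  = $xpath->query(".//p[@class='title']/a", $entry)->item(0);
        $score = $xpath->query(
            ".//div[contains(concat(' ', normalize-space(@class), ' '), ' score ')]",
            $entry
        )->item(0);
        if ($link === null) {
            continue;  // skip containers that have no title link
        }
        $posts[] = array(
            'title' => trim($link->nodeValue),
            'href'  => $link->getAttribute('href'),
            'score' => $score !== null ? trim($score->nodeValue) : null,
        );
    }
    return $posts;
}

// Usage with made-up sample markup:
$sample = '<html><body>'
        . '<div class="thing"><div class="score unvoted">120</div>'
        . '<p class="title"><a class="title" href="http://example.com/a">First headline</a></p></div>'
        . '<div class="thing"><div class="score unvoted">45</div>'
        . '<p class="title"><a class="title" href="http://example.com/b">Second headline</a></p></div>'
        . '</body></html>';
$posts = extractPosts($sample);
print_r($posts);
```

Querying per container avoids the problem of two separate flat arrays drifting out of step when a post has no visible score.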

Retrieving titles and ratings from Reddit using PHP cURL
