DOM parser: grabbing the href of an <a> tag by class="Decision"

Time: 2013-12-17 00:26:18

Tags: php html parsing dom

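I'm using a DOM parser and I'm running into problems. I'm basically trying to grab the href of only those <a> tags that have the class 'thumbnail'. I've been trying to print the links on the screen but still get no results. Any help is appreciated. I also turned on error_reporting(E_ALL); but still nothing.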

$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$classId = "thumbnail ";
$div = $html->find('a#'.$classId);
echo $div;

I also tried this, but I still get the same result:

include('simple_html_dom.php');
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
$ret = $html->find('a[class=thumbnail]');
echo $ret;

4 answers:

Answer 0 (score: 3)

You're almost there:

<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');

$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',normalize-space(@class),' '),' thumbnail ')]");
var_dump($hrefs);

This gives:

class DOMNodeList#28 (1) {
  public $length =>
  int(25)
}

25 matches; I'd call that a success.
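The var_dump above only shows the size of the node list. To get the actual href values, each matched node still has to be read with getAttribute(). Here is a minimal sketch of that step, assuming a small inline HTML sample (made up for illustration) in place of the live reddit page so that it runs without network access:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
  <a class="thumbnail" href="/a">A</a>
  <a class="thumbnail may-blank" href="/b">B</a>
  <a class="title" href="/c">C</a>
  <a class="thumbnail-large" href="/d">D</a>
</body></html>
HTML;

$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Pad the class attribute with spaces so that ' thumbnail ' only matches
// a whole class name, not a substring such as 'thumbnail-large'.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' thumbnail ')]";

// Collect the href of every matching <a> element.
$hrefs = [];
foreach ($xpath->query($query) as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

With this sample, only the first two links are collected, because the other two do not carry thumbnail as a separate class name.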

Answer 1 (score: 1)

This code might work:

$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->query('//a[@class="thumbnail"]');

foreach($hyperlinks as $hyperlink) {
   echo $hyperlink->getAttribute('href'), '<br>';
}
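One caveat with the query above: @class="thumbnail" compares the whole attribute value, so an element with class="thumbnail may-blank" would not match. The sketch below compares the exact test with a whitespace-aware token test on a made-up two-link sample. Whether the real page uses multiple classes on these links is an assumption worth checking.

```php
<?php
// Hypothetical sample: one link with a single class, one with two classes.
$html = '<html><body>'
      . '<a class="thumbnail" href="/a">A</a>'
      . '<a class="thumbnail may-blank" href="/b">B</a>'
      . '</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: the attribute must be exactly "thumbnail".
$exact = $xpath->query('//a[@class="thumbnail"]');

// Token comparison: "thumbnail" may be one of several class names.
$token = $xpath->query(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' thumbnail ')]"
);

$exactCount = $exact->length;
$tokenCount = $token->length;
echo "exact: $exactCount, token: $tokenCount\n";
```

On this sample the exact query finds one link and the token query finds both.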

Answer 2 (score: 0)

If you're using simple_html_dom, why do all of that extra work? It already wraps everything you need: http://simplehtmldom.sourceforge.net/manual.htm

include('simple_html_dom.php');

// set up:
$html = new simple_html_dom();

// load from URL:
$html->load_file('http://www.reddit.com/r/funny');

// find those <a> elements:
$links = $html->find('a[class=thumbnail]');

// done: find() returns an array of elements, so print each href
foreach ($links as $link) {
    echo $link->href, '<br>';
}

Answer 3 (score: 0)

Tested, with a few changes; this works perfectly as well.

<?php
    // load the url and set up an array for the links
    $dom = new DOMDocument();
    @$dom->loadHTMLFile('http://www.reddit.com/r/funny');
    $links = array();

    // loop thru all the A elements found
    foreach($dom->getElementsByTagName('a') as $link) {
        $url = $link->getAttribute('href');
        $class = $link->getAttribute('class');

        // Check if the URL is not empty and if the class contains thumbnail
        if(!empty($url) && strpos($class,'thumbnail') !== false) {
            array_push($links, $url);
        }
    }

    // Print results
    print_r($links);
?>
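Note that strpos() is a substring test, so it also accepts class names such as thumbnail-large. If that matters, the class attribute can be split on whitespace and compared name by name. A minimal sketch, where hasClass() is a hypothetical helper (not part of the DOM extension) and the HTML sample is made up for illustration:

```php
<?php
// Hypothetical helper: true when $token is one of the
// whitespace-separated class names in $classAttr.
function hasClass($classAttr, $token)
{
    $classes = preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $classes, true);
}

// Sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<a class="thumbnail" href="/a">A</a>'
      . '<a class="title" href="/c">C</a>'
      . '<a class="thumbnail-large" href="/d">D</a>'
      . '</body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$links = array();

// Keep only links with a non-empty href and an exact 'thumbnail' class name.
foreach ($dom->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    if ($url !== '' && hasClass($link->getAttribute('class'), 'thumbnail')) {
        $links[] = $url;
    }
}

print_r($links);
```

Here only /a is kept; the strpos() check would also have kept /d.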