Need help selecting with PHP Simple HTML DOM Parser

Date: 2013-11-17 02:05:35

Tags: php dom

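I'm working on a community website, converting it from ASP to PHP. Currently the client manually enters the weekly movie showtimes for our local theatre, copying them from another site. Since we're redoing the site anyway, I figured I'd try to automate the process, and I found PHP Simple HTML DOM Parser. I'm stuck on selecting the movie's rating (PG, 18, etc.).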

Here is the div that contains the information for one movie:

            <div class="mshow">
                <span style="float:right; font-size:11px;">
                    <a href="/trailers/enders-game/19330/" title="enders-game movie trailer" style="font-size:11px;">Trailer</a> | 
                    <a href="/reviews/enders-game/30945/" title="Ender's Game movie reviews" style="font-size:11px;">Rating: </a>
                    <b>Tribute</b>
                    <img src="/images/stars/4_sm.gif" alt="Current rating: 3.88" border="0" />
                </span>
                <strong>
                    <a href="/movies/enders-game/30945/" title="Ender's Game movie info">Ender's Game</a>
                </strong>
                (PG)<br />
                <div class="block">&nbsp;</div>
                <div class="rsd">Fri, Nov 15: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Sat, Nov 16: </div>
                <div class="rst" >1:00pm &nbsp;&nbsp;3:15pm &nbsp;&nbsp;7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Sun, Nov 17: </div>
                <div class="rst" >1:00pm &nbsp;&nbsp;3:15pm &nbsp;&nbsp;7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Mon, Nov 18: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Tue, Nov 19: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Wed, Nov 20: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Thu, Nov 21: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
            </div>

Here is my code so far:

            <?php
            include_once('../simple_html_dom.php');

            $html = file_get_html('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
            $movies = array();
            foreach ($html->find("div.mshow") as $movie) {
                $item['trailer'] = $movie->find('a', 0)->href;
                $item['reviews'] = $movie->find('a', 1)->href;
                $item['link'] = $movie->find('a', 2)->href;
                $item['title'] = $movie->find('a', 2)->plaintext;
                $movies[] = $item;
            }

            var_dump($movies);
            ?>

I can't figure out how to grab the (PG). Any suggestions?

Edit: This works, but it doesn't seem like a very good solution.

            function parseDOM($url) {
                $movies = array();
                foreach ($url->find("div.mshow") as $movie) {
                    $item['trailer'] = $movie->find('a', 0)->href;
                    $item['reviews'] = $movie->find('a', 1)->href;
                    $item['link'] = $movie->find('a', 2)->href;
                    $item['title'] = $movie->find('a', 2)->plaintext;
                    $info = $movie->plaintext;
                    preg_match('/\((.*?)\)/', $info, $matches);
                    $item['rating'] = $matches[1];
                    $movies[] = $item;
                }
                return $movies;
            }
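
If the regex approach is kept, a tighter pattern avoids picking up unrelated parentheses (a year in a title, say) and avoids an undefined-index notice when nothing matches. This is only a minimal sketch: the helper name `extractRating` is hypothetical, and it assumes a rating is one to three uppercase letters or digits in parentheses (PG, G, 14A, 18), which the source site does not guarantee.

```php
<?php
// Hypothetical helper: pull a rating such as "PG" or "14A" out of a block of text.
// Assumption: a rating is 1 to 3 uppercase letters/digits wrapped in parentheses.
function extractRating(string $text): ?string
{
    if (preg_match('/\(([A-Z0-9]{1,3})\)/', $text, $matches) === 1) {
        return $matches[1];   // the captured rating, without the parentheses
    }
    return null;              // nothing that looks like a rating was found
}
```

Inside the loop this would be used as `$item['rating'] = extractRating($movie->plaintext);`, leaving the rating as `null` for any movie block that has none.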

1 Answer:

Answer 0 (score: 1)

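Unfortunately, the Simple HTML DOM library is a bad choice for this. It doesn't support full XPath queries, and it doesn't appear to have a sibling-node selector either.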

With PHP's built-in DOM module you can easily get what you need:

$dom = new DOMDocument;
// Real-world HTML is rarely valid; @ silences the parser warnings.
@$dom->loadHTMLFile('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
$xpath = new DOMXPath($dom);
$movies = array();

// Each movie lives in its own <div class="mshow">.
foreach ($xpath->query("//div[@class='mshow']") as $movie) {
    $item = array();
    // The first three links are the trailer, the reviews and the movie page, in that order.
    $links = $xpath->query('.//a', $movie);
    $item['trailer'] = $links->item(0)->getAttribute('href');
    $item['reviews'] = $links->item(1)->getAttribute('href');
    $item['link'] = $links->item(2)->getAttribute('href');
    $item['title'] = $links->item(2)->nodeValue;
    // The rating is the bare text node right after <strong>, e.g. "(PG)".
    $item['rating'] = trim($xpath->query('.//strong/following-sibling::text()',
        $movie)->item(0)->nodeValue);
    $movies[] = $item;
}

var_dump($movies);

This gives me the following:

array(7) {
  [0]=>
  array(5) {
    ["trailer"]=>
    string(28) "/trailers/enders-game/19330/"
    ["reviews"]=>
    string(27) "/reviews/enders-game/30945/"
    ["link"]=>
    string(26) "/movies/enders-game/30945/"
    ["title"]=>
    string(12) "Ender's Game"
    ["rating"]=>
    string(4) "(PG)"
  }
  [1]=>
  array(5) {
    ["trailer"]=>
    string(27) "/trailers/free-birds/19436/"
    ["reviews"]=>
    string(26) "/reviews/free-birds/36183/"
    ["link"]=>
    string(25) "/movies/free-birds/36183/"
    ["title"]=>
    string(10) "Free Birds"
    ["rating"]=>
    string(3) "(G)"
  }
  [2]=>
  array(5) {
    ["trailer"]=>
    string(30) "/trailers/free-birds-3d/14421/"
    ["reviews"]=>
    string(29) "/reviews/free-birds-3d/37230/"
    ["link"]=>
    string(28) "/movies/free-birds-3d/37230/"
    ["title"]=>
    string(13) "Free Birds 3D"
    ["rating"]=>
    string(3) "(G)"
  }
  [3]=>
  array(5) {
    ["trailer"]=>
    string(45) "/trailers/jackass-presents-bad-grandpa/19318/"
    ["reviews"]=>
    string(44) "/reviews/jackass-presents-bad-grandpa/36493/"
    ["link"]=>
    string(43) "/movies/jackass-presents-bad-grandpa/36493/"
    ["title"]=>
    string(29) "Jackass Presents: Bad Grandpa"
    ["rating"]=>
    string(5) "(14A)"
  }
  [4]=>
  array(5) {
    ["trailer"]=>
    string(27) "/trailers/last-vegas/19291/"
    ["reviews"]=>
    string(26) "/reviews/last-vegas/35853/"
    ["link"]=>
    string(25) "/movies/last-vegas/35853/"
    ["title"]=>
    string(10) "Last Vegas"
    ["rating"]=>
    string(4) "(PG)"
  }
  [5]=>
  array(5) {
    ["trailer"]=>
    string(36) "/trailers/thor-the-dark-world/19327/"
    ["reviews"]=>
    string(35) "/reviews/thor-the-dark-world/32002/"
    ["link"]=>
    string(34) "/movies/thor-the-dark-world/32002/"
    ["title"]=>
    string(20) "Thor: The Dark World"
    ["rating"]=>
    string(4) "(PG)"
  }
  [6]=>
  array(5) {
    ["trailer"]=>
    string(39) "/trailers/thor-the-dark-world-3d/14425/"
    ["reviews"]=>
    string(38) "/reviews/thor-the-dark-world-3d/34705/"
    ["link"]=>
    string(37) "/movies/thor-the-dark-world-3d/34705/"
    ["title"]=>
    string(23) "Thor: The Dark World 3D"
    ["rating"]=>
    string(4) "(PG)"
  }
}
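
The sibling-text query can also be tried offline against a trimmed copy of the markup from the question. This is a sketch under a few assumptions of mine: the inline `$sample` string stands in for the live page, the class test uses `contains(concat(...))` so it would still match if the div carried extra classes, and the parentheses are trimmed from the rating, which the code above does not do.

```php
<?php
// Trimmed sample of one div.mshow block from the question (stand-in for the live page).
$sample = <<<'HTML'
<div class="mshow">
    <span style="float:right;">
        <a href="/trailers/enders-game/19330/">Trailer</a> |
        <a href="/reviews/enders-game/30945/">Rating: </a>
    </span>
    <strong>
        <a href="/movies/enders-game/30945/">Ender's Game</a>
    </strong>
    (PG)<br />
    <div class="rsd">Fri, Nov 15: </div>
    <div class="rst">7:00pm 9:20pm</div><br />
</div>
HTML;

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$ratings = array();

// Match "mshow" as one of possibly several space-separated class names.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' mshow ')]";
foreach ($xpath->query($query) as $movie) {
    // The first text node following <strong> at the same level holds "(PG)".
    $node = $xpath->query('./strong/following-sibling::text()', $movie)->item(0);
    // Strip surrounding whitespace and parentheses; null if the node is missing.
    $ratings[] = $node === null ? null : trim($node->nodeValue, " \t\n\r()");
}

var_dump($ratings);
```

Checking `$node` for `null` before reading `nodeValue` keeps the loop from failing on a movie block that has no text after its `<strong>` element.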