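I'm working on a community website, converting it from ASP to PHP. Currently the client manually enters the movie times for our local theatre every week, copying them from another website. Since we're redoing the site anyway, I figured I'd try to automate this, so I found PHP Simple HTML DOM Parser. I'm stuck on selecting the movie's rating (PG, 18, etc.).
Here is a div containing the information for one movie: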
<div class="mshow">
<span style="float:right; font-size:11px;">
<a href="/trailers/enders-game/19330/" title="enders-game movie trailer" style="font-size:11px;">Trailer</a> |
<a href="/reviews/enders-game/30945/" title="Ender's Game movie reviews" style="font-size:11px;">Rating: </a>
<b>Tribute</b>
<img src="/images/stars/4_sm.gif" alt="Current rating: 3.88" border="0" />
</span>
<strong>
<a href="/movies/enders-game/30945/" title="Ender's Game movie info">Ender's Game</a>
</strong>
(PG)<br />
<div class="block"> </div>
<div class="rsd">Fri, Nov 15: </div>
<div class="rst" >7:00pm 9:20pm </div><br />
<div class="rsd">Sat, Nov 16: </div>
<div class="rst" >1:00pm 3:15pm 7:00pm 9:20pm </div><br />
<div class="rsd">Sun, Nov 17: </div>
<div class="rst" >1:00pm 3:15pm 7:00pm 9:20pm </div><br />
<div class="rsd">Mon, Nov 18: </div>
<div class="rst" >7:00pm 9:20pm </div><br />
<div class="rsd">Tue, Nov 19: </div>
<div class="rst" >7:00pm 9:20pm </div><br />
<div class="rsd">Wed, Nov 20: </div>
<div class="rst" >7:00pm 9:20pm </div><br />
<div class="rsd">Thu, Nov 21: </div>
<div class="rst" >7:00pm 9:20pm </div><br />
</div>
Here is my code so far:
<?php
include_once('../simple_html_dom.php');
$html = file_get_html('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
$movies = array();
foreach ($html->find("div.mshow") as $movie) {
    $item['trailer'] = $movie->find('a', 0)->href;
    $item['reviews'] = $movie->find('a', 1)->href;
    $item['link'] = $movie->find('a', 2)->href;
    $item['title'] = $movie->find('a', 2)->plaintext;
    $movies[] = $item;
}
var_dump($movies);
?>
I can't figure out how to grab the (PG). Any suggestions?
Edit: This works, but it doesn't seem like a very good solution.
function parseDOM($url) {
    $movies = array();
    foreach ($url->find("div.mshow") as $movie) {
        $item['trailer'] = $movie->find('a', 0)->href;
        $item['reviews'] = $movie->find('a', 1)->href;
        $item['link'] = $movie->find('a', 2)->href;
        $item['title'] = $movie->find('a', 2)->plaintext;
        $info = $movie->plaintext;
        preg_match('/\((.*?)\)/', $info, $matches);
        $item['rating'] = $matches[1];
        $movies[] = $item;
    }
    return $movies;
}
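Answer 0 (score: 1)
Unfortunately, the Simple HTML DOM library is a bad choice here. It doesn't support full XPath queries, and it doesn't appear to have a sibling-node selector either.
With PHP's built-in DOM extension, you can easily get what you need: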
$dom = new DOMDocument;
@$dom->loadHTMLFile('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
$xpath = new DOMXPath($dom);
$movies = array();
foreach ($xpath->query("//div[@class='mshow']") as $movie) {
    $item = array();
    $links = $xpath->query('.//a', $movie);
    $item['trailer'] = $links->item(0)->getAttribute('href');
    $item['reviews'] = $links->item(1)->getAttribute('href');
    $item['link'] = $links->item(2)->getAttribute('href');
    $item['title'] = $links->item(2)->nodeValue;
    $item['rating'] = trim($xpath->query(
        './/strong/following-sibling::text()',
        $movie
    )->item(0)->nodeValue);
    $movies[] = $item;
}
var_dump($movies);
This gives me the following:
array(7) { [0]=> array(5) { ["trailer"]=> string(28) "/trailers/enders-game/19330/" ["reviews"]=> string(27) "/reviews/enders-game/30945/" ["link"]=> string(26) "/movies/enders-game/30945/" ["title"]=> string(12) "Ender's Game" ["rating"]=> string(4) "(PG)" } [1]=> array(5) { ["trailer"]=> string(27) "/trailers/free-birds/19436/" ["reviews"]=> string(26) "/reviews/free-birds/36183/" ["link"]=> string(25) "/movies/free-birds/36183/" ["title"]=> string(10) "Free Birds" ["rating"]=> string(3) "(G)" } [2]=> array(5) { ["trailer"]=> string(30) "/trailers/free-birds-3d/14421/" ["reviews"]=> string(29) "/reviews/free-birds-3d/37230/" ["link"]=> string(28) "/movies/free-birds-3d/37230/" ["title"]=> string(13) "Free Birds 3D" ["rating"]=> string(3) "(G)" } [3]=> array(5) { ["trailer"]=> string(45) "/trailers/jackass-presents-bad-grandpa/19318/" ["reviews"]=> string(44) "/reviews/jackass-presents-bad-grandpa/36493/" ["link"]=> string(43) "/movies/jackass-presents-bad-grandpa/36493/" ["title"]=> string(29) "Jackass Presents: Bad Grandpa" ["rating"]=> string(5) "(14A)" } [4]=> array(5) { ["trailer"]=> string(27) "/trailers/last-vegas/19291/" ["reviews"]=> string(26) "/reviews/last-vegas/35853/" ["link"]=> string(25) "/movies/last-vegas/35853/" ["title"]=> string(10) "Last Vegas" ["rating"]=> string(4) "(PG)" } [5]=> array(5) { ["trailer"]=> string(36) "/trailers/thor-the-dark-world/19327/" ["reviews"]=> string(35) "/reviews/thor-the-dark-world/32002/" ["link"]=> string(34) "/movies/thor-the-dark-world/32002/" ["title"]=> string(20) "Thor: The Dark World" ["rating"]=> string(4) "(PG)" } [6]=> array(5) { ["trailer"]=> string(39) "/trailers/thor-the-dark-world-3d/14425/" ["reviews"]=> string(38) "/reviews/thor-the-dark-world-3d/34705/" ["link"]=> string(37) "/movies/thor-the-dark-world-3d/34705/" ["title"]=> string(23) "Thor: The Dark World 3D" ["rating"]=> string(4) "(PG)" } }
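If you want the rating without the surrounding parentheses, or want to cope with a movie block that has no rating at all, the same `following-sibling::text()` query can be wrapped in a small helper. The following is only a minimal sketch: it runs against a trimmed copy of the sample markup from the question rather than the live page, the helper name `extractRating` is hypothetical, and it assumes the rating is always a parenthesised text node placed after the `<strong>` element. It also assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Trimmed copy of the sample markup from the question (not the live page).
$html = <<<'HTML'
<div class="mshow">
<span style="float:right; font-size:11px;">
<a href="/trailers/enders-game/19330/">Trailer</a> |
<a href="/reviews/enders-game/30945/">Rating: </a>
</span>
<strong>
<a href="/movies/enders-game/30945/" title="Ender's Game movie info">Ender's Game</a>
</strong>
(PG)<br />
<div class="rsd">Fri, Nov 15: </div>
<div class="rst">7:00pm 9:20pm </div><br />
</div>
HTML;

// Hypothetical helper: returns the rating without parentheses,
// or null when no parenthesised text node follows <strong>.
function extractRating(DOMXPath $xpath, DOMNode $movie): ?string
{
    // All text nodes that are later siblings of <strong>, in document order.
    $nodes = $xpath->query('./strong/following-sibling::text()', $movie);
    if ($nodes === false) {
        return null;
    }
    foreach ($nodes as $node) {
        // Accept only text that is entirely "(something)" after trimming.
        if (preg_match('/^\(([^)]+)\)$/', trim($node->nodeValue), $m)) {
            return $m[1];
        }
    }
    return null;
}

$dom = new DOMDocument;
// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$ratings = array();
foreach ($xpath->query("//div[@class='mshow']") as $movie) {
    $ratings[] = extractRating($xpath, $movie);
}
var_dump($ratings);
```

Returning null rather than reading `item(0)` unconditionally means a block with unexpected markup produces a missing rating instead of a fatal error, which is probably what you want for a scraper that runs unattended.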