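I'm trying to figure out how to get only the movie titles from a page.
I have this, but I can't get it to work, and I don't know DomDocument very well. It currently gets every link on the page, but I only need the links for the listed movie titles.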
$content = file_get_contents("http://www.imdb.com/movies-in-theaters/");
$dom = new DomDocument();
$dom->loadHTML($content);
$urls = $dom->getElementsByTagName('a');
Answer 0 (score: 2)
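$dom = new DomDocument();
// @ suppresses the warnings libxml emits for malformed real-world HTML
@$dom->loadHTMLFile('http://www.imdb.com/movies-in-theaters/');
$urls = $dom->getElementsByTagName('a');
$titles = array();
foreach ($urls as $url)
{
    $grandparent = $url->parentNode->parentNode;
    // Keep only links whose grandparent element has the class "overview-top"
    if ($grandparent instanceof DOMElement && 'overview-top' === $grandparent->getAttribute('class'))
    {
        $titles[] = $url->nodeValue;
    }
}
print_r($titles);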
This will output:
Array
(
[0] => Star Trek Into Darkness (2013)
[1] => Frances Ha (2012)
[2] => Stories We Tell (2012)
[3] => Erased (2012)
[4] => The English Teacher (2013)
[5] => Augustine (2012)
[6] => Black Rock (2012)
[7] => State 194 (2012)
[8] => Iron Man 3 (2013)
[9] => The Great Gatsby (2013)
[10] => Pain & Gain (2013)
[11] => Peeples (2013)
[12] => 42 (2013)
[13] => Oblivion (2013)
[14] => The Croods (2013)
[15] => The Big Wedding (2013)
[16] => Mud (2012)
[17] => Oz the Great and Powerful (2013)
)
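Since the question also asks for the links themselves, here is a minimal sketch that collects each link's href alongside its title. It runs against a small inline HTML string rather than the live page; the sample markup, the example titles and paths, and the $movies variable are assumptions made for illustration, and the real page structure may differ.

```php
<?php
// Hypothetical sample markup: one movie link and one unrelated link.
$html = '<html><body>'
      . '<div class="overview-top"><h4><a href="/title/tt0000001/">Example Movie One (2013)</a></h4></div>'
      . '<div class="other"><h4><a href="/news/">Not a movie</a></h4></div>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$movies = array();
foreach ($dom->getElementsByTagName('a') as $link) {
    $grandparent = $link->parentNode->parentNode;
    // Only element nodes have attributes, so check the node type first
    if ($grandparent instanceof DOMElement
        && $grandparent->getAttribute('class') === 'overview-top') {
        $movies[] = array(
            'title' => trim($link->nodeValue),
            'href'  => $link->getAttribute('href'),
        );
    }
}
print_r($movies);
```

The href values come back exactly as written in the markup, so relative paths would still need the site's base URL prepended.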
You could also do this with XPath, but I'm not very familiar with it.
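For reference, a minimal sketch of the XPath route using PHP's DOMXPath class. It applies the same test as the loop above (a link whose grandparent has the class "overview-top") to a small inline HTML string; the sample markup and example titles are assumptions made for illustration, not the real page.

```php
<?php
// Hypothetical sample markup mimicking the structure the answer relies on.
$html = <<<HTML
<html><body>
<div class="overview-top"><h4><a href="/title/tt0000001/">Example Movie One (2013)</a></h4></div>
<div class="overview-top"><h4><a href="/title/tt0000002/">Example Movie Two (2012)</a></h4></div>
<div class="other"><h4><a href="/news/">Not a movie</a></h4></div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select <a> elements whose grandparent has class exactly "overview-top"
$nodes = $xpath->query('//*[@class="overview-top"]/*/a');

$titles = array();
foreach ($nodes as $node) {
    $titles[] = trim($node->nodeValue);
}
print_r($titles);
```

The expression matches the class attribute exactly, just like the === comparison in the loop version. If the element carried several classes, a contains()-based predicate would be needed instead.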