Extracting link text from specific links

Time: 2013-05-14 01:44:34

Tags: php domdocument

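I'm trying to figure out how to get only the movie titles from the page.

I have the code below, but I can't get it to do what I want, and I don't know DomDocument very well. Right now it grabs every link on the page, whereas I only need the links for the listed movie titles.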

$content =  file_get_contents("http://www.imdb.com/movies-in-theaters/");

$dom = new DomDocument();
$dom->loadHTML($content);
$urls = $dom->getElementsByTagName('a');

1 Answer:

Answer 0 (score: 2)

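$dom = new DOMDocument();
// The @ suppresses the warnings libxml raises on real-world, non-validating HTML.
@$dom->loadHTMLFile('http://www.imdb.com/movies-in-theaters/');
$urls = $dom->getElementsByTagName('a');
$titles = array();

foreach ($urls as $url) {
    // Keep only links whose grandparent element has class "overview-top",
    // which on this page wraps the movie title links.
    if ('overview-top' === $url->parentNode->parentNode->getAttribute('class')) {
        $titles[] = $url->nodeValue;
    }
}

print_r($titles);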

This will output:

Array
(
    [0] =>  Star Trek Into Darkness (2013)
    [1] =>  Frances Ha (2012)
    [2] =>  Stories We Tell (2012)
    [3] =>  Erased (2012)
    [4] =>  The English Teacher (2013)
    [5] =>  Augustine (2012)
    [6] =>  Black Rock (2012)
    [7] =>  State 194 (2012)
    [8] =>  Iron Man 3 (2013)
    [9] =>  The Great Gatsby (2013)
    [10] =>  Pain & Gain (2013)
    [11] =>  Peeples (2013)
    [12] =>  42 (2013)
    [13] =>  Oblivion (2013)
    [14] =>  The Croods (2013)
    [15] =>  The Big Wedding (2013)
    [16] =>  Mud (2012)
    [17] =>  Oz the Great and Powerful (2013)
)

You could also do this with XPath, but I'm not very familiar with it.
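
For reference, here is a rough sketch of what the XPath variant might look like. It is not tested against the live page: the inline HTML is made-up sample markup that only assumes the same structure as the loop above (a link whose grandparent element has class "overview-top"), and the sample titles and hrefs are placeholders.

```php
<?php
// Hypothetical sample markup (an assumption, not the real IMDb page):
// each title link sits two levels below an element with class "overview-top".
$html = <<<HTML
<html><body>
  <div class="overview-top"><h4><a href="/title/tt0000001/">Movie One (2013)</a></h4></div>
  <div class="overview-top"><h4><a href="/title/tt0000002/">Movie Two (2012)</a></h4></div>
  <div class="sidebar"><p><a href="/news/">Unrelated link</a></p></div>
</body></html>
HTML;

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Same test as the loop version: <a> elements whose grandparent
// has the class attribute exactly equal to "overview-top".
$nodes = $xpath->query('//*[@class="overview-top"]/*/a');

$titles = array();
foreach ($nodes as $node) {
    // trim() drops surrounding whitespace around the link text.
    $titles[] = trim($node->nodeValue);
}

print_r($titles);
```

With the sample markup this collects the two title links and skips the sidebar link. To try it on the real page, you would swap loadHTML($html) for loadHTMLFile() with the URL, and the expression may need adjusting if the page's markup differs from what is assumed here.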