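PHP - SimpleHTMLDom - How do I access table elements?

Asked: 2014-01-06 11:51:13

Tags: php web-scraping simple-html-dom

I am trying to use simplehtmldom to get the artist of each album listed on Metacritic - http://www.metacritic.com/browse/albums/release-date/coming-soon/date?view=detailed

The artist names are contained in separate td elements that have the class artistName.

So far I have managed to work out this much: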

    $html = file_get_html('http://www.metacritic.com/browse/albums/release-date/coming-soon/date?view=detailed');
    $es = $html->find('table.musicTable td');

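Where do I go from here? I find the examples and the documentation a bit confusing. Any help would be much appreciated.

2 Answers:

Answer 0: (score: 1)

I suggest using PHP's DOM extension (see the DOM manual).

It is a very powerful tool for parsing and manipulating XML or HTML documents.

In your case, you could do something like this: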

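    <?php
    $html = file_get_contents('http://www.metacritic.com/browse/albums/release-date/coming-soon/date?view=detailed');

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Iterate over every <table> element in the document
    $tables = $doc->getElementsByTagName('table');
    foreach ($tables as $table) {
        // do your things here
    }
    ?>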
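The loop body above is left open. As a minimal sketch of what could go there, the following walks every td cell and keeps those whose class attribute includes artistName. The inline `$sampleHtml` markup and the `extractArtistNames` helper are assumptions made up for illustration (the real page's markup may differ); only the artistName class comes from the question.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption).
$sampleHtml = <<<HTML
<table class="musicTable">
  <tr><td class="artistName"><a href="/person/a">Peter Gabriel</a></td><td class="albumTitle">Album A</td></tr>
  <tr><td class="artistName"><a href="/person/b">Broken Bells</a></td><td class="albumTitle">Album B</td></tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed text of every <td> whose
// space-separated class list includes "artistName".
function extractArtistNames(string $html): array
{
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $names = [];
    foreach ($doc->getElementsByTagName('td') as $cell) {
        $classes = preg_split('/\s+/', trim($cell->getAttribute('class')));
        if (in_array('artistName', $classes, true)) {
            $names[] = trim($cell->textContent);
        }
    }
    return $names;
}

$names = extractArtistNames($sampleHtml);
print_r($names);
```

Splitting the class attribute on whitespace rather than comparing it to the literal string keeps the check working if a cell carries more than one class.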

You can even use XPath to query the document's nodes:

XPath usage
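As a rough illustration of the XPath route, here is a sketch that selects the anchors inside td cells carrying the artistName class. It runs against an inline `$sampleHtml` string, which is an assumption standing in for the real page; the query would need adjusting if the actual markup differs.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption).
$sampleHtml = '<table class="musicTable">'
    . '<tr><td class="artistName"><a href="/person/a">Peter Gabriel</a></td></tr>'
    . '<tr><td class="albumTitle"><a href="/music/x">Some Album</a></td></tr>'
    . '<tr><td class="artistName"><a href="/person/b">TOY</a></td></tr>'
    . '</table>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match anchors inside any <td> whose space-separated class list contains "artistName".
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " artistName ")]//a';

$artists = [];
foreach ($xpath->query($query) as $anchor) {
    $artists[] = trim($anchor->textContent);
}
print_r($artists);
```

The `concat`/`normalize-space` idiom is the usual XPath 1.0 way to test for a single class name, since XPath 1.0 has no direct equivalent of a CSS class selector.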

Answer 1: (score: 1)

Each name sits in an anchor inside a <td class="artistName">, and that is all you need to know to write the following code:

    // Assumes simple_html_dom.php sits next to this script; adjust the path as needed
    require_once 'simple_html_dom.php';

    $url = "http://www.metacritic.com/browse/albums/release-date/coming-soon/date?view=detailed";

    // Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a URL
    $html->load_file($url);

    // Find the anchor containing the name inside all "td.artistName" elements
    $anchors = $html->find('td.artistName a');

    // Loop through all found anchors and print their text
    foreach ($anchors as $anchor) {
        $name = $anchor->plaintext;
        echo $name . "<br>";
    }

    // Clear the DOM object to free memory
    $html->clear();
    unset($html);

Output:

Peter Gabriel 
Stephen Malkmus & The Jicks 
TOY 
Black Knights 
Broken Bells 
Bruce Springsteen 
David Broza 
Eskimo Callboy 
...

Working DEMO

Please read the manual for more examples and details.