How can I extract specific content with the PHP DOM object?

Asked: 2012-03-16 07:13:52

Tags: php html parsing dom object

I have a question about the PHP DOM object: http://php.net/manual/en/class.domdocument.php

Is it possible to display only the content of the second <tr> tag inside the third <table> tag?

/*** a new dom object ***/ 
$dom = new domDocument; 

/*** load the html into the object ***/ 
@$dom->loadHTML($html); 

/*** discard white space ***/ 
$dom->preserveWhiteSpace = false; 

/*** the table by its tag name ***/ 
$tables = $dom->getElementsByTagName('table'); 

/*** get all rows from the table ***/ 
$rows = $tables->item(0)->getElementsByTagName('tr'); 

/*** loop over the table rows ***/ 
foreach ($rows as $row) 
{ 
    /*** get each column by tag name ***/ 
    $cols = $row->getElementsByTagName('td'); 

    /*** echo the values ***/ 
    echo $cols->item(0)->nodeValue.'<br />'; 
    echo $cols->item(1)->nodeValue.'<br />'; 
    echo $cols->item(2)->nodeValue.'<br />'; 
    echo $cols->item(3)->nodeValue.'<br />';
    echo $cols->item(4)->nodeValue.'<br />';
    echo $cols->item(5)->nodeValue.'<br />';
    echo '<hr />'; 
} 

EDIT:

I get this error from the code below:

Fatal error: Cannot use object of type DOMNodeList as array
<?php

/*** a new dom object ***/ 
$dom = new domDocument; 

/*** load the html into the object ***/ 
@$dom->loadHTML('content.html'); 

/*** discard white space ***/ 
$dom->preserveWhiteSpace = false; 

$xpath = new DOMXPath($dom);

$selected = $xpath->query('//table/tr/td[first()+1]');
echo $selected[0]->nodeValue;
?>

EDIT2:

<?php

$output = file_get_contents('test.php');

/*** a new dom object ***/ 
$dom = new domDocument; 

/*** load the html into the object ***/ 
@$dom->loadHTML($output); 

/*** discard white space ***/ 
$dom->preserveWhiteSpace = false; 

/*** the table by its tag name ***/ 
$tables = $dom->getElementsByTagName('table');//get all the tables

if($tables->length > 2) { //check there are more than 2

    $thirdTable = $tables->item(2);

    $cols = $thirdTable->getElementsByTagName('td'); 

    /*** echo the values ***/ 
    echo $cols->item(0)->nodeValue.'<br />'; 
    echo $cols->item(1)->nodeValue.'<br />'; 
    echo $cols->item(2)->nodeValue.'<br />'; 
    echo $cols->item(3)->nodeValue.'<br />';
    echo $cols->item(4)->nodeValue.'<br />';
    echo $cols->item(5)->nodeValue.'<br />';
    echo '<hr />'; 
}

?>

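EDIT3 - This code displays only the content of the third <table> tag. But it also needs to display only the content of the second <tr> tag within that third table.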

$html = file_get_contents('content.html');

/*** a new dom object ***/ 
$dom = new domDocument; 

/*** load the html into the object ***/ 
@$dom->loadHTML($html); 

/*** discard white space ***/ 
$dom->preserveWhiteSpace = false; 

/*** the table by its tag name ***/ 
$tables = $dom->getElementsByTagName('table'); 

/*** get all rows from the table ***/ 
$rows = $tables->item(2)->getElementsByTagName('tr')->item(1); 

/*** loop over the table rows ***/ 
foreach ($rows as $row) 
{ 
    /*** get each column by tag name ***/ 
    $cols = $row->getElementsByTagName('td'); 

    /*** echo the values ***/ 
    echo $cols->item(0)->nodeValue.'<br />'; 
    echo $cols->item(1)->nodeValue.'<br />'; 
    echo $cols->item(2)->nodeValue.'<br />'; 
    echo $cols->item(3)->nodeValue.'<br />';
    echo $cols->item(4)->nodeValue.'<br />';
    echo $cols->item(5)->nodeValue.'<br />';
    echo '<hr />'; 
}

2 answers:

Answer 0 (score: 2)

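I don't understand your question. Indexes passed to item() are zero-based, so $cols->item(1) already gives you the second DOMElement and $cols->item(2) the third.

If you only want the second (or the third...) cell, you can use XPath: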

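$xpath = new DOMXPath($dom);
// XPath has no first() function; positions are 1-based, so this selects the second and third <td> of each row
$selected = $xpath->query('//table/tr/td[position() = 2 or position() = 3]');
// query() returns a DOMNodeList, so read a node with item() instead of array syntax
echo $selected->item(0)->nodeValue;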
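
For the specific case in the question (second row of the third table), a single XPath expression may be enough. The following is a minimal sketch, not the answerer's code: the sample markup and the variable names are made up for illustration, and it assumes the source HTML has no <tbody> elements (if it does, /tbody/tr or //tr would be needed instead of /tr).

```php
<?php
// Made-up sample markup with three tables.
$html = '<html><body>'
      . '<table><tr><td>a</td></tr></table>'
      . '<table><tr><td>b</td></tr></table>'
      . '<table><tr><td>r1c1</td></tr><tr><td>r2c1</td><td>r2c2</td></tr></table>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// (//table)[3] is the third table in document order; XPath positions start at 1.
$cells = $xpath->query('(//table)[3]/tr[2]/td');

// Gather the text of each matched cell.
$values = array();
foreach ($cells as $cell) {
    $values[] = $cell->nodeValue;
}
echo implode('<br />', $values);
```

The parentheses matter here: //table[3] would mean "a table that is the third table child of its parent", which is not the same as the third table in the whole document.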

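If you don't want to use DOMXPath, you can use getElementsByTagName. First get all the tables, then check that there are more than 2, then take the third one, then get its tr elements, and keep the second and third of them in an array.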

$tables = $dom->getElementsByTagName('table');//get all the tables
if($tables->length > 2){//check there are more than 2
    $thirdTable = $tables->item(2);
    //get the tr then td
}
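
The steps described above could be filled in roughly as follows. This is a minimal sketch rather than the answerer's code: the helper name getRowCells and the sample HTML are hypothetical, and it assumes the third table contains no nested tables (getElementsByTagName('tr') would also match rows of nested tables).

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of every
// <td> in row $rowIndex of table $tableIndex (both zero-based), or null if missing.
function getRowCells(DOMDocument $dom, $tableIndex, $rowIndex)
{
    $tables = $dom->getElementsByTagName('table');
    if ($tables->length <= $tableIndex) {
        return null; // not enough tables
    }

    $rows = $tables->item($tableIndex)->getElementsByTagName('tr');
    if ($rows->length <= $rowIndex) {
        return null; // not enough rows
    }

    $cells = array();
    foreach ($rows->item($rowIndex)->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->nodeValue);
    }
    return $cells;
}

// Made-up sample markup with three tables.
$html = '<html><body>'
      . '<table><tr><td>a</td></tr></table>'
      . '<table><tr><td>b</td></tr></table>'
      . '<table><tr><td>r1c1</td><td>r1c2</td></tr>'
      . '<tr><td>r2c1</td><td>r2c2</td><td>r2c3</td></tr></table>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);

// Third table (index 2), second row (index 1).
$cells = getRowCells($dom, 2, 1);
echo implode('<br />', $cells);
```

Returning null when the table or row is missing lets the caller decide how to react, instead of hitting a fatal error from calling a method on null.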

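Answer 1 (score: 1)

You are trying to use a DOMNodeList as if it were an array. It is an object, not an array, so read its entries with item(). A for loop over its length is one way to iterate it (foreach also works, because DOMNodeList is Traversable):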

$tables = $dom->getElementsByTagName('table');
if( $tables->length < 3 ) {
  // Ahh crap! There is no third table!
  exit('No third table found');
}
$thirdTable = $tables->item(2);
$rows = $thirdTable->getElementsByTagName('tr');
for( $i = 0; $i < $rows->length; $i++ ) {
  $row = $rows->item( $i );
  $cols = $row->getElementsByTagName('td');
  // item() belongs to the <td> list, not to the row element
  $secondTd = $cols->item( 1 );
  $thirdTd = $cols->item( 2 );
}
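
To see why the loop in EDIT3 does not behave as intended, it may help to compare what the two calls return. The following is a small sketch with made-up markup and variable names: getElementsByTagName() returns a DOMNodeList, which can be used in foreach, while item(1) returns a single DOMElement, which is one row and not a list of rows.

```php
<?php
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
// Made-up sample markup: one table with two rows.
$dom->loadHTML('<html><body><table>'
    . '<tr><td>x</td></tr>'
    . '<tr><td>y</td><td>z</td></tr>'
    . '</table></body></html>');

$rows = $dom->getElementsByTagName('tr');
$isList = $rows instanceof Traversable;         // a DOMNodeList can be used in foreach

$secondRow = $rows->item(1);
$isElement = $secondRow instanceof DOMElement;  // item() returns one node, not a list

// Loop over the cells of that single row instead of looping over the row itself.
$texts = array();
foreach ($secondRow->getElementsByTagName('td') as $td) {
    $texts[] = $td->nodeValue;
}
echo implode('<br />', $texts);
```

Under these assumptions, the fix for EDIT3 is to drop the outer foreach and read the cells directly from the one row that item(1) returns.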