How to extract information from an HTML file using DOMDocument in PHP
Here is the relevant part of my HTML page's source.
This is the third table on the page, and it is the one I need to process:
<table>
<tbody>
<tr>
<td>A</td>
<td>B</td>
<td>C</td>
<td>D</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
</tbody>
</table>
My use case requires me to display the row that contains B and D. How should I extract the first row of this table and print it using DOMDocument?
Answer 0 (score: 14)
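Do it like this: the code grabs the third table, loops over its rows, and checks for B and D in the second and fourth columns. When it finds them, it prints each column's value and then stops the loop.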
$dom = new DOMDocument();
$dom->loadHTML(.....);
// get the third table (item() is zero-based, so 2 means the third one)
$thirdTable = $dom->getElementsByTagName('table')->item(2);
// item() returns null if the page has fewer than three tables
if ($thirdTable !== null)
{
    // iterate over each row in the table
    foreach ($thirdTable->getElementsByTagName('tr') as $tr)
    {
        $tds = $tr->getElementsByTagName('td'); // get the columns in this row
        if ($tds->length >= 4)
        {
            // check if B and D are found in columns 2 and 4
            if (trim($tds->item(1)->nodeValue) == 'B' && trim($tds->item(3)->nodeValue) == 'D')
            {
                // found B and D in the second and fourth columns
                // echo out each column value
                echo $tds->item(0)->nodeValue; // A
                echo $tds->item(1)->nodeValue; // B
                echo $tds->item(2)->nodeValue; // C
                echo $tds->item(3)->nodeValue; // D
                break; // don't check any further rows
            }
        }
    }
}
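For comparison, the same lookup can be sketched with DOMXPath instead of a manual loop. This is a minimal illustration, not part of the original answer: it assumes a made-up sample page stored in `$html` with three tables (the question's table being the third), and the XPath expression keeps only the rows of the third table whose second cell is B and whose fourth cell is D.

```php
<?php
// Hypothetical sample page: the table we care about is the third <table>.
$html = <<<HTML
<html><body>
<table><tr><td>x</td></tr></table>
<table><tr><td>y</td></tr></table>
<table>
<tbody>
<tr><td>A</td><td>B</td><td>C</td><td>D</td></tr>
<tr><td>1</td><td>2</td><td>3</td><td>4</td></tr>
</tbody>
</table>
</body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// (//table)[3] is the third table in document order (XPath positions are 1-based).
// normalize-space() ignores surrounding whitespace in the cell text.
$rows = $xpath->query("(//table)[3]//tr[normalize-space(td[2])='B' and normalize-space(td[4])='D']");

$found = [];
foreach ($rows as $tr) {
    $cells = [];
    // 'td' relative to the row selects that row's direct <td> children
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->nodeValue);
    }
    $found[] = $cells;
    echo implode(' ', $cells), "\n";
}
```

Because XPath positions start at 1, `(//table)[3]` refers to the same table as `item(2)` in the answer above. Pushing the B/D condition into the query also removes the need for the `$tds->length >= 4` guard, since `td[4]` simply fails to match on shorter rows.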
Answer 1 (score: 0)
I have tested this code:
$table = "<table>
<tbody>
<tr>
<td>A</td>
<td>B</td>
<td>C</td>
<td>D</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
</tbody>
</table>";
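$doc = new DOMDocument();
// the XML declaration prefix makes loadHTML() treat the input as UTF-8
$doc->loadHTML('<?xml encoding="utf-8"?>' . $table);
$rows = $doc->getElementsByTagName('tr');
$tds = $doc->getElementsByTagName('td');
$ths = $doc->getElementsByTagName('th');
// this table has no <th> cells, so this loop prints nothing here
foreach ($ths as $th) {
    // a DOMElement cannot be converted to a string; print its text content instead
    echo "<p> th = " . $th->nodeValue . " </p>";
}
foreach ($tds as $td) {
    echo "<p> td = " . $td->nodeValue . " </p>";
}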
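The loop above prints every cell rather than only the row with B and D. Below is a minimal sketch that combines this answer's loading approach with the column check from the first answer. The helper name `findMatchingRow` and its `$expected` parameter (a map from zero-based column index to required text) are hypothetical and exist only for illustration.

```php
<?php
/**
 * Hypothetical helper: return the trimmed cell texts of the first row in the
 * $tableIndex-th table (0-based) whose cells match $expected, where $expected
 * maps a 0-based column index to the required text. Returns null when the
 * table does not exist or no row matches.
 */
function findMatchingRow(string $html, int $tableIndex, array $expected): ?array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // same UTF-8 prefix trick as in the answer above
    $doc->loadHTML('<?xml encoding="utf-8"?>' . $html);
    libxml_clear_errors();

    $table = $doc->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return null;
    }

    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->nodeValue);
        }
        // the row matches only if every expected column exists and has the right text
        $matches = true;
        foreach ($expected as $col => $text) {
            if (!isset($cells[$col]) || $cells[$col] !== $text) {
                $matches = false;
                break;
            }
        }
        if ($matches) {
            return $cells;
        }
    }
    return null;
}

$html = "<table><tbody>
<tr><td>A</td><td>B</td><td>C</td><td>D</td></tr>
<tr><td>1</td><td>2</td><td>3</td><td>4</td></tr>
</tbody></table>";

// column 1 must be "B" and column 3 must be "D" (0-based indexes)
$row = findMatchingRow($html, 0, [1 => 'B', 3 => 'D']);
if ($row !== null) {
    foreach ($row as $value) {
        echo "<p> td = " . htmlspecialchars($value) . " </p>\n";
    }
}
```

On the real page you would pass the full page source with `$tableIndex` set to 2 to target the third table.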