Extracting values from a table in the DOM with PHP

Asked: 2013-01-12 18:07:15

Tags: php dom

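I use curl to fetch an HTML file, and I'm trying to use the DOM to pull out the content I need.

I've tried everything I can think of, but it doesn't work the way I expect.

Suppose I have this: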

$html = '<table cellpadding="0" cellspacing="0" id="uniq">
        <tr>
            <th>text</th>
            <td class="1_th">a1</td>
            <td class="1_td">b1</td>
        </tr>
        <tr>
            <th rowspan="3">text1</th>
            <td class="1_th">a2</td>
            <td class="1_td">b2</td>
        </tr>
        <tr>
            <td class="1_th">a3</td>
            <td class="1_td">b3</td>
        </tr>
        <tr>
            <td class="1_th">a4</td>
            <td class="1_td">b4</td>
        </tr>
        <tr>
            <th rowspan="2">text2</th>
            <td class="1_th">a5</td>
            <td class="1_td">b5</td>
        </tr>
        <tr>
            <td class="1_th">a6</td>
            <td class="1_td">b7</td>
        </tr>
    </table>';

I'd like to be able to echo this with PHP:

text - a1 - b1
text1 - a2 - b2
text1 - a3 - b3
text1 - a4 - b4
text2 - a5 - b5
text2 - a6 - b7

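The table is large, and the rowspan of the th cells varies between 15 and 20. I want to echo the values because I plan to insert them into MySQL.

I tried: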

$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
$table = $x->query('//*[@id="uniq"]')->item(0);
$rows = $table->getElementsByTagName("tr");
foreach ($rows as $row) {
    $tds = $row->nodeValue;
    echo $th;
}

Never mind, I found the solution I needed. Thanks for trying to help.

Here is what I did, and it works fine for me:

$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);

// Select the rows of the target table.
$rows = $x->query("//*[@id='item_specification']/tr");
$atr_cat = ''; // last <th> seen; reused for the rows its rowspan covers
foreach ($rows as $row) {
    $atr_name = $row->getElementsByTagName('td')->item(0)->nodeValue;
    $atr_val = $row->getElementsByTagName('td')->item(1)->nodeValue;
    $cell1 = $row->getElementsByTagName('th');
    $ifth = $cell1->length;
    if ($ifth == 1) {
        // This row starts a new category.
        $atr_cat = $cell1->item(0)->nodeValue;
    }
    echo "{$atr_cat} - {$atr_name} - {$atr_val} <br />";
}
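
The same idea (remember the last th and reuse it for rows that only contain td cells) can be wrapped in a function that returns the rows instead of echoing them, which may be more convenient for a later MySQL insert. This is only a sketch: the function name `extractRows` and its parameters are made up for illustration, and it assumes every data row has at least two td cells. It uses `libxml_use_internal_errors()` in place of the `@` operator to silence parser warnings.

```php
<?php
// Hypothetical helper (not part of the original post): returns one
// [category, name, value] triple per table row.
function extractRows(string $html, string $tableId): array
{
    // Collect libxml parse warnings internally instead of suppressing with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];
    $category = ''; // last <th> seen; reused for rows covered by its rowspan

    // "//tr" (descendant) so the query still matches if the markup contains a <tbody>.
    // Assumption: $tableId is a trusted value without quote characters.
    foreach ($xpath->query("//*[@id='{$tableId}']//tr") as $tr) {
        $th = $tr->getElementsByTagName('th');
        if ($th->length > 0) {
            $category = trim($th->item(0)->nodeValue);
        }
        $td = $tr->getElementsByTagName('td');
        if ($td->length < 2) {
            continue; // skip rows without a name/value pair
        }
        $rows[] = [
            $category,
            trim($td->item(0)->nodeValue),
            trim($td->item(1)->nodeValue),
        ];
    }
    return $rows;
}
```

Called as `extractRows($html, 'uniq')` on the table from the question, the third row should come back as `['text1', 'a3', 'b3']`, because that row has no th of its own and inherits the category from the row above.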

1 answer:

Answer 0 (score: 0)

Try using strip_tags(), like this:

<?php
$html = '<table cellpadding="0" cellspacing="0" id="uniq">
<tr>
<th>text</th>
<td class="1_th">a1</td>
<td class="1_td">b1</td>
</tr>
<tr>
<th rowspan="3">text1</th>
<td class="1_th">a2</td>
<td class="1_td">b2</td>
</tr>
<tr>
<td class="1_th">a3</td>
<td class="1_td">b3</td>
</tr>
<tr>
<td class="1_th">a4</td>
<td class="1_td">b4</td>
</tr>
<tr>
<th rowspan="2">text2</th>
<td class="1_th">a5</td>
<td class="1_td">b5</td>
</tr>
<tr>
<td class="1_th">a6</td>
<td class="1_td">b7</td>
</tr>
</table>';
$html = strip_tags($html);
echo $html;
?>

Here is a PHPFiddle implementation of this.

Output:

text a1 b1 text1 a2 b2 a3 b3 a4 b4 text2 a5 b5 a6 b7
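
Note that this output is a flat run of text. As a rough illustration, the sketch below splits the stripped text into individual cell values; `cellTokens` is a hypothetical helper name, and it assumes cells are separated by whitespace in the markup and that no cell value itself contains a space.

```php
<?php
// Hypothetical helper: split the text left by strip_tags() into cell values.
function cellTokens(string $html): array
{
    $text = strip_tags($html);
    // Split on any run of whitespace and drop empty pieces.
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// A shortened version of the table, one cell per line as in the question.
$sample = '<table>
<tr>
<th rowspan="2">text1</th>
<td>a2</td>
<td>b2</td>
</tr>
<tr>
<td>a3</td>
<td>b3</td>
</tr>
</table>';

print_r(cellTokens($sample));
```

The result is the list text1, a2, b2, a3, b3. Nothing in it records that a3 and b3 belong to text1, so producing the "text1 - a3 - b3" lines the question asks for still needs the row structure, which strip_tags() discards.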