PHP DOM nodeValue not working

Time: 2013-11-14 06:09:46

Tags: php html dom

I am trying to parse an HTML table using DOM. It works fine, except when a cell contains HTML instead of plain text.

Here is the sample HTML table:

<tr>
<td>Razon Social: </td>
<td>Circulo Inmobiliaria Sur (Casa Central)</td>
</tr>

<tr>
<td>Email: </td>
<td> <img src="generateImage.php?email=myemail@domain.com"/> </td>
</tr>

And the PHP code:

$rows = $dom->getElementsByTagName('tr');

foreach ($rows as $row)   
{
    $cells = $row->getElementsByTagName('td');

    if(strpos($cells->item(0)->textContent, "Razon") > 0)
    {
        $_razonSocial = $cells->item(1)->textContent;
    }
    else if(strpos($cells->item(0)->textContent, "Email") > 0)
    {
        $_email = $cells->item(1)->textContent;
    }
}   

echo "Razon Social: $_razonSocial<br>Email: $_email";

Output:

Razon Social: Circulo Inmobiliaria Sur (Casa Central) 
Email: 

The email is empty, but it should be:

<img src="generateImage.php?email=myemail@domain.com"/>

I even tried

$cells->item(1)->nodeValue;

instead of

$cells->item(1)->textContent;

but that does not work either. How can I make it return the HTML value?

2 answers:

Answer 0: (score: 0)

Give your table the id item_specification:
$dom = new DOMDocument();
@$dom->loadHTML($html); // $html holds the page markup; @ silences parser warnings
$x = new DOMXPath($dom);

// Select the rows of the element whose id is item_specification
$rows = $x->query("//*[@id='item_specification']/tr");
foreach ($rows as $row) {
    $cells = $row->getElementsByTagName('td');
    $atr_name = $cells->item(0)->nodeValue;
    $atr_val = $cells->item(1)->nodeValue;
    // Print inside the loop so every row is shown, not only the last one
    echo " {$atr_name} - {$atr_val} <br />";
}

This works fine.
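
Since nodeValue only returns the text of a cell, the same XPath approach can be pointed at the src attribute of the image instead. The following is a minimal sketch, not a definitive implementation: the sample markup is copied from the question, while the table id, the $src variable and the exact XPath expression are assumptions made for illustration.

```php
<?php
// Sample markup from the question; the id attribute is assumed here
// so that the query has something to match.
$html = '<table id="item_specification">
<tr><td>Razon Social: </td><td>Circulo Inmobiliaria Sur (Casa Central)</td></tr>
<tr><td>Email: </td><td> <img src="generateImage.php?email=myemail@domain.com"/> </td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Find the row whose first cell mentions "Email", then read the src
// attribute of the img inside the second cell. string() turns the
// matched attribute node into a plain PHP string.
$src = $xpath->evaluate(
    "string(//*[@id='item_specification']//tr[td[1][contains(., 'Email')]]/td[2]/img/@src)"
);

echo $src, "\n"; // generateImage.php?email=myemail@domain.com
```

If no matching image exists, string() yields an empty string, so the result can be checked with a simple comparison before use.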

Answer 1: (score: 0)

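As I already mentioned, <img src="generateImage.php?email=myemail@domain.com"/> is not text. It is another HTML element, so textContent and nodeValue have nothing to return for it. Try this instead: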

if (strpos($cells->item(0)->textContent, "Razon") !== false) {
    $_razonSocial = $cells->item(1)->textContent;
} else if (strpos($cells->item(0)->textContent, "Email") !== false) {
    // Walk all child nodes of the td. The whitespace around the img tag
    // is also a child node, but of type DOMText, which has no attributes,
    // so skip everything that is not an img element.
    foreach ($cells->item(1)->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'img') {
            $_email = $child->getAttribute('src');
            break;
        }
    }
    // $_email now holds the full src value; the address can be extracted from it
}
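
To get the markup of the cell itself, and to pull the address out of the src value, something along these lines may help. It is only a sketch under assumptions: the sample row is copied from the question, the variable names ($cell, $innerHtml, $email) are made up for illustration, and it relies on DOMDocument::saveHTML() accepting a node argument (PHP 5.3.6 and later).

```php
<?php
// Sample row from the question, wrapped in a table for parsing.
$html = '<table>
<tr><td>Email: </td><td> <img src="generateImage.php?email=myemail@domain.com"/> </td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

// Second td of the document: the cell that contains the image.
$cell = $dom->getElementsByTagName('td')->item(1);

// "Inner HTML": serialize every child node of the cell and concatenate.
$innerHtml = '';
foreach ($cell->childNodes as $child) {
    $innerHtml .= $dom->saveHTML($child);
}
$innerHtml = trim($innerHtml);

// Email: read the src attribute, isolate its query string, then parse it.
$img = $cell->getElementsByTagName('img')->item(0);
$query = parse_url($img->getAttribute('src'), PHP_URL_QUERY);
parse_str((string) $query, $params);
$email = isset($params['email']) ? $params['email'] : null;

echo $innerHtml, "\n";
echo $email, "\n";
```

The serialized markup may differ slightly from the source text (for example, the HTML serializer does not necessarily keep the self-closing slash), so compare it loosely rather than character by character.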