I'm using DOM to parse an HTML table. It works fine, except when a cell contains HTML instead of plain text.
Here is a sample of the HTML table:
<tr>
<td>Razon Social: </td>
<td>Circulo Inmobiliaria Sur (Casa Central)</td>
</tr>
<tr>
<td>Email: </td>
<td> <img src="generateImage.php?email=myemail@domain.com"/> </td>
</tr>
And the PHP code:
$rows = $dom->getElementsByTagName('tr');
foreach ($rows as $row)
{
    $cells = $row->getElementsByTagName('td');
    if(strpos($cells->item(0)->textContent, "Razon") > 0)
    {
        $_razonSocial = $cells->item(1)->textContent;
    }
    else if(strpos($cells->item(0)->textContent, "Email") > 0)
    {
        $_email = $cells->item(1)->textContent;
    }
}
echo "Razon Social: $_razonSocial<br>Email: $_email";
Output:
Razon Social: Circulo Inmobiliaria Sur (Casa Central)
Email:
The email is empty. It should be:
<img src="generateImage.php?email=myemail@domain.com"/>
I also tried
$cells->item(1)->nodeValue;
instead of
$cells->item(1)->textContent;
but that doesn't work either. How can I get it to return the HTML value?
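textContent and nodeValue both return only the text inside a node, and a cell holding nothing but an <img> has no text apart from whitespace. One way to get the markup itself is to serialize each child of the cell with DOMDocument::saveHTML(), which accepts an optional node argument. The sketch below assumes the rows are wrapped in a <table> and loaded from a string; the innerHTML() function is a hypothetical helper, not part of the DOM extension.

```php
<?php
// Sample rows from the question, wrapped in a <table> so they parse as a table.
$html = '<table>'
    . '<tr><td>Razon Social: </td><td>Circulo Inmobiliaria Sur (Casa Central)</td></tr>'
    . '<tr><td>Email: </td><td> <img src="generateImage.php?email=myemail@domain.com"/> </td></tr>'
    . '</table>';

$dom = new DOMDocument();
// Collect libxml warnings about the fragment (missing doctype etc.) instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Hypothetical helper: serialize every child of $node back to markup ("inner HTML").
function innerHTML(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$_razonSocial = '';
$_email = '';
foreach ($dom->getElementsByTagName('tr') as $row) {
    $cells = $row->getElementsByTagName('td');
    $label = $cells->item(0)->textContent;
    // strpos() returns 0 for a match at the start, so compare with !== false.
    if (strpos($label, 'Razon') !== false) {
        $_razonSocial = trim($cells->item(1)->textContent);
    } elseif (strpos($label, 'Email') !== false) {
        $_email = trim(innerHTML($cells->item(1)));
    }
}

echo "Razon Social: $_razonSocial\nEmail: $_email\n";
```

libxml serializes the image as an HTML void element, so the exact form of the tag (for example whether the trailing slash survives) can differ from the source markup.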
Answer 0 (score: 0)
Give your table the id item_specification (e.g. <table id="item_specification">) and then:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
$rows = $x->query("//*[@id='item_specification']/tr");
foreach ($rows as $row) {
    $cells = $row->getElementsByTagName('td');
    $atr_name = $cells->item(0)->nodeValue;
    $atr_val = $cells->item(1)->nodeValue;
    // Echo inside the loop so every row is printed, not only the last one.
    echo "{$atr_name} - {$atr_val}<br />";
}
This works fine.
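Since this answer already uses DOMXPath, the image's src attribute can also be selected directly with a single XPath expression, without walking the cells by hand. This is a sketch that assumes the table carries the id item_specification as suggested above and that the label cell of the row contains the word "Email".

```php
<?php
// Sample rows from the question, inside a table with the id suggested above.
$html = '<table id="item_specification">'
    . '<tr><td>Razon Social: </td><td>Circulo Inmobiliaria Sur (Casa Central)</td></tr>'
    . '<tr><td>Email: </td><td> <img src="generateImage.php?email=myemail@domain.com"/> </td></tr>'
    . '</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select the src attribute of any <img> in the second cell of the row
// whose first cell contains "Email".
$attrs = $xpath->query(
    "//*[@id='item_specification']//tr[contains(td[1], 'Email')]/td[2]//img/@src"
);

// A DOMAttr's nodeValue is the attribute value itself.
$src = $attrs->length > 0 ? $attrs->item(0)->nodeValue : '';
echo $src, "\n";
```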
Answer 1 (score: 0)
As I already mentioned, <img src="generateImage.php?email=myemail@domain.com"/>
is not text; it is a separate HTML element (a child node of the td). So try this:
if (strpos($cells->item(0)->textContent, "Razon") !== false) {
    $_razonSocial = $cells->item(1)->textContent;
} elseif (strpos($cells->item(0)->textContent, "Email") !== false) {
    // Here we walk all child nodes of the td.
    // The whitespace around the img tag is also a child node, but it has
    // type DOMText, so we skip everything that is not the img element.
    foreach ($cells->item(1)->childNodes as $child) {
        if ($child instanceof DOMElement && $child->nodeName === 'img') {
            $_email = $child->getAttribute('src');
            break;
        }
    }
    // Now $_email holds the full src value, from which the email can be extracted.
}
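To extract the address from the src value, PHP's parse_url() can isolate the query string and parse_str() can split it into parameters. The emailFromSrc() function below is a hypothetical helper, and it assumes the address is always passed in a query parameter named email, as in the sample markup.

```php
<?php
// Hypothetical helper: return the "email" query parameter of an img src,
// or null if there is no query string or no such parameter.
function emailFromSrc(string $src): ?string
{
    $query = parse_url($src, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string at all
    }
    parse_str($query, $params);
    return isset($params['email']) && is_string($params['email'])
        ? $params['email']
        : null;
}

$_email = 'generateImage.php?email=myemail@domain.com';
echo emailFromSrc($_email), "\n";
```

parse_str() also URL-decodes the value, so an address sent as myemail%40domain.com comes back with a literal @.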