Question

For some strange reason that I can't figure out right now, I can't get more than 3 rows from a table in a page.

This is the page.

http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/

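I want to parse the table at the bottom.

Since there is only one table in the page, my XPath is very simple: $xpath->query('//tr')

If I do the following

echo $xpath->query('//tr')->length;

I get 3.

Why do I get 3? There are 9 rows, so I should get 9.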

Edit: this is the code I use

$Dom = new DOMDocument();
@$Dom->loadHTML($this->html);
$xpath = new DOMXPath($Dom);
echo $xpath->query('//tr')->length;

Note that $this->html is the raw HTML of the link given earlier in my post.

Answer 1

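The HTML source of this page is not valid XML. If you open the page source and search for the <tr> tag, you will also find only 3 of them. The product rows of the table have no opening <tr> tag, only the closing </tr>.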
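One way to check this without reading the source by hand is to let libxml report what it tripped over instead of silencing it with @. The snippet below is only a sketch: $sample is a made-up fragment that imitates the page (rows two and three lack their opening <tr>), not the real markup, and the exact messages depend on the libxml version.

```php
<?php
// Hypothetical sample imitating the page: rows B and C have no opening <tr>.
$sample = '<table><tbody>'
    . '<tr><td>A</td></tr>'
    . '<td>B</td></tr>'
    . '<td>C</td></tr>'
    . '</tbody></table>';

// Collect parse errors in an internal buffer instead of emitting warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($sample);

// Fetch whatever the parser complained about, then empty the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with line and message properties.
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Count the rows the parser actually built.
$xpath = new DOMXPath($dom);
$rowCount = $xpath->query('//tr')->length;
echo "rows found: {$rowCount}\n";
```

On the real page you would pass the downloaded HTML in place of $sample. A row count lower than the number of visible rows, together with complaints about stray end tags, points to the missing <tr> tags.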

To work around this, you can use a regular expression to extract the table body and then normalize its contents.

$html = file_get_contents('http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/');

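// The "s" modifier lets "." match newlines, in case <tbody> spans several lines.
$tableBody = '';
if (preg_match('`<tbody>(.*)</tbody>`s', $html, $matches)) {
    // Put the missing opening <tr> back in front of each orphaned <td>.
    $tableBody = str_replace('</tr><td', '</tr><tr><td', $matches[1]);
}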
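To see the effect end to end, here is a small offline sketch that runs the same normalization on an inline sample and counts the rows before and after. The sample markup and the countRows() helper are made up for illustration. The str_replace() call only matches when </tr> is immediately followed by <td, so markup with whitespace between the tags would need a different pattern.

```php
<?php
// Hypothetical page fragment: rows B and C lack their opening <tr>.
$html = '<html><body><table><tbody>'
    . '<tr><td>A</td></tr>'
    . '<td>B</td></tr>'
    . '<td>C</td></tr>'
    . '</tbody></table></body></html>';

// Keep libxml warnings about the broken markup out of the output.
libxml_use_internal_errors(true);

// Illustrative helper: wrap a <tbody> fragment in a table and count its <tr> elements.
function countRows(string $fragment): int
{
    $dom = new DOMDocument();
    $dom->loadHTML('<table><tbody>' . $fragment . '</tbody></table>');
    $xpath = new DOMXPath($dom);
    return $xpath->query('//tr')->length;
}

$rawBody = '';
$tableBody = '';
if (preg_match('`<tbody>(.*?)</tbody>`s', $html, $matches)) {
    $rawBody = $matches[1];
    // Re-insert the opening <tr> that the page omits.
    $tableBody = str_replace('</tr><td', '</tr><tr><td', $rawBody);
}

$before = countRows($rawBody);
$after = countRows($tableBody);
libxml_clear_errors();

echo "rows before: {$before}, rows after: {$after}\n";
```

With this sample the normalized fragment contains three well-formed rows, so //tr finds all of them. On the real page the same query should then return every product row.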

Parsing a table with DOMXPath returns no more than 3 rows
