Question

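I have a particular problem that I can't crack. I've searched every tutorial and forum entry I could find, but had no luck working out what I need to do. Here is my HTML file: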

<html>
 <head>**SOMETHING HERE**</head>
 <body>
  <div>
   <table>
    <thead>
  <tr><th>TEXT/NUM IS HERE</th><th>TEXT/NUM IS HERE</th><th>TEXT/NUM IS HERE</th></tr>
    </thead><tbody>**SOMETHING HERE**</tbody></tfoot>**SOMETHING HERE**</tfoot>
   </table>
  </div>
 </body>
</html>

What I need is to go through each th tag inside "thead => tr" and record the value between each pair of th tags into an array.

To do this, I plan to use DOMDocument and DOMXPath.

I have tried many ways to solve this, but most of what I found online looks like this:

$file = "index.html";
$dom = new DOMDocument();
$dom->loadHTMLfile($file);
$thead = $dom->getElementsByTagName('thead');
$thead->parentNode;
$th = $thead->getElementsByTagName('th')
echo $th->nodeValue . "\n";

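But I keep running into lots of errors and can't find a way through. Is there a simple way to do this cleanly, ideally with a foreach over the elements inside the parent element?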

Thanks.

Answer 1

Use DOMXPath:

$html = <<<EOL
<html>
    <head>**SOMETHING HERE**</head>
    <body>
        <div>
            <table>
                <thead>
                    <tr>
                        <th>TEXT/NUM IS HERE</th>
                        <th>TEXT/NUM IS HERE</th>
                        <th>TEXT/NUM IS HERE</th>
                    </tr>
                </thead>
                <tbody>**SOMETHING HERE**</tbody>
                <tfoot>**SOMETHING HERE**</tfoot>
            </table>
        </div>
    </body>
</html>
EOL;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$nodes = $xpath->query('//table/thead/tr/th');

$data = array();

foreach ($nodes as $node) {
    $data[] = $node->textContent;
}

print_r($data);
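The query above gathers every header cell in the document into one flat array. If a page contains several tables, one possible variation (a sketch, not part of the original answer) is to loop over the tables and run a relative XPath query with each table as the context node, so each table gets its own list of headers. The sample markup and the `$headers` variable are made up for illustration:

```php
<?php
// Sketch: collect header cells per table. The HTML below is a made-up example.
$html = <<<EOL
<html><body>
<table id="first">
  <thead><tr><th>Name</th><th>Age</th></tr></thead>
  <tbody><tr><td>Ann</td><td>30</td></tr></tbody>
</table>
<table id="second">
  <thead><tr><th>City</th><th>Country</th></tr></thead>
</table>
</body></html>
EOL;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$headers = array();
foreach ($xpath->query('//table') as $table) {
    $row = array();
    // The leading "." makes the path relative to $table (the context node),
    // so only this table's header cells are matched.
    foreach ($xpath->query('./thead/tr/th', $table) as $th) {
        $row[] = trim($th->textContent);
    }
    $headers[] = $row;
}

print_r($headers);
```

Passing the context node as the second argument of DOMXPath::query avoids writing a separate absolute path for each table.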

Answer 2

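<?php
// Simple HTML DOM Parser is a separate download; adjust the path as needed
include 'simple_html_dom.php';

// file_get_html() is a function, not a class, so no "new"
$html = file_get_html('file.html');
$th = $html->find('thead th');
$array = array();
foreach ($th as $text) {
    // use the loop variable, not the whole result set
    $array[] = $text->innertext;
}
?>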

This uses Simple HTML DOM Parser, a third-party library that has to be downloaded separately.

Answer 3

If you want to keep the same style as your own code (and so see what you did wrong), try this:

$file = "index.html";
$dom = new DOMDocument();
$dom->loadHTMLfile($file);

$oTHeadList = $dom->getElementsByTagName('thead');

foreach( $oTHeadList as $oThisTHead ){

    $oThList = $oThisTHead->getElementsByTagName('th');

    foreach( $oThList as $oThisTh ) {

        echo $oThisTh->nodeValue . "\n";
    }
}

Basically, getElementsByTagName returns a DOMNodeList rather than a single DOMNode, so you have to iterate over the list to reach the individual nodes.
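To see the NodeList point in isolation, here is a hedged sketch that keeps the question's approach but uses `DOMNodeList::item()` to pick out one `thead` before searching inside it. It assumes the first `thead` in the document is the one you want; the sample HTML and variable names are illustrative:

```php
<?php
// Sketch: the question's approach, fixed by taking a single node out of
// the DOMNodeList with item(). The HTML is a made-up example.
$html = '<table><thead><tr><th>A</th><th>B</th><th>C</th></tr></thead>'
      . '<tbody><tr><td>1</td><td>2</td><td>3</td></tr></tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() always returns a DOMNodeList, even for one match.
$theadList = $dom->getElementsByTagName('thead');
echo $theadList->length, "\n"; // number of <thead> elements found

// item(0) returns the first element, or null if the list is empty.
$thead = $theadList->item(0);

$values = array();
if ($thead !== null) {
    // Searching inside the chosen <thead> again yields a DOMNodeList to loop over.
    foreach ($thead->getElementsByTagName('th') as $th) {
        $values[] = $th->nodeValue;
    }
}

print_r($values);
```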

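Also, your HTML has a closing tfoot tag where the opening one should be, and if you test with the HTML document you provided, the **SOMETHING HERE** inside the head tag will cause warnings to be thrown (as will any other invalid HTML).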

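If you want to suppress the warnings while loading, you can prefix the call with '@', but sprinkling that symbol liberally around your code is not a good idea.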

@$dom->loadHTMLfile($file);
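A commonly used alternative to '@' (sketched here on the assumption that you want to inspect the warnings rather than discard them) is to switch libxml to internal error handling, parse, then read and clear the collected errors. The malformed HTML below mimics the question's and is only an example:

```php
<?php
// Sketch: collect libxml parse warnings instead of silencing them with '@'.
$html = '<html><head>**SOMETHING HERE**</head><body><table>'
      . '<thead><tr><th>X</th></tr></thead></tfoot>Y</tfoot></table></body></html>';

// Route parser errors into an internal buffer; remember the previous setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Inspect (or log) whatever the parser complained about.
$messages = array();
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message) . ' on line ' . $error->line;
}

// Empty the buffer and restore the old behaviour.
libxml_clear_errors();
libxml_use_internal_errors($previous);

print_r($messages);
```

Restoring the previous setting keeps the change local to this one parse instead of affecting the rest of the script.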
