Question

OK, I have an HTML file with the following structure:

<h3>Heading 1</h3>
  <table>
   <!-- contains a <thead> and <tbody> which also contain several columns/rows -->
  </table>
<h3>Heading 2</h3>
  <table>
   <!-- contains a <thead> and <tbody> which also contain several columns/rows -->
  </table>

I want to get JUST the first table, with all of its contents. So I load the HTML file:

<?php 
  $dom = new DOMDocument();
  libxml_use_internal_errors(true);
  $dom->loadHTML(file_get_contents('http://www.example.com'));
  libxml_clear_errors();
?>

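All the tables have the same class and none has a specific ID. That's why the only approach I could think of is to grab the h3 tag whose value is "Heading 1". I already found this one, which works for me. (The fact that other tables and headings could be added makes that solution unfavorable.) How can I grab the h3 tag whose value is "Heading 1"? And how do I select the table that follows it?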

Edit #1: I don't have access to the HTML file, so I can't edit it.
Edit #2: My solution (thanks to Martin Henriksen) is now:

<?php
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML(file_get_contents('http://example.com'));
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('h3') as $element) {
        // Only process the heading we are looking for.
        if ($element->nodeValue == 'exampleString') {
            // The first nextSibling is the whitespace text node after </h3>;
            // the second one is the <table> element.
            $table = $element->nextSibling->nextSibling;
            $innerHTML = '';
            // Serialize every child of the table (thead, tbody, ...).
            foreach ($table->childNodes as $child) {
                $innerHTML .= $child->ownerDocument->saveXML($child);
            }
            echo $innerHTML;
            file_put_contents("test.xml", $innerHTML);
        }
    }
?>
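
For reference, here is a minimal self-contained sketch of the same idea. It assumes an inline sample string in place of the remote page, and the helper name `tableAfterHeading` is hypothetical. Instead of relying on exactly two `nextSibling` hops, it walks forward until it reaches a `<table>` element, and it returns the whole table through `DOMDocument::saveHTML($node)` rather than concatenating the child nodes.

```php
<?php
// Hypothetical helper: returns the HTML of the first <table> that follows
// the <h3> whose text equals $heading, or null if there is none.
function tableAfterHeading(string $html, string $heading): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('h3') as $h3) {
        if (trim($h3->nodeValue) !== $heading) {
            continue;
        }
        // Walk forward over the siblings (whitespace text nodes included)
        // until a <table> element is reached.
        for ($node = $h3->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeType === XML_ELEMENT_NODE && $node->nodeName === 'table') {
                return $doc->saveHTML($node);
            }
        }
    }
    return null;
}

// The sample markup is an assumption standing in for the real page.
$sample = '<h3>Heading 1</h3><table><tbody><tr><td>a</td></tr></tbody></table>'
        . '<h3>Heading 2</h3><table><tbody><tr><td>b</td></tr></tbody></table>';
echo tableAfterHeading($sample, 'Heading 1');
```

Returning null when no heading matches lets the caller tell "heading not found" apart from an empty table.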

Answer 1

You can use the simple_html_dom.php class to find any tag in the HTML. You can download the file from this link: https://sourceforge.net/projects/simplehtmldom/?source=typ_redirect

Then:

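<?php
include_once('simple_html_dom.php');

$htm  = "**YOUR HTML CODE**";
$html = str_get_html($htm);
// find() takes a CSS-style selector ("h3"), not a literal tag ("<h3>");
// index 0 selects the first match.
$h3_tag = $html->find("h3", 0)->innertext;
echo "HTML code in h3 tag";
print_r($h3_tag);
?>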

Answer 2

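You can grab all the h3 tags as DOMElements and check the value each one holds by accessing nodeValue. Once you have found the right h3 tag, you can select the next element in the DOM tree with nextSibling.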

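foreach ($dom->getElementsByTagName('h3') as $element) {
    if (trim($element->nodeValue) == 'Heading 1') {
        // nextSibling can be a whitespace text node, so step forward
        // until an element node (the table) is reached.
        $table = $element->nextSibling;
        while ($table !== null && $table->nodeType !== XML_ELEMENT_NODE) {
            $table = $table->nextSibling;
        }
    }
}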
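
As an alternative to looping over every h3, the same lookup can be written as a single XPath query. This is a sketch using DOMXPath, which is a different technique from the nextSibling loop in this answer. The inline sample markup is an assumption standing in for the real page.

```php
<?php
// Sample markup is an assumption; the real page would be loaded instead.
$html = '<h3>Heading 1</h3><table><tr><td>first</td></tr></table>'
      . '<h3>Heading 2</h3><table><tr><td>second</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// First <table> sibling after the <h3> whose trimmed text is "Heading 1".
$nodes = $xpath->query('//h3[normalize-space(.) = "Heading 1"]/following-sibling::table[1]');

$table = $nodes->item(0);   // null if nothing matched
if ($table !== null) {
    echo $dom->saveHTML($table);
}
```

Because `following-sibling::table` only considers element nodes, the whitespace text nodes between the heading and the table do not need special handling here.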

How to find an h3 tag with a specific value
