Question

If I have a webpage like this:

<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>

  <a href='http://domain2.com'>link 2 text</a>

  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>

How do I pull the <a> tags out of the <body> but exclude the links from <header> and <footer>?

In the real web page, there will be a lot of <a> tags in the <header>, so I'd rather not have to cycle through ALL of them.

I want to pull out the URLs and anchor text from each of the <a> tags that are NOT inside the <header> or <footer> tags.

EDIT: this is how I find links in the header:

$header = $html->find('header', 0);
foreach ($header->find('a') as $a){
  // do something
}

I would like to do something like this (note the use of "!"):

$foo = $html->find('!header,!footer');
foreach ($foo->find('a') as $a){
  // do something
}

Answer 1

Remove the header and footer from the DOM you are working with before you search for links.

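<?php
    include("simple_html_dom.php");
    $source = <<<EOD
    <body>
        <header>
            <a href='http://domain1.com'>link 1 text</a>
        </header>

        <a href='http://domain2.com'>link 2 text</a>

        <a href='http://domain4.com'>link 4 text</a>

        <footer>
            <a href='http://domain3.com'>link 3 text</a>
        </footer>
    </body>
EOD;

    $html = str_get_html($source);

    // Blank out every <header> and <footer>, including the links inside them.
    foreach ($html->find('header, footer') as $unwanted) {
        $unwanted->outertext = "";
    }

    // Setting outertext only changes the output, so re-parse the result
    // to drop the removed nodes from the tree that find() searches.
    $html->load($html->save());

    // Print the URL and anchor text of every remaining link.
    foreach ($html->find("a") as $link) {
        print $link->href . " " . $link->plaintext . "\n";
    }

?>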
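For comparison, here is a minimal sketch of the same remove-then-search idea using PHP's built-in DOM extension (`DOMDocument`) instead of simple_html_dom; the library swap is an assumption of this sketch, not part of the answer above. Because `removeChild()` really detaches the `<header>` and `<footer>` nodes, no reload step is needed before collecting each link's URL and text.

```php
<?php
// Sketch: PHP's built-in DOM extension, not simple_html_dom.
// Detach <header> and <footer>, then collect the links that remain.

$source = <<<'HTML'
<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>

  <a href='http://domain2.com'>link 2 text</a>

  <a href='http://domain4.com'>link 4 text</a>

  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>
HTML;

// libxml may warn about HTML5 tags such as <header>; keep those warnings quiet.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

foreach (['header', 'footer'] as $tag) {
    // Copy the live node list first so removals don't disturb the iteration.
    foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
        $node->parentNode->removeChild($node);
    }
}

// Each remaining link as [URL, anchor text].
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = [$a->getAttribute('href'), trim($a->textContent)];
}

print_r($links);
```

The `libxml_use_internal_errors(true)` call is there because libxml's HTML parser may report `<header>` and `<footer>` as unknown tags; the elements are still created and can be removed normally.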

Answer 2

Want to do it without destroying the body? You can do this:

$bad_as = $html->find('header a, footer a');
foreach ($html->find('a') as $a) {
  // Strict (identity) check: skip any link that is inside a header or footer.
  if (in_array($a, $bad_as, true)) continue;
  // do something
}
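A sketch of the same exclude-list idea with PHP's built-in `DOMDocument`, assuming you are not tied to simple_html_dom. `containsNode()` is a hypothetical helper; it uses `DOMNode::isSameNode()` so links are matched by node identity rather than by a property-by-property comparison, and the document itself is never modified.

```php
<?php
// Sketch: PHP's built-in DOM extension, not simple_html_dom.
// Collect the header/footer links once, then skip them while walking all links.

// Hypothetical helper: true if $node is the very same node as one in $nodes.
function containsNode(array $nodes, DOMNode $node): bool
{
    foreach ($nodes as $candidate) {
        if ($candidate->isSameNode($node)) {
            return true;
        }
    }
    return false;
}

$source = <<<'HTML'
<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>

  <a href='http://domain2.com'>link 2 text</a>

  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>
HTML;

libxml_use_internal_errors(true); // libxml may warn about <header>/<footer>
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

// Rough equivalent of find('header a, footer a').
$badAs = [];
foreach (['header', 'footer'] as $tag) {
    foreach ($doc->getElementsByTagName($tag) as $section) {
        foreach ($section->getElementsByTagName('a') as $a) {
            $badAs[] = $a;
        }
    }
}

// Walk every link, skipping the excluded ones; the tree is left untouched.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    if (containsNode($badAs, $a)) {
        continue;
    }
    $links[] = [$a->getAttribute('href'), trim($a->textContent)];
}

print_r($links);
```

Each link is compared against every excluded link, so the cost grows with (links × excluded links); that is usually fine, but it is worth keeping in mind for a page with a very large header.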

Answer 3

This isn't possible with simple-html-dom, even though in plain CSS it is of course simple:

$html->find('body > a');

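This CSS selector selects every <a> element whose parent is the <body> element. Instead, you need to iterate over the child nodes of <body> and pick out the <a> elements.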
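Iterating over the children of `<body>` might look like this with PHP's built-in `DOMDocument` (a swapped-in parser, not simple-html-dom). Like `body > a`, this minimal sketch only sees links that are direct children of `<body>`; a link wrapped in a `<div>` or `<p>` would be missed.

```php
<?php
// Sketch: PHP's built-in DOM extension, not simple-html-dom.
// Looks only at the direct children of <body>, like the CSS selector 'body > a'.

$source = <<<'HTML'
<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>

  <a href='http://domain2.com'>link 2 text</a>

  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>
HTML;

libxml_use_internal_errors(true); // libxml may warn about <header>/<footer>
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$body = $doc->getElementsByTagName('body')->item(0);

$links = [];
foreach ($body->childNodes as $child) {
    // Skip whitespace text nodes and non-<a> elements such as <header>/<footer>,
    // so the links nested inside those sections are never reached.
    if ($child instanceof DOMElement && $child->tagName === 'a') {
        $links[] = [$child->getAttribute('href'), trim($child->textContent)];
    }
}

print_r($links);
```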

I suggest looking at "How do you parse and process HTML/XML in PHP?"

Personally, I use Symfony/DomCrawler together with Symfony/CssSelector for this.
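If you do switch parsers, the whole filter fits in a single XPath query. This is a minimal sketch with PHP's built-in `DOMXPath`, chosen here to avoid extra dependencies; it is not the DomCrawler setup mentioned above, although DomCrawler also offers a `filterXPath()` method.

```php
<?php
// Sketch: PHP's built-in DOMXPath, not simple-html-dom or Symfony DomCrawler.

$source = <<<'HTML'
<body>
  <header>
    <a href='http://domain1.com'>link 1 text</a>
  </header>

  <a href='http://domain2.com'>link 2 text</a>

  <a href='http://domain4.com'>link 4 text</a>

  <footer>
    <a href='http://domain3.com'>link 3 text</a>
  </footer>
</body>
HTML;

libxml_use_internal_errors(true); // libxml may warn about <header>/<footer>
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Every <a> under <body> that has no <header> or <footer> ancestor.
$query = '//body//a[not(ancestor::header or ancestor::footer)]';

$links = [];
foreach ($xpath->query($query) as $a) {
    $links[] = [$a->getAttribute('href'), trim($a->textContent)];
}

print_r($links);
```

Unlike `body > a`, `//body//a` matches links at any depth, and the `ancestor::` test drops any link that has a `<header>` or `<footer>` anywhere above it.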

How to get <a> tags in <body> but exclude header and footer sections
