Question

I'm using the Simple HTML DOM parser for this. Basically, I'm trying to extract all the <p> elements between <table class="first"> and <div class="second">:

<div id="main">
   <table class="first">
   <p>
   <p>
   <p>
   <div class="second">
   <p>
</div>

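In this case there are three <p> elements, but sometimes there may be only two, or even one. Any of the <p> elements may carry an id or a class. Can someone point me in the right direction?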

Answer 1

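Unfortunately, this can't be done directly with simple-html-dom ...

A workaround is to start from the starting node (i.e. table.first) and walk through all of its following siblings (or only nodes of type X; it's up to you to specify which, if needed) until you reach the end node (i.e. div.second).

Here is working code (I modified the input to get valid HTML):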

$input =  <<<_DATA_
    <div id="main">
        <p>p1</p>

        <table class="first">
            <tr>
                <td>
                    <p>pInTable</p>
                </td>
            </tr>
        </table>

        <p>p2</p>
        <p>p3</p>
        <p>p4</p>

        <div class="second">MyDiv</div>

        <p>p5</p>
    </div>
_DATA_;

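// Load the library (path is an assumption; adjust it to where simple_html_dom.php lives)
include 'simple_html_dom.php';

// Create a DOM object
$html = new simple_html_dom();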
// Load HTML from a string
$html->load($input);

// Get the starting node
$startPoint = $html->find('table.first', 0);

// While the current node has a sibling
while ( $next = $startPoint->next_sibling() ) {
    // And as long as it's different from the end node => div.second
    if ( $next->tag == 'div' && $next->class == 'second' )
        break;
    else{
        // Print the content
        echo $next->plaintext;
        echo '<br/>';
        // And move to the next node
        $startPoint = $next;
    }
}

Output

p2
p3
p4
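
If you want only the <p> siblings (the "nodes of type X" case mentioned above) collected into an array rather than echoed, the same sibling walk can be sketched as below. Note the assumptions: this uses PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM so that it runs without any external file, paragraphsBetween() is a hypothetical helper name, and the id/class attributes on the <p> tags were added only to mirror the question. Treat it as a rough sketch, not the one right way to do it.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): collects the text of
// every <p> sibling that follows $start, stopping at the first element
// whose tag is $endTag and whose class attribute contains $endClass.
function paragraphsBetween(DOMElement $start, string $endTag, string $endClass): array
{
    $texts = [];
    for ($node = $start->nextSibling; $node !== null; $node = $node->nextSibling) {
        // Skip whitespace text nodes and comments between elements
        if (!($node instanceof DOMElement)) {
            continue;
        }
        // An element may carry several classes, so compare against each one
        $classes = preg_split('/\s+/', trim($node->getAttribute('class')));
        if ($node->tagName === $endTag && in_array($endClass, $classes, true)) {
            break; // reached the end node
        }
        // Keep only <p> elements; other sibling types are ignored
        if ($node->tagName === 'p') {
            $texts[] = trim($node->textContent);
        }
    }
    return $texts;
}

$input = <<<HTML
<div id="main">
    <p>p1</p>
    <table class="first"><tr><td><p>pInTable</p></td></tr></table>
    <p id="a">p2</p>
    <p class="b">p3</p>
    <p>p4</p>
    <div class="second">MyDiv</div>
    <p>p5</p>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($input);

// Locate the starting node: the table with class "first"
$xpath = new DOMXPath($doc);
$table = $xpath->query('//table[@class="first"]')->item(0);

$result = paragraphsBetween($table, 'div', 'second');
print_r($result);
```

With the sample input, $result holds p2, p3 and p4; p1, pInTable and p5 are left out because they are not siblings located between the two boundary nodes.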

Answer 2

You can find plenty of good examples on the manual page, in the 'descendant' section.

// Parse a web page
$html = file_get_html("your website");

// Only the <p> elements inside the table
$tablePs = $html->find('table[class="first"] p');

// All the <p> elements inside the container div
$divPsContainer = $html->find('div[id="main"] p');

// Only the <p> elements inside div.second
$divPsSecond = $html->find('div[class="second"] p');

// Then iterate over each result set
foreach ($tablePs as $p) {
    // ...
}

foreach ($divPsSecond as $p) {
    // ...
}

foreach ($divPsContainer as $p) {
    // ...
}

simplehtmldom: how can I extract the content between two elements?
