So, here is my code:
<div id="first">
<div id="third">Lorem</div>
Lorem Ipsum Dolorez [...]
<script></script>
....
<div id="second">
Lorem Ipsum[...]
<a href=""/>
</div>
....
</div>
I need to get "Lorem Ipsum Dolorez [...]", which sits between two blocks (a div block and a script block), and "Lorem Ipsum[...]", which is inside a div, but without the hyperlink.
I tried using simple_html_dom.php, but I could not figure out how to do this.
Edit: this is an external website, so I cannot change this markup.
Answer 0: (score: 1)
You can use the DOM library and XPath to select these nodes (explanations are embedded in the comments):
$html = '
<div id="first">
<div id="third">Lorem</div>
Lorem Ipsum Dolorez [...]
<script></script>
this never gets picked up
<div id="second">
Lorem Ipsum[...]
<a href=""></a>
<span> this span is extracted since it is not an anchor element </span>
</div>
</div>';
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$first_lorem = $xpath->query('//div[@id="first"]/div[@id="third"]/following-sibling::text()[following::script]');
// first, find the div#first and inside that a div#third ...
// ... and take text node siblings of that div ...
// ... if those siblings have a script node following them (so if there's a <script> after them)
$first_lorem_html = '';
// loop the results and concat the html output
foreach ($first_lorem as $node) {
$first_lorem_html .= $doc->saveHTML($node);
}
print $first_lorem_html;
// get every child node of div#second except the elements named 'a'
$second_lorem = $xpath->query('//div[@id="second"]/node()[name() != "a"]');
$second_lorem_html = '';
foreach ($second_lorem as $node) {
$second_lorem_html .= $doc->saveHTML($node);
}
print $second_lorem_html;
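
If only the plain text is wanted rather than serialized HTML, the same kind of XPath query can be combined with `textContent` and `trim()`. The following is a minimal sketch under that assumption; the helper name `collect_text` is hypothetical and not part of the DOM extension, and the second query is narrowed to text nodes only.

```php
<?php
// Hypothetical helper: joins the trimmed text of every node in a result list,
// skipping nodes that contain only whitespace.
function collect_text(DOMNodeList $nodes): string
{
    $parts = [];
    foreach ($nodes as $node) {
        $text = trim($node->textContent);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    return implode(' ', $parts);
}

$html = '
<div id="first">
<div id="third">Lorem</div>
Lorem Ipsum Dolorez [...]
<script></script>
this never gets picked up
<div id="second">
Lorem Ipsum[...]
<a href=""></a>
</div>
</div>';

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Text siblings after div#third that still have a <script> following them.
$first_text = collect_text($xpath->query(
    '//div[@id="first"]/div[@id="third"]/following-sibling::text()[following::script]'
));

// Direct text children of div#second; the anchor element is not a text node.
$second_text = collect_text($xpath->query('//div[@id="second"]/text()'));

echo $first_text, "\n";
echo $second_text, "\n";
```

Selecting `text()` directly means the anchor never enters the result, at the cost of also dropping text that lives inside other child elements of div#second.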
Answer 1: (score: 0)
Try using PHP's strip_tags function. For example:
echo strip_tags('<div id="second">Lorem Ipsum[...]<a href=""/></div>');
Returns:
Lorem Ipsum[...]
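
Keep in mind that strip_tags works on a plain string, so it only helps once the markup of div#second has already been isolated. As a rough sketch of why, here it is applied to a condensed version of the whole snippet from the question (whitespace between the tags removed for brevity, which is an assumption of this example):

```php
<?php
// Condensed copy of the markup from the question.
$html = '<div id="first"><div id="third">Lorem</div>Lorem Ipsum Dolorez [...]'
      . '<script></script><div id="second">Lorem Ipsum[...]<a href=""/></div></div>';

// strip_tags removes the tags but keeps all of the text,
// so the wanted and unwanted parts run together.
$stripped = strip_tags($html);

echo $stripped; // LoremLorem Ipsum Dolorez [...]Lorem Ipsum[...]
```

The text of div#third ends up glued to the rest, so the two wanted fragments cannot be told apart without selecting the right nodes first.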
Answer 2: (score: 0)
According to the simple_html_dom reference: http://simplehtmldom.sourceforge.net/
You can do this:
$html->find('div[id=third]', 0)->plaintext