有人可以帮助我。
我试图从某个页面获取html看起来像这样的信息。
<div class="block">
<h2>Season 1</h2>
<div class="episode"><a href="somelink.com">Episode 1</a></div>
<div class="episode"><a href="somelink.com">Episode 2</a></div>
<h2>Season 2</h2>
<div class="episode"><a href="somelink.com">Episode 1</a></div>
</div>
但是我坚持的是每个季节我想用div中的季节剧集将它们包装在div中,例如
<div class="block">
<div class="season">
<h2>Season 1</h2>
<div class="episode"><a href="somelink.com">Episode 1</a></div>
<div class="episode"><a href="somelink.com">Episode 2</a></div>
</div>
<div class="season">
<h2>Season 2</h2>
<div class="episode"><a href="somelink.com">Episode 1</a></div>
</div>
</div>
PHP代码我正在使用
$page = "someurl.com";
$page = $this->curl->get($page);
$dom = new DOMDocument();
@$dom->loadHTML($page);
$divs = $dom->getElementsByTagName('div');
for($i=0;$i<$divs->length;$i++){
if ($divs->item($i)->getAttribute("class")=="block") {
$h2s = $divs->item($i)->getElementsByTagName('h2');
if (count($h2s) > 0) {
foreach ($h2s as $h2) {
// Stuck at this point
}
}
}
}
我如何在PHP DOM中执行此操作有人可以请给我一个示例谢谢。
答案 0 :(得分:1)
下面的代码将<h2>
及其.episode
个兄弟姐妹包装在.season
容器中
$page = '<div class="block">
<h2>Season 1</h2>
<div class="episode"><a href="s1ep1.com">Episode 1</a></div>
<div class="episode"><a href="s1ep2.com">Episode 2</a></div>
<h2>Season 2</h2>
<div class="episode"><a href="s2ep1.com">Episode 1</a></div>
<div class="episode"><a href="s2ep1.com">Episode 2</a></div>
</div>';
$dom = new DOMDocument();
$origVal = libxml_use_internal_errors(true);
@$dom->loadHTML($page);
libxml_clear_errors();
libxml_use_internal_errors($origVal);
//create a tmeplate 'season' div
$season = $dom->createElement('div');
$season->setAttribute('class', 'season');
//get all '.block' divs using xpath
$xpath = new DOMXPath($dom);
$divs = $xpath->query("//*[@class='block']");
$clones = array();
$clone = '';
foreach($divs as $currDiv) {
//check if the 'block' contains any <h2> elemnts, if not, skip this block
if(!count($currDiv->getElementsByTagName('h2'))) {
continue;
}
foreach($currDiv->childNodes as $child) {
if(in_array($child->nodeName, array(
'#text',
'#comment'
))
) {
//ignore white space (and text content), and comments in 'block' div
continue;
}
if($child->nodeName == 'h2') {
if($clone) {
//save all clones of 'season' template div in an array for further use
$clones[] = $clone;
}
$clone = $season->cloneNode(true);
}
//this is the tricky part. If we do not append a clone of original div, then it actually moves the div to $clone. This changes HTML structure and disrupts the current loop
//so we append the clones of child to the 'season' div
if($child->nodeName == 'h2' || $child->getAttribute('class') == 'episode') {
$clone->appendChild($child->cloneNode(true));
}
}
$clones[] = $clone;
//remove all children of current 'block' div
while($currDiv->childNodes->length) {
$currDiv->removeChild($currDiv->firstChild);
}
//isnert all 'season' nodes in it
foreach($clones as $c) {
$currDiv->appendChild($c);
}
}
echo $dom->saveHTML();