Checking the next node with XPath

Time: 2017-05-17 09:11:49

Tags: php xpath

I'm stuck with my code. I need to convert this simple HTML:

<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<p>
    <img src="xxxxx" />
</p>
<div class="sourceimg">azerrty</div>
<p>
    <img src="xxxxx">
</p>

<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<p>
    <img src="xxxxx">
</p>
<div class="sourceimg">qwerty</div>
<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>

into this:

<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<figure>
    <img src="xxxxx" />
    <figcaption>
        <cite>
            azerrty
        </cite>
    </figcaption>
</figure>
<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<figure>
    <img src="xxxxx">
</figure>
<figure>
    <img src="xxxxx">
    <figcaption>
        <cite>
            qwerty
        </cite>
    </figcaption>
</figure>
<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>

I managed to wrap each img in a figure, but I don't know how to check whether the next node (<div class="sourceimg">xxx</div>) exists or not.

Here is what I did:

<?php

ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);

$html = <<<EOF
<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<p>
    <img src="xxxxx" />
</p>
<div class="sourceimg">azerrty</div>
<p>
    <img src="xxxxx">
</p>
<div class="sourceimg">azerrty</div>
<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<p>
    <img src="xxxxx">
</p>
<div class="sourceimg">qwerty</div>
<p>
    Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
EOF;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$matches = $xpath->query('//p//img');

if($matches->length > 0){
    foreach($matches as $node){

        $figure_node = $dom->createElement('figure');

        $node->parentNode->replaceChild($figure_node, $node);
        $figure_node->appendChild($node);


    }
}

$contenu = $dom->saveHTML();
echo $contenu;

?>

And the output:

<p>
    <figure><img src="xxxxx">
    </figure>
</p>
<div class="sourceimg">azerrty</div>
<p>
    <figure><img src="xxxxx">
    </figure>
</p>
<div class="sourceimg">azerrty</div>
<p>
    <figure><img src="xxxxx">
    </figure>
</p>
<div class="sourceimg">qwerty</div>

1 answer:

Answer 0 (score: 1)

[Updated code] I would do the following:

...
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the paragraphs themselves so each one can be replaced by a <figure>.
$matches = $xpath->query('//p');
$matchesDivs = $xpath->query('//div');

if($matches->length > 0 && $matchesDivs->length > 0){
    $divSeen = [];
    $step = -1;
    foreach($matches as $node){
         // Skip paragraphs that do not contain an image.
         $img = $node->getElementsByTagName('img')->item(0);
         if($img === null){
             continue;
         }
         $step++;
         $figure_node = $dom->createElement('figure');
         $figure_node->appendChild($img);
         $node->parentNode->replaceChild($figure_node, $node);

         // Pair the n-th image with the n-th div; skip pairing once the divs run out.
         $div = $matchesDivs->item($step);
         if($div === null){
             continue;
         }
         // Add a caption only the first time a given source text appears.
         if(!in_array($div->nodeValue, $divSeen)){
            $figCaption_node = $dom->createElement('figcaption');
            $cite_node = $dom->createElement('cite', $div->nodeValue);

            $figCaption_node->appendChild($cite_node);
            $figure_node->appendChild($figCaption_node);
            $divSeen[] = $div->nodeValue;
         }
         $div->parentNode->removeChild($div);
    }
}

$contenu = $dom->saveHTML();
echo $contenu;

?>

See it here: eval.in

Output:

<p>
   Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<figure><img src="xxxxx"><figcaption><cite>azerrty</cite></figcaption></figure>

<figure><img src="xxxxx"></figure>

<p>
  Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
</p>
<figure><img src="xxxxx"><figcaption><cite>qwerty</cite></figcaption></figure>
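
The code above pairs images and divs by position. As an alternative, here is a minimal sketch that checks the next node directly, which is what the question asks for. It uses the XPath following-sibling axis to test whether the first element after each image paragraph is a <div class="sourceimg">. The function name wrapImagesInFigures and the wrapper div with id "wrap" are assumptions made for this illustration; they do not appear in the original code.

```php
<?php
// Sketch only: wrapImagesInFigures() and the "wrap" container are
// hypothetical names introduced for this example.
function wrapImagesInFigures(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Wrap the fragment so there is a single container to serialize later.
    $dom->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Only paragraphs that contain an image.
    foreach ($xpath->query('//p[.//img]') as $p) {
        $figure = $dom->createElement('figure');
        $figure->appendChild($xpath->query('.//img', $p)->item(0));

        // First following element sibling, kept only if it is div.sourceimg.
        // item(0) returns null when the next element is anything else.
        $next = $xpath->query(
            'following-sibling::*[1][self::div][contains(concat(" ", normalize-space(@class), " "), " sourceimg ")]',
            $p
        )->item(0);

        if ($next !== null) {
            $cite = $dom->createElement('cite');
            $cite->appendChild($dom->createTextNode(trim($next->textContent)));
            $figcaption = $dom->createElement('figcaption');
            $figcaption->appendChild($cite);
            $figure->appendChild($figcaption);
            // The source div has been folded into the caption, so drop it.
            $next->parentNode->removeChild($next);
        }

        $p->parentNode->replaceChild($figure, $p);
    }

    // Serialize only the children of the wrapper, without <html>/<body>.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling echo wrapImagesInFigures($html); on the HTML from the question should give a figure with a figcaption for every image that is immediately followed by a sourceimg div, and a bare figure otherwise. Because the predicate [1] is applied before [self::div], a div that appears later in the document, after some other element, is not picked up.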