Regex or another way to get a string into the correct format

Time: 2016-02-01 08:46:28

Tags: php regex

Please help me. I have the following string:

<p>this is text before first image</p>
<p><a href=""><img class="size-full wp-image-2178636" src="image1.jpg" alt="first" /></a> this is first caption</p>
<p>this is text before second image.</p>
<p><a href=""><img src="image2.jpg" alt="second" class="size-full wp-image-2178838" /></a> this is second caption</p>
<p>there may be many more images</p>

I need the string above in the following format:

<p>this is text before first image</p>
<a href="">
<figure>
    <img class="size-full wp-image-2178636" src="image1.jpg" alt="first" />
    <figcaption class="newcaption">
        <h1>this is first caption</h1>
    </figcaption>
</figure>
</a>
<p>this is text before second image.</p>
<a href="">
<figure>
    <img src="image2.jpg" alt="second" class="size-full wp-image-2178838" />
    <figcaption class="newcaption">
        <h1>this is second caption</h1>
    </figcaption>
</figure>
</a>
<p>there may be many more images</p>

Please help. How can this be done with a regular expression, or in some other way? I'm doing this in PHP.

Regards, Sachin.
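Since the question asks about regular expressions, here is a minimal preg_replace_callback sketch. It assumes every image paragraph has exactly the shape of the sample: one <a> directly wrapping one <img>, followed by plain caption text before </p>. The function name wrapImagesInFigures is hypothetical. Regex matching is brittle for HTML in general (extra wrappers, nested tags, or a different structure will not match), which is one reason the answer uses DOMDocument instead.

```php
<?php
// Hypothetical helper (not from the original post): turns each
//   <p><a ...><img ... /></a> caption</p>
// into
//   <a ...><figure><img ... /><figcaption class="newcaption"><h1>caption</h1></figcaption></figure></a>
// Assumption: image paragraphs look exactly like the sample input.
function wrapImagesInFigures(string $html): string
{
    // Groups: 1 = opening <a ...> tag, 2 = the <img ...> tag, 3 = caption text before </p>
    $pattern = '~<p>\s*(<a\b[^>]*>)\s*(<img\b[^>]*>)\s*</a>\s*(.*?)\s*</p>~s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        [, $openA, $img, $caption] = $m;
        return $openA . "\n"
            . "<figure>\n"
            . "    " . $img . "\n"
            . "    <figcaption class=\"newcaption\">\n"
            . "        <h1>" . $caption . "</h1>\n"
            . "    </figcaption>\n"
            . "</figure>\n"
            . "</a>";
    }, $html);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input
    return $result ?? $html;
}

$html = "<p>this is text before second image.</p>\n"
    . '<p><a href=""><img src="image2.jpg" alt="second" /></a> this is second caption</p>';
echo wrapImagesInFigures($html), "\n";
```

Text-only paragraphs never match, because the pattern requires an <a> tag immediately after <p>, so they pass through unchanged.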

1 answer:

Answer 0 (score: 0)

Although SO isn't supposed to be a code-writing service, here's a quick one: a dirty solution using the DOMDocument methods:

$html = '...'; // your input data
$input = new DOMDocument();
$input->loadHTML($html);
$ps = $input->getElementsByTagName('p');

$output = new DOMDocument();

foreach ($ps as $p) {
    $imgs_input = $p->getElementsByTagName("img");
    $as_input = $p->getElementsByTagName("a");

    if ($imgs_input->length === 0) {
        // text paragraph: copy it across as-is (deep import keeps inline markup)
        $output->appendChild($output->importNode($p, true));
    }
    elseif ($as_input->length > 0) {
        // image paragraph: rebuild as <a><figure><img/><figcaption><h1>...</h1></figcaption></figure></a>
        $a_output = $output->importNode($as_input->item(0)); // shallow import: attributes, no children
        $figure = $output->createElement("figure");

        $img_output = $output->importNode($imgs_input->item(0));
        $figure->appendChild($img_output);

        $figcaption = $output->createElement("figcaption");
        $figcaption->setAttribute("class", "newcaption");
        // the caption is the paragraph's text (the <img> contributes none);
        // trim the leading space, and use a text node so characters like & are escaped
        $h1 = $output->createElement("h1");
        $h1->appendChild($output->createTextNode(trim($p->textContent)));
        $figcaption->appendChild($h1);
        $figure->appendChild($figcaption);

        $a_output->appendChild($figure);
        $output->appendChild($a_output);
    }
    else {
        // Document malformed: an <img> without a surrounding <a>
    }
}

print $output->saveHTML();

Note that saveHTML() outputs plain old HTML, so the img elements will not become self-closing tags. If that matters to you, you may want to look at saveXML().
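To illustrate the difference, the sketch below serializes the same <img> element both ways. Passing a node to saveXML() restricts the output to that node; called on the whole document with no argument, saveXML() also prepends an XML declaration.

```php
<?php
// Serializes the same <img> element with saveHTML() and with saveXML()
// so the two outputs can be compared side by side.
$doc = new DOMDocument();
$img = $doc->createElement('img');
$img->setAttribute('src', 'image1.jpg');
$img->setAttribute('alt', 'first');
$doc->appendChild($img);

// Passing the node restricts output to that node; called without an
// argument, saveXML() would also prepend the XML declaration.
$asHtml = $doc->saveHTML($img); // HTML serializer: void element, no "/>"
$asXml  = $doc->saveXML($img);  // XML serializer: empty element ends in "/>"

echo $asHtml, "\n";
echo $asXml, "\n";
```

For the answer's $output, one option is to loop over $output->childNodes and concatenate $output->saveXML($node) for each node, which gives self-closing img tags without the declaration.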