PHP DOMDocument saveHTML() returns only the first image when the HTML starts with <img>

Date: 2017-02-21 11:41:26

Tags: php html domdocument

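I have the following problem. When the HTML starts with an <img> tag and I serialize it with $dom->saveHTML(), I only get the first image back. But when I put any string before the <img> tag, the HTML comes back with extra <p></p> tags. Why is that?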

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

$h = 'abc<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

The two strings above are sample inputs.

<?php

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

    echo'start<br />';
    echo htmlspecialchars($h);
    echo'<br />end<br />';

    $dom = new domDocument();
    $dom->loadHTML($h, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $dom->preserveWhiteSpace = false;
    $images = $dom->getElementsByTagName('img');
    foreach ($images as $image) {
        $img_class =  $image->getAttribute('class');

        if($img_class == '') {
            $image->setAttribute('class', 'img-responsive img-rounded');
            echo'add class <br />';
        }
    }

    $my_post_content = $dom->saveHTML();

    echo'start<br />';
    echo htmlspecialchars($my_post_content);
    echo'<br />end<br />';

1 Answer:

Answer 0: (score: 0)

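Hi, I ran some tests on your script, and it seems the second image disappears because of the LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD options passed to $dom->loadHTML($h, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); With LIBXML_HTML_NOIMPLIED the parser does not add the implied <html> and <body> elements, so the fragment has no single root element to hold its top-level nodes.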
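To check this on your own setup, here is a minimal sketch that parses the same fragment with and without the two flags and counts how many <img tags survive in the serialized output. The helper name countImgTags is hypothetical and not part of the original script, and the result with the flags may depend on the libxml version PHP was built against, so no expected output is given for that case.

```php
<?php
// Hypothetical helper (not from the original script): parse $html with the
// given libxml options and count the "<img" tags left after saveHTML().
function countImgTags(string $html, int $options = 0): int
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);

    libxml_clear_errors();

    return substr_count($dom->saveHTML(), '<img');
}

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

// Default options: <html> and <body> are implied around the fragment.
echo 'default options: ', countImgTags($h), PHP_EOL;

// The flags from the question: no implied <html>/<body>, no doctype.
echo 'with flags: ', countImgTags($h, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD), PHP_EOL;
```

If the second number is lower than the first, the flags are what drops the image, not the class-setting loop.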

A simple "hack" would be to use something like this:

$h = 'abc<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

and then manually cut the unwanted parts out of the resulting string, but here is a better solution:

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

echo'start<br />';
echo htmlspecialchars($h);
echo'<br />end<br />';

// blank document is used because we want to extract only the
// html inside <body> from $dom 
$blank = new DOMDocument;

// initialize the $dom object and nothing is changed in this code
$dom = new domDocument();
$dom->loadHTML($h); 
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $img_class = $image->getAttribute('class');

    if ($img_class == '') {
        $image->setAttribute('class', 'img-responsive img-rounded');
        echo'add class <br />';
    }
}

// now get the body that will contain the updated HTML
// and insert all its children into the blank document
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $blank->appendChild($blank->importNode($child, true));
}

$my_post_content = $blank->saveHTML($blank);

echo'start<br />';
echo htmlspecialchars($my_post_content);
echo'<br />end<br />';
exit;

And the output:

start
<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>
end
add class
add class
start
<img src="https://example.com/one.jpg" alt="" class="img-responsive img-rounded"><br><p>bla</p><img src="https://example.com/foo.jpg" alt="" class="img-responsive img-rounded"><br>
end

As you can see, both images are there.
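As a variation on the same idea, the children of <body> can also be serialized one by one with saveHTML($node), which avoids the second document. This is only a sketch under a few assumptions: the function name addImgClasses is made up for illustration, and the input is taken to be plain ASCII like the sample (loadHTML may mangle non-ASCII text unless the encoding is declared).

```php
<?php
// Hypothetical helper: add $classes to every <img> that has no class
// attribute and return the fragment without the implied <html>/<body>.
function addImgClasses(string $html, string $classes): string
{
    $dom = new DOMDocument();
    // Default options, so libxml wraps the fragment in <html><body>.
    $dom->loadHTML($html);

    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('class') === '') {
            $img->setAttribute('class', $classes);
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> on its own and join the pieces.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return $out;
}

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';

$result = addImgClasses($h, 'img-responsive img-rounded');
echo htmlspecialchars($result), PHP_EOL;
```

Either way the fragment is parsed with the default options, so every top-level node ends up inside <body> and none of them is lost.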

Cheers!