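I have the following problem. When the HTML starts with an <img> tag and I save it with $dom->saveHTML(), I only get the first image back. But when I add any string before the <img> tag, I get extra <p></p> tags in the HTML. Why is that?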
$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
$h = 'abc<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
The two strings above are sample inputs: the first starts with <img>, the second has text in front of it.
<?php
$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
echo 'start<br />';
echo htmlspecialchars($h);
echo '<br />end<br />';
$dom = new DOMDocument();
$dom->loadHTML($h, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $img_class = $image->getAttribute('class');
    if ($img_class == '') {
        $image->setAttribute('class', 'img-responsive img-rounded');
        echo 'add class <br />';
    }
}
$my_post_content = $dom->saveHTML();
echo 'start<br />';
echo htmlspecialchars($my_post_content);
echo '<br />end<br />';
Answer 0 (score: 0)
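Hello friend, I ran some tests on your script, and it seems the second image disappears because of the LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags passed in $dom->loadHTML($h, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
With LIBXML_HTML_NOIMPLIED the parser does not add the implied <html> and <body> elements, so the fragment seems to be left without a single root element to hold all of its top-level nodes.
A simple workaround would be a "hack" that puts some text in front, like this: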
$h = 'abc<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
and then manually strip the extra pieces from the resulting string, but here is a better solution:
$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
echo 'start<br />';
echo htmlspecialchars($h);
echo '<br />end<br />';
// a blank document is used because we want to extract only the
// HTML inside <body> from $dom
$blank = new DOMDocument;
// initialize the $dom object; apart from dropping the libxml flags,
// nothing is changed in this code
$dom = new DOMDocument();
$dom->loadHTML($h);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    $img_class = $image->getAttribute('class');
    if ($img_class == '') {
        $image->setAttribute('class', 'img-responsive img-rounded');
        echo 'add class <br />';
    }
}
// now get the body, which contains the updated HTML,
// and copy all of its children into the blank document
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $blank->appendChild($blank->importNode($child, true));
}
$my_post_content = $blank->saveHTML($blank);
echo 'start<br />';
echo htmlspecialchars($my_post_content);
echo '<br />end<br />';
exit;
This outputs:
start
<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>
end
add class
add class
start
<img src="https://example.com/one.jpg" alt="" class="img-responsive img-rounded"><br><p>bla</p><img src="https://example.com/foo.jpg" alt="" class="img-responsive img-rounded"><br>
end
As you can see, both images are there.
Cheers!
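A possible variation on the same idea, in case the second blank document feels heavy: pass each child of <body> to saveHTML() and concatenate the results. This is only a sketch; the function name addDefaultImgClass and its parameters are made up for illustration and are not part of the original script. Depending on the libxml version, the serializer may put a line break after block-level elements such as <p>.

```php
<?php
// Hypothetical helper (not from the original post): gives every <img> without
// a class attribute a default class and returns only the HTML inside <body>.
function addDefaultImgClass(string $html, string $class): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    // No LIBXML_HTML_NOIMPLIED here, so libxml adds the <html><body> wrapper
    // and the fragment may contain any number of top-level nodes.
    $dom->loadHTML($html);

    foreach ($dom->getElementsByTagName('img') as $image) {
        if ($image->getAttribute('class') === '') {
            $image->setAttribute('class', $class);
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> on its own, which leaves out the wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$h = '<img src="https://example.com/one.jpg" alt=""><br><p>bla</p><img src="https://example.com/foo.jpg" alt=""><br>';
echo addDefaultImgClass($h, 'img-responsive img-rounded'), "\n";
```

Because nothing is appended to a second document, this version does not depend on a DOMDocument accepting several top-level elements.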