Remove an image from the content and put it first

Date: 2014-06-09 20:53:25

Tags: php domdocument

I want to move the first <img> to the very beginning of the <description> in an XML feed. I want to change this:

$html =
'this is content and this is the 1st image <img src="first_image.jpg"> within the text and <img src="second_image.jpg"> is the second one.'

into this:

$html = '<img src="first_image.jpg">
    this is content and this is the 1st image within the text and <img src="second_image.jpg"> is the second one.'

using DOMDocument in PHP. This is what I have so far:

$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXpath($dom);
if($image = $xpath->query('.//img[1]')){
    $image->parentNode->removeChild($image);
}

1 answer:

Answer (score: 1)

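DOMXPath::query() always returns a DOMNodeList object (even when the list contains only one node), so you need to take its first item (a DOMNode object) before you can use the parentNode property: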

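$images = $xpath->query('(//img)[1]'); // first <img> in document order
// An empty DOMNodeList is still truthy, so check its length as well
if ($images !== false && $images->length > 0) {
    $node = $images->item(0);          // DOMNodeList -> DOMNode
    $parent = $node->parentNode;
    $parent->removeChild($node);
    $parent->insertBefore($node, $parent->firstChild);
}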
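To see the whole thing run end to end on the fragment from the question, here is a minimal self-contained sketch. It assumes ASCII input (so the encoding conversion is left out), and moveFirstImageToFront() is a hypothetical helper name, not a PHP built-in. Because loadHTML() wraps the bare fragment in implied <html><body><p> elements, the sketch serializes only the children of the image's parent.

```php
<?php
// Minimal sketch, assuming ASCII input as in the question's example.
// moveFirstImageToFront() is a hypothetical helper name.
function moveFirstImageToFront(string $html): string
{
    $dom = new DOMDocument();
    // libxml wraps the bare fragment in <html><body><p>...</p></body></html>
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // (//img)[1] is the first <img> in document order
    $images = $xpath->query('(//img)[1]');
    if ($images === false || $images->length === 0) {
        return $html; // no image: leave the content unchanged
    }

    $img = $images->item(0);           // DOMNodeList -> DOMNode
    $parent = $img->parentNode;
    $parent->removeChild($img);
    $parent->insertBefore($img, $parent->firstChild);

    // Serialize only the parent's children, dropping the implied wrappers
    $out = '';
    foreach ($parent->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = 'this is content and this is the 1st image <img src="first_image.jpg"> within the text and <img src="second_image.jpg"> is the second one.';
echo moveFirstImageToFront($html), PHP_EOL;
```

The space that preceded the moved tag stays behind in the text, so the result has two spaces where the first image used to be; trim the neighbouring text nodes if that matters.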

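As for your actual RSS feed, it seems that the <description> element is not the direct parent of the img node, and the HTML inside it is entity-encoded. You can try this, which decodes the entities, self-closes the <img> tags so the feed parses as XML, and then moves the first image: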

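$xml = file_get_contents('./rssfeed.xml');
// Decode &lt;img ...&gt; etc. back into real markup
$xml = html_entity_decode($xml, ENT_XML1, "UTF-8");
// Self-close <img ...> tags (<img ...> -> <img .../>) so the document is well-formed XML
$xml = preg_replace('~<img\s[^>]*\K(?<!/)>~', '/>', $xml);

$dom = new DOMDocument;
$dom->loadXML($xml);

// item(0) is usually the channel's own <description>; item(1) is the first <item>'s
$descNode = $dom->getElementsByTagName('description')->item(1);
// getElementsByTagName() searches all descendants, not only direct children
$imgNode = $descNode->getElementsByTagName('img')->item(0);

$imgNode->parentNode->removeChild($imgNode);
$descNode->insertBefore($imgNode, $descNode->firstChild);
echo htmlspecialchars($dom->saveXML());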
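The same decode-then-move approach can be tried without a feed file. This is a sketch under a few assumptions: the feed is held in a string (a small hypothetical sample stands in for ./rssfeed.xml), the <description> holds entity-encoded HTML, index 1 is the first <item>'s description, and moveFirstImageInFeed() is a hypothetical helper name.

```php
<?php
// Sketch of the decode-then-move approach for a feed held in a string.
// Assumptions: entity-encoded HTML in <description>, index 1 = first <item>'s
// description, moveFirstImageInFeed() is a hypothetical helper name.
function moveFirstImageInFeed(string $xml, int $descIndex = 1): string
{
    // &lt;img ...&gt; -> <img ...>, then <img ...> -> <img .../> so it parses as XML
    $xml = html_entity_decode($xml, ENT_XML1, 'UTF-8');
    $xml = preg_replace('~<img\s[^>]*\K(?<!/)>~', '/>', $xml);

    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Feed is not well-formed after decoding');
    }

    $descNode = $dom->getElementsByTagName('description')->item($descIndex);
    if ($descNode === null) {
        return $dom->saveXML();
    }
    // Searches all descendants, so the <img> may sit inside a <p> or <div>
    $imgNode = $descNode->getElementsByTagName('img')->item(0);
    if ($imgNode !== null) {
        $imgNode->parentNode->removeChild($imgNode);
        $descNode->insertBefore($imgNode, $descNode->firstChild);
    }
    return $dom->saveXML();
}

// Hypothetical sample feed standing in for ./rssfeed.xml
$feed = '<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><title>Example</title><description>Channel description</description>
<item><title>Post</title><description>&lt;p&gt;Intro &lt;img src="first_image.jpg"&gt; text &lt;img src="second_image.jpg"&gt; end&lt;/p&gt;</description></item>
</channel></rss>';

echo htmlspecialchars(moveFirstImageInFeed($feed));
```

Note that the output keeps the description as real child elements (the image now precedes the <p>), not as entity-encoded text.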

If you want to keep the HTML entities (i.e. leave the description entity-encoded):

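$dom = new DOMDocument;
$dom->load('./rssfeed.xml');
$descNode = $dom->getElementsByTagName('description')->item(1);
// nodeValue holds the already-decoded HTML text of the description
$contentText = $descNode->nodeValue;
$imgTag = '';
// Cut out the first <img ...> tag (limit 1) and remember it in $imgTag
$contentText = preg_replace_callback('~<img\s[^>]*>~',
                      function($m) use (&$imgTag) { $imgTag = $m[0]; return ''; },
                      $contentText, 1);
// Replace the content with a single text node; saveXML() escapes "<" as "&lt;" again
while ($descNode->firstChild) {
    $descNode->removeChild($descNode->firstChild);
}
$descNode->appendChild($dom->createTextNode($imgTag . $contentText));
echo htmlspecialchars($dom->saveXML());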
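Here is a self-contained sketch of the entity-preserving variant, again on a hypothetical in-memory sample instead of ./rssfeed.xml. It assumes index 1 is the first <item>'s description, and moveFirstImageKeepEncoded() is a hypothetical helper name. The key point is that the parser has already decoded the entities, so the regex works on plain HTML text, and writing the result back as a text node lets the serializer re-escape it.

```php
<?php
// Sketch of the "keep the entities" variant on an in-memory feed.
// Assumptions: index 1 = first <item>'s description;
// moveFirstImageKeepEncoded() is a hypothetical helper name.
function moveFirstImageKeepEncoded(string $xml, int $descIndex = 1): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Feed is not well-formed');
    }
    $descNode = $dom->getElementsByTagName('description')->item($descIndex);
    if ($descNode === null) {
        return $dom->saveXML();
    }

    // The parser already decoded &lt;...&gt;, so this is plain HTML text
    $contentText = $descNode->textContent;
    $imgTag = '';
    $contentText = preg_replace_callback(
        '~<img\s[^>]*>~',
        function (array $m) use (&$imgTag): string {
            $imgTag = $m[0]; // remember the first tag...
            return '';       // ...and cut it out of the text
        },
        $contentText,
        1 // only the first match
    );

    // One text node replaces the old content; saveXML() escapes "<" again
    while ($descNode->firstChild !== null) {
        $descNode->removeChild($descNode->firstChild);
    }
    $descNode->appendChild($dom->createTextNode($imgTag . $contentText));
    return $dom->saveXML();
}

// Hypothetical sample feed standing in for ./rssfeed.xml
$feed = '<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><title>Example</title><description>Channel description</description>
<item><title>Post</title><description>&lt;p&gt;Intro &lt;img src="first_image.jpg"&gt; text &lt;img src="second_image.jpg"&gt; end&lt;/p&gt;</description></item>
</channel></rss>';

echo htmlspecialchars(moveFirstImageKeepEncoded($feed));
```

Using createTextNode() rather than assigning to nodeValue avoids libxml interpreting "&" sequences in the assigned string as entity references.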