How do I change the class and the img src with preg_replace in PHP?

Posted: 2013-10-10 08:20:40

Tags: php

Here is an example:

<a href="http://test.html" class="watermark" target="_blank">
   <img width="399" height="4652" src="http://test.html/uploads/2013/10/10.jpg" class="aligncenter size-full wp-image-78360">
</a>

I am using preg_replace to change the class of the link and the src of the img tag:

$content = preg_replace('#<a(.*?)href="([^"]*/)?(([^"/]*)\.[^"]*)"([^>]*?)><img(.*?)src="([^"]*/)?(([^"/]*)\.[^"]*)"([^>]*?)></a>#', '<a href=$2$3 class="fancybox"><img$1src="http://test.html/uploads/2013/10/10_new.jpg"></a>', $content); 

How do I get this result?

<a href="http://test.html" class="fancybox" target="_blank">
    <img width="399" height="4652" src="http://test.html/uploads/2013/10/10_new.jpg" class="aligncenter size-full wp-image-78360">
</a>
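A likely reason the attempt above does not produce that result: the pattern has no s modifier and requires ><img and ></a> to be directly adjacent, but the sample has line breaks between the tags, so nothing matches and preg_replace returns the input unchanged. Even on a single-line copy of the sample (collapsed here with str_replace purely for illustration), the replacement drops target, width, height and the img class, and leaves href unquoted. A small check, reusing the question's pattern and replacement verbatim:

```php
<?php
// Pattern and replacement copied from the question.
$pattern = '#<a(.*?)href="([^"]*/)?(([^"/]*)\.[^"]*)"([^>]*?)><img(.*?)src="([^"]*/)?(([^"/]*)\.[^"]*)"([^>]*?)></a>#';
$replacement = '<a href=$2$3 class="fancybox"><img$1src="http://test.html/uploads/2013/10/10_new.jpg"></a>';

// The sample as posted, with line breaks between the tags.
$multiLine = '<a href="http://test.html" class="watermark" target="_blank">
   <img width="399" height="4652" src="http://test.html/uploads/2013/10/10.jpg" class="aligncenter size-full wp-image-78360">
</a>';
// Same markup collapsed onto one line (illustration only).
$oneLine = str_replace(["\n   ", "\n"], '', $multiLine);

var_dump(preg_match($pattern, $multiLine)); // int(0): no match, input returned unchanged
echo preg_replace($pattern, $replacement, $oneLine), "\n";
// <a href=http://test.html class="fancybox"><img src="http://test.html/uploads/2013/10/10_new.jpg"></a>
```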

2 answers:

Answer 0 (score: 1)

As is pointed out on SO several times a day, regular expressions are not the best tool for manipulating HTML. Fortunately, we have the DOMDocument class!

If that string is all you have, you can make the change like this:

$orig = '   <a href="http://test.html" class="watermark" target="_blank">
                <img width="399" height="4652" src="http://test.html/uploads/2013/10/10.jpg" class="aligncenter size-full wp-image-78360">
        </a>';
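$doc = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>; the link is still the first <a>
$doc->loadHTML($orig);
$anchor = $doc->getElementsByTagName('a')->item(0);
if($anchor->getAttribute('class') == 'watermark')
{
    $anchor->setAttribute('class','fancybox');
    $img = $anchor->getElementsByTagName('img')->item(0);
    $currSrc = $img->getAttribute('src');
    // insert "_new" before the extension: 10.jpg -> 10_new.jpg
    $img->setAttribute('src',preg_replace('/(\.[^\.]+)$/','_new$1',$currSrc));
}
// passing a node to saveHTML() serializes only that element (PHP 5.3.6+)
$newStr = $doc->saveHTML($anchor);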
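The src rename in this answer is the call preg_replace('/(\.[^\.]+)$/', '_new$1', $currSrc): it captures the last dot and everything after it, then puts _new in front of the capture. One caveat: [^\.]+ also matches /, so a src with no file extension, such as a hypothetical http://test.html/img, would become http://test_new.html/img. A minimal sketch of the same step with / excluded from the extension; the function name addNewSuffix is hypothetical.

```php
<?php
// Hypothetical helper: insert "_new" before the file extension of a URL or path.
// [^.\/]+ keeps the match inside the last path segment, so a dot in the host
// name (e.g. "test.html") is never treated as the extension.
function addNewSuffix(string $src): string
{
    return preg_replace('/(\.[^.\/]+)$/', '_new$1', $src);
}

echo addNewSuffix('http://test.html/uploads/2013/10/10.jpg'), "\n";
// http://test.html/uploads/2013/10/10_new.jpg
echo addNewSuffix('http://test.html/img'), "\n";
// http://test.html/img (no extension, left unchanged)
```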

Otherwise, if you are working with the HTML source of a full document:

$orig = '<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title></title>
</head>
<body>
    <a href="http://test.html" class="watermark" target="_blank">
        <img width="399" height="4652" src="http://test.html/uploads/2013/10/10.jpg" class="aligncenter size-full wp-image-78360">
    </a>
    <span>random</span>
    <a href="http://test.html" class="watermark" target="_blank">
        <img width="399" height="4652" src="http://test.html/uploads/2013/10/10.jpg" class="aligncenter size-full wp-image-78360">
    </a>
    <a href="#foobar" class="gary">
        <img src="/imgs/yay.png" />
    </a>
</body>
</html>';
$doc = new DOMDocument();
$doc->loadHTML($orig);
$anchors = $doc->getElementsByTagName('a');
foreach($anchors as $anchor)
{
    if($anchor->getAttribute('class') == 'watermark')
    {
        $anchor->setAttribute('class','fancybox');
        $img = $anchor->getElementsByTagName('img')->item(0);
        $currSrc = $img->getAttribute('src');
        $img->setAttribute('src',preg_replace('/(\.[^\.]+)$/','_new$1',$currSrc));
    }
}
$newStr = $doc->saveHTML();
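The loop above calls $img->getAttribute() without checking that the link actually contains an img, so a watermark link holding only text would end in a fatal error. A sketch of the same loop wrapped in a function, assuming such links should simply be left alone; the function name retargetWatermarkLinks is hypothetical, and parse warnings are collected with libxml_use_internal_errors() rather than printed.

```php
<?php
// Hypothetical helper; skipping links without an <img> is an assumption.
function retargetWatermarkLinks(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->getAttribute('class') !== 'watermark') {
            continue; // not a watermark link
        }
        $img = $anchor->getElementsByTagName('img')->item(0);
        if ($img === null) {
            continue; // no image inside: leave this link unchanged
        }
        $anchor->setAttribute('class', 'fancybox');
        $currSrc = $img->getAttribute('src');
        $img->setAttribute('src', preg_replace('/(\.[^\.]+)$/', '_new$1', $currSrc));
    }
    return $doc->saveHTML();
}

$html = '<html><body>'
    . '<a href="#a" class="watermark"><img src="/imgs/a.jpg"></a>'
    . '<a href="#b" class="watermark">text only</a>'
    . '<a href="#c" class="gary"><img src="/imgs/yay.png"></a>'
    . '</body></html>';
echo retargetWatermarkLinks($html);
```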

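As a brain exercise, though, here is a regex solution as well, since that is what the question asked for and DOMDocument can sometimes feel like too much code (although it is still preferable):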

$newStr = preg_replace('#<a(.+?)class="watermark"(.+?)<img(.+?)src="(.+?)(\.[^.]+?)"(.*?>.*?</a>)#s','<a$1class="fancybox"$2<img$3src="$4_new$5"$6',$orig);
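A quick check of that pattern, wrapped in a hypothetical regexRewrite() function so it runs on its own: on the question's sample it produces the requested markup. Because of the s flag, though, the lazy (.+?) groups can run past the end of one link, so a watermark link that contains no image causes the image in the following link to be renamed instead.

```php
<?php
// The regex from the answer above, wrapped so it can be tested (name is hypothetical).
function regexRewrite(string $html): string
{
    return preg_replace(
        '#<a(.+?)class="watermark"(.+?)<img(.+?)src="(.+?)(\.[^.]+?)"(.*?>.*?</a>)#s',
        '<a$1class="fancybox"$2<img$3src="$4_new$5"$6',
        $html
    );
}

// Sample from the question.
$orig = '<a href="http://test.html" class="watermark" target="_blank">
    <img width="399" height="4652" src="http://test.html/uploads/2013/10/10.jpg" class="aligncenter size-full wp-image-78360">
</a>';
echo regexRewrite($orig), "\n";

// Pitfall: a watermark link without an image. (.+?) runs past its </a>,
// so the image inside the *next* link is renamed instead.
echo regexRewrite('<a class="watermark">text</a><a class="gary"><img src="/imgs/yay.png"></a>'), "\n";
// <a class="fancybox">text</a><a class="gary"><img src="/imgs/yay_new.png"></a>
```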

Answer 1 (score: 0)

Don't parse HTML with regex.

Find all links in the HTML that have the watermark class, change their class to fancybox, and update the src of the first image inside each one:

$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[contains(@class, "watermark")]') as $a) {
    $a->setAttribute('class', 'fancybox');

    $img = $xpath->query('descendant::img', $a)->item(0);
    if ($img !== null) {
        # old value = $img->getAttribute('src');
        $img->setAttribute('src', 'new_value');
    }
}
echo $dom->saveHTML();
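Note that contains(@class, "watermark") is a substring test, so it also matches a class such as watermarked, and setAttribute('class', 'fancybox') discards any other classes the link had. A sketch of a stricter variant, assuming you want to swap only the watermark token and keep the rest: the concat/normalize-space expression is the usual XPath 1.0 idiom for matching one whole class token. The function name swapWatermarkClass is hypothetical, and the _new rename rule is an assumption standing in for the answer's 'new_value' placeholder.

```php
<?php
// Hypothetical helper: swap only the "watermark" class token for "fancybox".
function swapWatermarkClass(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // instead of the @ operator
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "watermark" as a whole class token, not as a substring.
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " watermark ")]';
    foreach ($xpath->query($query) as $a) {
        // Replace only the watermark token and keep any other classes.
        $classes = preg_split('/\s+/', trim($a->getAttribute('class')));
        $classes = array_map(function ($c) {
            return $c === 'watermark' ? 'fancybox' : $c;
        }, $classes);
        $a->setAttribute('class', implode(' ', $classes));

        // Assumed rename rule: insert "_new" before the file extension.
        $img = $xpath->query('descendant::img', $a)->item(0);
        if ($img !== null) {
            $src = $img->getAttribute('src');
            $img->setAttribute('src', preg_replace('/(\.[^.\/]+)$/', '_new$1', $src));
        }
    }
    return $dom->saveHTML();
}

$html = '<html><body>'
    . '<a href="#1" class="big watermark"><img src="/a.jpg"></a>'
    . '<a href="#2" class="watermarked"><img src="/b.jpg"></a>'
    . '</body></html>';
echo swapWatermarkClass($html);
```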