Remove everything from an image tag except src

Time: 2016-06-24 22:37:03

Tags: php regex preg-replace preg-match-all

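I am trying to remove some attributes from an image tag, but my code only removes the attribute's name and leaves the rest in place.

I have an image tag like the one below: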

<img class="aligncenter size-full wp-image-sd174" src="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg" alt="alt title" srcset="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 700w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 241w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 624w" sizes="(max-width: 700px) 100vw, 700px" height="870" width="700">

I want to remove everything except <img src="image path">.

I tried the code below, but it only removes the name of the attribute, for example srcset.

$html = '<img class="aligncenter size-full wp-image-sd174" src="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg" alt="alt title" srcset="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 700w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 241w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 624w" sizes="(max-width: 700px) 100vw, 700px" height="870" width="700">';

$one = preg_replace('#(<img.+?)srcset=(["\']?)\d*\2(.*?/?>)#i', '$1$3', $html);
$two= preg_replace('#(<img.+?)sizes=(["\']?)\d*\2(.*?/?>)#i', '$1$3', $one);

3 Answers:

Answer 0 (score: 3)

Try this:

$html = preg_replace("/(<img\\s)[^>]*(src=\\S+)[^>]*(\\/?>)/i", "$1$2$3", $html);

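Rather than stripping the unwanted attributes one by one, it captures the opening of the image tag, the src attribute, and the closing of the tag, and keeps only those three parts.

It works for any number of <img> tags in the HTML.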
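To illustrate the "any number of tags" claim, here is a small sketch that runs the same pattern over a fragment with two images. The fragment and the variable names are made up for the example; the pattern is the one from this answer.

```php
<?php
// Pattern from the answer: keep "<img ", the src attribute, and the closing ">".
$pattern = "/(<img\\s)[^>]*(src=\\S+)[^>]*(\\/?>)/i";

// Hypothetical input fragment with two image tags.
$html = '<p><img class="a" src="one.jpg" alt="x"> and <img src="two.png" width="10"></p>';

// preg_replace() rewrites every match, so both tags are cleaned in one call.
$clean = preg_replace($pattern, "$1$2$3", $html);

echo $clean, PHP_EOL;
// prints: <p><img src="one.jpg"> and <img src="two.png"></p>
```

Note that `src=\S+` stops at the first whitespace character, so this relies on the src value containing no spaces.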

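Answer 1 (score: 1)

You can use the DOM extension to manipulate the HTML structure properly.

Using a regular expression may be fine for very simple cases, but it won't be a complete solution, no matter how sophisticated it looks.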
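As one concrete case where a pattern-based approach falls short, the sketch below feeds a tag whose alt text contains a `>` character to a regex of the `[^>]*` kind (taken from another answer in this thread). The input is a made-up example. Because `[^>]*` cannot cross the `>` inside the attribute value, the pattern never reaches src and the tag comes back unchanged.

```php
<?php
// Regex that assumes ">" only ever appears at the end of a tag.
$pattern = "/(<img\\s)[^>]*(src=\\S+)[^>]*(\\/?>)/i";

// Hypothetical input: valid HTML, but the alt value contains ">".
$tricky = '<img alt="a > b" src="x.jpg" width="10">';

$result = preg_replace($pattern, "$1$2$3", $tricky);

// No match is found, so nothing is stripped.
var_dump($result === $tricky); // bool(true)
```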

Stripping all attributes except src from <img>:

$html = '<img class="aligncenter size-full wp-image-sd174" src="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg" alt="alt title" srcset="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 700w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 241w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 624w" sizes="(max-width: 700px) 100vw, 700px" height="870" width="700">';

echo stripImageAttributes($html);

Output:

<img src="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg">

Definition of stripImageAttributes():

(It is intended to process HTML fragments, not complete documents.)

/** 
 * @param string $html
 * @return string 
 */ 
function stripImageAttributes($html)
{
    // init document
    $doc = new DOMDocument();
    $doc->loadHTML('<!doctype html><html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body>' . $html . '</body></html>');

    // init xpath
    $xpath = new DOMXPath($doc);

    // process images
    $body = $xpath->query('/html/body')->item(0);

    foreach ($xpath->query('//img', $body) as $image) {
        $toRemove = [];

        foreach ($image->attributes as $attr) {
            if ('src' !== $attr->name) {
                $toRemove[] = $attr;
            }
        }

        if ($toRemove) {
            foreach ($toRemove as $attr) {
                $image->removeAttribute($attr->name);
            }
        }
    }

    // convert the document back to a HTML string
    $html = '';
    foreach ($body->childNodes as $node) {
        $html .= $doc->saveHTML($node);
    }

    return $html;
}
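If only the image URLs are needed, a related DOM-based variant is to read each src and rebuild the tags afterwards. The sketch below is not part of the answer: `extractImageSources()` is a hypothetical helper name, and it assumes the input is a UTF-8 HTML fragment. It also uses `libxml_use_internal_errors()` so that sloppy real-world markup does not emit parser warnings.

```php
<?php
/**
 * Hypothetical helper: return the src value of every <img> in an HTML fragment.
 *
 * @param string $html
 * @return string[]
 */
function extractImageSources(string $html): array
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML('<!doctype html><html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body>' . $html . '</body></html>');

    // Discard collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $image) {
        if ($image->hasAttribute('src')) {
            $sources[] = $image->getAttribute('src');
        }
    }

    return $sources;
}

$sources = extractImageSources('<p>x</p><img alt="a > b" src="x.jpg" width="1"><img src="y.png">');
print_r($sources);
```

Because the parser understands quoted attribute values, the `>` inside the alt text causes no trouble here.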

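Answer 2 (score: 0)

I suggest the following approach.

Given that attributes must be separated by spaces, you can split the tag with a simple explode() call, then iterate over the pieces to find the attribute you want and build a clean image tag.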

function cleanImage($html) {
    $output = '';
    $image_components = explode(' ',$html);
    foreach($image_components as $component) {
        if(substr($component,0,4) == 'src=') {
            // Trim a trailing "/" or ">" in case src is the last attribute in the tag.
            $output = '<img '.rtrim($component, '/>').">";
            break;
        }
    }
    return $output;
}


$html = '<img class="aligncenter size-full wp-image-sd174" src="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg" alt="alt title" srcset="http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 700w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 241w, http://www.blahblah.com/wp-content/uploads/2016/06/07d333r.jpg 624w" sizes="(max-width: 700px) 100vw, 700px" height="870" width="700">';

$image = cleanImage($html);
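Splitting on spaces assumes that the src value itself contains no space and that attributes are separated by plain spaces rather than newlines or tabs. As a hedged alternative for quoted values, the sketch below matches the quoted src directly. `cleanImageQuoted()` is a hypothetical name, not part of the answer, and it is still a regex heuristic that only handles src values wrapped in single or double quotes.

```php
<?php
/**
 * Hypothetical variant: keep only the quoted src attribute of the first <img>.
 * Returns an empty string when no quoted src is found.
 */
function cleanImageQuoted(string $html): string
{
    // (["\']) captures the opening quote, (.*?) the value, \1 the matching closing quote.
    // Requiring whitespace before "src" avoids matching attributes such as data-src.
    if (preg_match('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        // Reuse the original quote character so the value stays valid.
        return '<img src=' . $m[1] . $m[2] . $m[1] . '>';
    }

    return '';
}

$image = cleanImageQuoted('<img class="a" src="http://example.com/my photo.jpg" width="1">');
echo $image, PHP_EOL;
// prints: <img src="http://example.com/my photo.jpg">
```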