Question

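I have the following code on my website. It finds images in a block of HTML whose src is not prefixed with http:// or /, and in that case it prepends the website URL to the image source.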

For example:

<img src="http://domain.com/image.jpg"> will stay the same
<img src="/image.jpg"> will stay the same
<img src="image.jpg"> will be changed to <img src="http://domain.com/image.jpg">

I feel my code is pretty inefficient... Any ideas on how to do this with less code?

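// Capture the src value of every <img> tag in the HTML block
preg_match_all('/<img[\s]+[^>]*src\s*=\s*[\"\']?([^\'\" >]+)[\'\" >]/i', $content_text, $matches);
if (isset($matches[1])) {
  foreach($matches[1] AS $link) {
    // Skip absolute URLs (http://, https://, ftp://) and root-relative paths (/...)
    if (!preg_match('/^(https?|ftp):\/\//i', $link) && !preg_match('/^\//', $link)) {
      // Prefix the relative path with the WordPress site URL
      $full_link = get_option('siteurl') . '/' . $link;
      $content_text = str_replace($link, $full_link, $content_text);
    }
  }
}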
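For comparison, the question's own rule can be applied in a single preg_replace_callback() pass. Rewriting each match in place also avoids a side effect of the str_replace() loop, which replaces every occurrence of the captured string anywhere in the HTML (with the three example tags above, the `image.jpg` at the end of the other two URLs would be rewritten too). This is a minimal sketch: the function name `absolutize_img_src` and the `$siteUrl` parameter, standing in for `get_option('siteurl')`, are hypothetical, and it keeps the usual limits of handling HTML with a regex.

```php
<?php
// Hypothetical helper: apply the question's rule to each <img src> in one pass.
// $siteUrl stands in for get_option('siteurl').
function absolutize_img_src(string $html, string $siteUrl): string
{
    return preg_replace_callback(
        // Group 1: "<img ... src=" plus the opening quote, if any
        // Group 2: the src value itself
        '/(<img\b[^>]*?\bsrc\s*=\s*["\']?)([^"\'\s>]+)/i',
        function (array $m) use ($siteUrl): string {
            // Leave absolute URLs (http, https, ftp) and root-relative paths alone
            if (preg_match('/^(?:https?|ftp):\/\//i', $m[2]) || $m[2][0] === '/') {
                return $m[0];
            }
            // Rewrite only this occurrence, not every matching string in the HTML
            return $m[1] . rtrim($siteUrl, '/') . '/' . $m[2];
        },
        $html
    );
}

$html = '<img src="http://domain.com/image.jpg"> <img src="/image.jpg"> <img src="image.jpg">';
echo absolutize_img_src($html, 'http://domain.com');
// prints: <img src="http://domain.com/image.jpg"> <img src="/image.jpg"> <img src="http://domain.com/image.jpg">
```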

Answer 1

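First off, you could stop using regular expressions to process HTML, especially when what you're doing is easy with an HTML parser (PHP has at least three). For example: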

$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
  $src = $image->getAttribute('src');
  $url = parse_url($src);
  // http_build_url() comes from the PECL pecl_http extension (1.x), not core PHP
  $image->setAttribute('src', http_build_url('http://www.example.com', $url));
}
$html = $dom->saveHTML();

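Problem solved. Well, almost. Your case of adding the hostname to relative URLs but not to ones that start with / is a bit convoluted and isn't handled in this snippet, but it's a relatively small change (it involves checking $url['path']).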
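One way to fill in that `$url['path']` check is sketched below. It is a minimal sketch, not the answerer's code: the function name `absolutize_img_src_dom` and its `$siteUrl` parameter (standing in for `get_option('siteurl')`) are hypothetical, it concatenates strings instead of calling http_build_url() (which needs the PECL pecl_http extension), and the wrapper `<div>` is an assumption so that an HTML fragment can be loaded and written back without an added `<html>`/`<body>`.

```php
<?php
// Hypothetical helper: DOM-based rewrite that prefixes only plain relative paths.
function absolutize_img_src_dom(string $html, string $siteUrl): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect real-world markup instead of emitting them
    libxml_use_internal_errors(true);
    // Wrap the fragment so only its children are serialized afterwards
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        $url = parse_url($src);
        // Skip unparseable values, absolute URLs (scheme or host present)
        // and root-relative paths (path starts with "/")
        if ($url === false || isset($url['scheme']) || isset($url['host'])
            || !isset($url['path']) || $url['path'] === '' || $url['path'][0] === '/') {
            continue;
        }
        $image->setAttribute('src', rtrim($siteUrl, '/') . '/' . $src);
    }

    // Serialize the children of the wrapper <div> only
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Protocol-relative URLs such as `//cdn.example.com/a.jpg` and `data:` URIs are also left alone, because parse_url() reports a host or a scheme for them.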

See Parse HTML With PHP And DOM, Document Object Model, parse_url() and http_build_url(). PHP has better tools for this than regular expressions.

Oh, and for good measure: Parsing Html The Cthulhu Way.

Answer 2

Maybe a completely different approach would work too:

<base href="http://domain.com/" />

Answer 3

Trying to match HTML with regular expressions is very hard.

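Even if your code seems to work, there's a good chance some IMG tags will slip through because they don't follow the format you've described.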
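Answer 3's point can be made concrete by running the question's own pattern over two valid tags it was not written for: one whose alt text contains a `>`, and one whose quoted filename contains a space. The sample markup is made up for illustration.

```php
<?php
// The question's pattern, unchanged
$pattern = '/<img[\s]+[^>]*src\s*=\s*[\"\']?([^\'\" >]+)[\'\" >]/i';

// Two tags that do not fit the pattern's assumptions:
// a ">" inside a quoted attribute, and a space inside the src value
$html = '<img alt="a > b" src="one.jpg"> <img src="my photo.jpg">';

preg_match_all($pattern, $html, $matches);
print_r($matches[1]);
```

The first image is skipped entirely, because `[^>]*` stops at the `>` inside the alt text, and only `my` is captured from the second. The question's str_replace() loop would then rewrite every `my` in the content.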

Answer 4

This is untested, but I was thinking of something like this...

preg_match_all('/<img\b[^>]*\bsrc\s*=\s*[\'"]?([^\'">]*)/i', $content_text, $matches);
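A quick usage check of Answer 4's pattern against the question's three cases, with one uppercase, single-quoted tag mixed in. The sample HTML and the follow-up filter are assumptions for illustration; the filter treats any `scheme:` prefix as absolute, which is broader than the question's `http`/`https`/`ftp` test.

```php
<?php
// Answer 4's pattern: \b guards, an optional quote, then the value up to the next quote or >
$pattern = '/<img\b[^>]*\bsrc\s*=\s*[\'"]?([^\'">]*)/i';

$html = '<img src="http://domain.com/image.jpg"> <IMG class="x" SRC=\'/image.jpg\'> <img src="image.jpg">';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the raw src values; keep only plain relative paths
// (no "scheme:" prefix and no leading slash)
$relative = array_values(array_filter($matches[1], function (string $src): bool {
    return $src !== '' && !preg_match('/^(?:[a-z][a-z0-9+.-]*:|\/)/i', $src);
}));
print_r($relative);
```

All three src values are captured, including the uppercase `SRC='...'` form, and only `image.jpg` remains after filtering.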

PHP & Regex: Adding the website URL to images
