Question

I have a string of data stored in $content. Here is a sample of that data:

This is some sample data which is going to contain an image in the format <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">.  It will also contain lots of other text and maybe another image or two.

I am trying to grab <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg"> and save it as another string, e.g. $extracted_image.

So far I have this...

if( preg_match_all( '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/', $content, $extracted_image ) ) {
    $new_content .= 'NEW CONTENT IS '.$extracted_image.'';
}

All it returns is...

NEW CONTENT IS Array

I realize my attempt may be completely wrong, but can someone tell me where I went wrong?

Answer 1

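Your first problem is that preg_match_all() (http://php.net/manual/en/function.preg-match-all.php) stores its results as an array in its third argument ($matches in the manual, $extracted_image in your code), so you need to output a single item from that array. With the default PREG_PATTERN_ORDER flag, $extracted_image[0] is itself an array of all full matches, so try $extracted_image[0][0] to start.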
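A minimal sketch of that fix, assuming the sample $content from the question and the asker's original pattern. It initialises $new_content to an empty string (an assumption: the question appends to it without showing where it is defined) and reads the first full match from $extracted_image[0][0]; $extracted_image[1][0] would give only the src URL captured by the group.

```php
<?php
// Sample input taken from the question.
$content = 'This is some sample data which is going to contain an image in the format '
    . '<img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">. '
    . 'It will also contain lots of other text and maybe another image or two.';

// The asker's original pattern, unchanged.
$pattern = '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/';

$new_content = '';
if (preg_match_all($pattern, $content, $extracted_image)) {
    // With the default PREG_PATTERN_ORDER flag:
    //   $extracted_image[0] is an array of every full match (the whole <img ...> tags)
    //   $extracted_image[1] is an array of every first capture group (the src values)
    $new_content .= 'NEW CONTENT IS ' . $extracted_image[0][0];
}

echo $new_content, "\n";
```

Concatenating $extracted_image directly converts the whole array to the literal string "Array" (with an "Array to string conversion" notice), which is exactly the output shown in the question.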

Answer 2

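If you only want one result, you need to use a different function:

preg_match() stops after the first match and fills $matches with that single match only. preg_match_all() finds every match and fills $matches with arrays containing all of them.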
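A minimal sketch of the preg_match() approach, assuming the same sample $content and pattern as in the question. preg_match() returns 1 when it finds a match (0 otherwise) and fills $matches with only that first match, so the full tag is a plain string at $matches[0].

```php
<?php
// Sample input taken from the question.
$content = 'This is some sample data which is going to contain an image in the format '
    . '<img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">. '
    . 'It will also contain lots of other text and maybe another image or two.';

$pattern = '/<img[^>]+src\s*=\s*["\']?([^"\' ]+)[^>]*>/';

$extracted_image = '';
if (preg_match($pattern, $content, $matches)) {
    // preg_match() stops at the first match:
    //   $matches[0] is the full matched string (the whole <img ...> tag)
    //   $matches[1] is the first capture group (the src URL)
    $extracted_image = $matches[0];
}

echo $extracted_image, "\n";
```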

Answer 3

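Parsing valid HTML with regular expressions is unwise. There may be unexpected attributes before the src attribute, non-img tags can trick the regex into false-positive matches, and attribute values can be wrapped in either single or double quotes. For these reasons you should use a DOM parser: it is clean, reliable and easy to read.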

Code:

$string = <<<HTML
This is some sample data which is going to contain an image
in the format <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">.
It will also contain lots of other text and maybe another image or two
like this: <img alt='another image' src='http://www.example.com/randomfolder/randomimagename.jpg'>
HTML;

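$srcs = [];
$dom = new DOMDocument();
// Parse the fragment as HTML so every <img> element can be walked as a DOM node
$dom->loadHTML($string);
foreach ($dom->getElementsByTagName('img') as $img) {
    // getAttribute() returns the value whether it was single- or double-quoted in the source
    $srcs[] = $img->getAttribute('src');
}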

var_export($srcs);

Output:

array (
  0 => 'http://www.randomdomain.com/randomfolder/randomimagename.jpg',
  1 => 'http://www.example.com/randomfolder/randomimagename.jpg',
)
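The question asked for the whole <img> tag rather than only its src value. A minimal sketch, assuming the same DOMDocument approach as above, that serializes each element back to a string with DOMDocument::saveHTML($node). The tag is regenerated by libxml rather than copied from the input, so attribute quoting may differ from the original (for example, single quotes may come back as double quotes). The libxml_use_internal_errors() call is an added assumption to keep parse warnings from real-world, possibly malformed HTML out of the output.

```php
<?php
// Sample input mirroring the heredoc used in the answer above.
$string = 'This is some sample data which is going to contain an image '
    . 'in the format <img src="http://www.randomdomain.com/randomfolder/randomimagename.jpg">. '
    . 'It will also contain lots of other text and maybe another image or two '
    . "like this: <img alt='another image' src='http://www.example.com/randomfolder/randomimagename.jpg'>";

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($string);

$tags = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    // Passing a node to saveHTML() serializes just that element.
    $tags[] = $dom->saveHTML($img);
}

// Discard any warnings collected during parsing.
libxml_clear_errors();

var_export($tags);
```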

Extract image src from a string using preg_match_all
