Question

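My site has many different product pages, and every page displays a certain number of images in the same format. I'd like to be able to scrape each page's URL so that I can retrieve the URL of every image on each page. The idea is to create a gallery for each page made up of hotlinked images.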

I know this can be done in PHP, but I don't know how to scrape multiple linked pages. Any ideas?

Answer 1

I'd suggest using a DOM parser, such as PHP's own DOMDocument. For example:

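// Fetch the page and parse it into a DOM tree
$page = file_get_contents('http://example.com/images.php');
$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings caused by imperfect real-world HTML
$doc->loadHTML($page);
// Print the src attribute of every <img> element
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src') . '<br />';
}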
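To cover the "multiple pages" part of the question, the same DOMDocument approach can be wrapped in a loop over a list of product-page URLs. The sketch below is a minimal illustration, not a tested scraper: the function names extract_image_srcs and build_galleries are hypothetical, and the page fetcher is passed in as a callable so you can use file_get_contents or any HTTP client you prefer.

```php
<?php
// Hypothetical helper: extract every non-empty <img src> value from an HTML string.
function extract_image_srcs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src'); // returns '' when the attribute is missing
        if ($src !== '') {
            $srcs[] = $src;
        }
    }
    return $srcs;
}

// Hypothetical helper: build one gallery (a list of image URLs) per page URL.
// $fetch is any callable(string $url) returning the page HTML, e.g. 'file_get_contents'.
function build_galleries(array $pageUrls, callable $fetch): array
{
    $galleries = [];
    foreach ($pageUrls as $url) {
        $html = $fetch($url);
        // Skip pages that could not be fetched or are empty (loadHTML rejects '').
        $galleries[$url] = ($html === false || $html === '') ? [] : extract_image_srcs($html);
    }
    return $galleries;
}
```

Calling build_galleries(['http://example.com/product1.php', 'http://example.com/product2.php'], 'file_get_contents') would return an array keyed by page URL, each entry holding that page's image URLs (the example.com URLs are placeholders). Injecting the fetcher keeps the parsing logic testable without network access.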

Answer 2

You can use a regular expression (regex) to go through the page source and parse out all the IMG tags.

This regular expression should do the job nicely: <img[^>]+src="(.*?)"

How does it work?

// <img[^>]+src="(.*?)"
// 
// Match the characters "<img" literally «<img»
// Match any character that is not a ">" «[^>]+»
//    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the characters "src="" literally «src="»
// Match the regular expression below and capture its match into backreference number 1 «(.*?)»
//    Match any single character that is not a line break character «.*?»
//       Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match the character """ literally «"»

Sample PHP code:

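// $subject holds the HTML source of the page
preg_match_all('/<img[^>]+src="(.*?)"/i', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[1]); $i++) {
    // $result[0][$i] is the whole matched text; the image URL (capture group 1) is in $result[1][$i]
    $url = $result[1][$i];
}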
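The pattern above only matches src values in double quotes, so a tag such as <img src='a.jpg'> is skipped. A slightly more tolerant variant is sketched below; it is still a heuristic (a DOM parser handles edge cases better), and the function name regex_image_srcs is hypothetical. It requires whitespace before src so that attributes like data-src are not picked up by mistake.

```php
<?php
// Hypothetical helper: capture the src of each <img> tag, in single or double quotes.
// Group 1 is the opening quote character; \1 requires the same closing quote.
// Group 2 is the URL itself.
function regex_image_srcs(string $html): array
{
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}
```

Because of the i flag, upper-case tags like <IMG> match too, and the s flag lets a tag span several lines.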

You'll need to do some more work to handle things like relative URLs.
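As a rough illustration of that extra work, the sketch below turns an image src into an absolute URL using the URL of the page it was found on. It is a simplified resolver, not a full RFC 3986 implementation: it handles absolute URLs, protocol-relative //host/... URLs, root-relative /path URLs and document-relative paths including . and .. segments, but not query-only or fragment-only references. The name resolve_url is hypothetical, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical, simplified relative-URL resolver (not full RFC 3986).
function resolve_url(string $base, string $ref): string
{
    // Already absolute: starts with a scheme such as "http:" or "https:".
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    $authority = $b['host'] ?? '';
    if (isset($b['port'])) {
        $authority .= ':' . $b['port'];
    }
    // Protocol-relative ("//cdn.example.com/x.png"): reuse only the base scheme.
    if (substr($ref, 0, 2) === '//') {
        return $scheme . ':' . $ref;
    }
    if (substr($ref, 0, 1) === '/') {
        // Root-relative: replaces the whole base path.
        $path = $ref;
    } else {
        // Document-relative: drop the last segment of the base path, then append.
        $basePath = $b['path'] ?? '/';
        $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $ref;
    }
    // Normalise "." and ".." segments; never climb above the root.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    // A trailing "." or ".." names a directory, so keep the trailing slash.
    $last = $segments[count($segments) - 1];
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }
    return $scheme . '://' . $authority . implode('/', $out);
}
```

For example, resolving '../img/a.jpg' against 'http://example.com/products/item.php' gives 'http://example.com/img/a.jpg'. If you need full standards compliance, a dedicated URL library is a safer choice than this sketch.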

Answer 3

I really like PHP Simple HTML DOM Parser for this kind of thing. An example of grabbing images is right on its front page:

// Requires the PHP Simple HTML DOM Parser library file
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}
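Simple HTML DOM is a third-party library you have to download and include. If you'd rather stick to PHP's built-in extensions, a similar query can be written with DOMXPath. This is a swapped-in technique, not part of Simple HTML DOM, and the function name xpath_image_srcs is hypothetical. The XPath expression //img/@src selects only img elements that actually have a src attribute.

```php
<?php
// Hypothetical helper: return the src attribute of every <img> via an XPath query.
function xpath_image_srcs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $srcs = [];
    // Each result is a DOMAttr node; ->value holds the attribute text.
    foreach ($xpath->query('//img/@src') as $attr) {
        $srcs[] = $attr->value;
    }
    return $srcs;
}
```

The advantage of XPath is that you can narrow the search to the part of the page that holds the product images, for instance with an expression like //div[@class="gallery"]//img/@src (the class name "gallery" is only an assumed example; check your own markup).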

Answer 4

You can scrape pages with this:

http://simplehtmldom.sourceforge.net/

But it requires PHP 5+.

Screen scraping image links in PHP
