Question

I am trying to get the image URL from these pages:

http://www.amazon.co.uk/The-Classics-3xCD-Box-Set/dp/B000W3Q4X2/ref=sr_1_fkmr0_1/277-3029293-0823745?ie=UTF8&qid=1410727619&sr=8-1-fkmr0&keywords=Classic+Euphoria+3xCD+Box+Chicane+Hybrid+++P%26P

http://www.amazon.co.uk/Hinari-HIN172-Digital-Steam-Generator/dp/B00472M9S8/ref=sr_1_fkmr0_1/280-9070877-0582850?ie=UTF8&qid=1410725454&sr=8-1-fkmr0&keywords=Hinari+HIN172+2500+W+Digital+Steam+Generator+BOXED

The image can be found in the data-a-dynamic-image attribute of the img tag inside the imgTagWrapperId div.

The final image URL should be returned as:

http://ecx.images-amazon.com/images/I/81Vi7ECR9hL.jpg

That is, the _SX522_ part should be removed from the original image URL:

http://ecx.images-amazon.com/images/I/81Vi7ECR9hL._SX522_.jpg

I only need one image returned from the source.
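
For illustration, the transformation described above could be sketched in JavaScript as follows. The helper name stripSizeModifier is hypothetical, and the pattern for the size modifier is an assumption generalized from the single `_SX522_` example; other modifier shapes may need a different expression.

```javascript
// Hypothetical helper: removes a size modifier such as "._SX522_" that sits
// directly before the file extension. The modifier pattern is an assumption
// based on the one example URL given in the question.
function stripSizeModifier(url) {
  return url.replace(/\._[A-Z0-9,_]+_(?=\.[a-z]+$)/i, '');
}

console.log(stripSizeModifier('http://ecx.images-amazon.com/images/I/81Vi7ECR9hL._SX522_.jpg'));
// prints "http://ecx.images-amazon.com/images/I/81Vi7ECR9hL.jpg"
```

A URL without a modifier is returned unchanged, so the helper can be applied to every extracted URL.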

Answer 1

$html=file_get_contents('http://www.amazon.co.uk/The-Classics-3xCD-Box-Set/dp/B000W3Q4X2/ref=sr_1_fkmr0_1/277-3029293-0823745?ie=UTF8&qid=1410727619&sr=8-1-fkmr0&keywords=Classic+Euphoria+3xCD+Box+Chicane+Hybrid+++P%26P');
$html = preg_replace('/\s{2,}/', ' ', $html); // replace all instances of more than one whitespace with a single space
preg_match('/\{\&quot\;(https?\:\/\/[\S]+)\&quot\;/', $html, $matches); // can be either http or https potentially?
print_r($matches);

Array
(
    [0] => {&quot;http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg&quot;
    [1] => http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg
)

The same regular expression works in JavaScript:

document.getElementById('imgTagWrapperId').outerHTML.match(/\{\&quot\;(https?\:\/\/[\S]+)\&quot\;/);
["{&quot;http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg&quot;", "http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg"]
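
As an alternative to the regex, the attribute value could be parsed instead of pattern-matched. The sketch below assumes the data-a-dynamic-image attribute holds a JSON object whose keys are image URLs, which is what the `{&quot;http://...&quot;` prefix in the match above suggests but does not confirm. The helper name firstDynamicImageUrl and the selector in the usage comment are hypothetical.

```javascript
// Hypothetical helper: returns the first image URL from the value of a
// data-a-dynamic-image attribute, or null if the value cannot be parsed.
// Assumes the value is a JSON object keyed by image URL.
function firstDynamicImageUrl(attrValue) {
  try {
    // outerHTML serializes quotes inside attributes as &quot;,
    // while getAttribute() returns them already decoded.
    const json = attrValue.replace(/&quot;/g, '"');
    const urls = Object.keys(JSON.parse(json));
    return urls.length > 0 ? urls[0] : null;
  } catch (e) {
    return null;
  }
}

// Possible use in a browser (selector assumed from the question's description):
// const img = document.querySelector('#imgTagWrapperId img');
// const url = img && firstDynamicImageUrl(img.getAttribute('data-a-dynamic-image'));
```

Parsing avoids the risk of a greedy `[\S]+` running past the first closing `&quot;` when the attribute lists several URLs without whitespace between them.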
