I am trying to get the image URLs from these pages:
http://www.amazon.co.uk/The-Classics-3xCD-Box-Set/dp/B000W3Q4X2/ref=sr_1_fkmr0_1/277-3029293-0823745?ie=UTF8&qid=1410727619&sr=8-1-fkmr0&keywords=Classic+Euphoria+3xCD+Box+Chicane+Hybrid+++P%26P
http://www.amazon.co.uk/Hinari-HIN172-Digital-Steam-Generator/dp/B00472M9S8/ref=sr_1_fkmr0_1/280-9070877-0582850?ie=UTF8&qid=1410725454&sr=8-1-fkmr0&keywords=Hinari+HIN172+2500+W+Digital+Steam+Generator+BOXED
The image can be found in the data-a-dynamic-image attribute of the img tag inside the imgTagWrapperId div.
The final image should be returned as:
http://ecx.images-amazon.com/images/I/81Vi7ECR9hL.jpg
E.g. the _SX522_ part should be removed from the original image URL:
http://ecx.images-amazon.com/images/I/81Vi7ECR9hL._SX522_.jpg
I only need one image returned from the source.
Answer 0 (score: 0)
$html=file_get_contents('http://www.amazon.co.uk/The-Classics-3xCD-Box-Set/dp/B000W3Q4X2/ref=sr_1_fkmr0_1/277-3029293-0823745?ie=UTF8&qid=1410727619&sr=8-1-fkmr0&keywords=Classic+Euphoria+3xCD+Box+Chicane+Hybrid+++P%26P');
$html = preg_replace('/\s{2,}/', ' ', $html); // replace all instances of more than one whitespace with a single space
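preg_match('/\{&quot;(https?:\/\/\S+?)&quot;/', $html, $matches); // quotes in the attribute value are HTML-escaped as &quot; in the raw source; the URL may be http or https; the lazy +? stops at the first closing &quot;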
print_r($matches);
Array
(
    [0] => {"http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg"
    [1] => http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg
)
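The same kind of regular expression works in JavaScript, since outerHTML also serializes the quotes in the attribute value as &quot;:
document.getElementById('imgTagWrapperId').outerHTML.match(/\{&quot;(https?:\/\/\S+?)&quot;/);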
["{"http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg"", "http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg"]
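The question also asks for the size modifier (e.g. _SX522_) to be dropped, which the snippets above do not do. A minimal sketch of that extra step follows. It assumes the modifier always has the form ._XXX_ directly before the file extension. The function name firstFullSizeImage and the sample markup are made up for illustration and are not taken from the Amazon page.

```javascript
// Hypothetical helper, not part of the original answer.
// Takes a chunk of raw HTML and returns the first image URL found in a
// data-a-dynamic-image style attribute, without its size modifier.
function firstFullSizeImage(html) {
  // The attribute holds HTML-escaped JSON, so the first key looks like
  // {&quot;http://...jpg&quot;:[w,h],...}. The lazy +? stops at the first &quot;.
  const match = html.match(/\{&quot;(https?:\/\/\S+?)&quot;/);
  if (!match) {
    return null;
  }
  // Assumption: a size modifier looks like "._SX522_" and sits right before
  // the extension. URLs without such a modifier are returned unchanged.
  return match[1].replace(/\._[A-Za-z0-9,_]+_(?=\.[A-Za-z]+$)/, '');
}

// Made-up sample markup that imitates the escaped attribute value.
const sample =
  '<img data-a-dynamic-image="{&quot;http://ecx.images-amazon.com/images/I/81Vi7ECR9hL._SX522_.jpg&quot;:[522,522]}">';

console.log(firstFullSizeImage(sample));
// prints http://ecx.images-amazon.com/images/I/81Vi7ECR9hL.jpg
```

In the browser you could pass document.getElementById('imgTagWrapperId').outerHTML as the argument. The same two regular expressions should carry over to preg_match and preg_replace in PHP.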