JavaScript regex for an image attribute

Time: 2014-09-14 21:12:20

Tags: javascript regex

I am trying to get the image URL from these pages:

http://www.amazon.co.uk/The-Classics-3xCD-Box-Set/dp/B000W3Q4X2/ref=sr_1_fkmr0_1/277-3029293-0823745?ie=UTF8&qid=1410727619&sr=8-1-fkmr0&keywords=Classic+Euphoria+3xCD+Box+Chicane+Hybrid+++P%26P

http://www.amazon.co.uk/Hinari-HIN172-Digital-Steam-Generator/dp/B00472M9S8/ref=sr_1_fkmr0_1/280-9070877-0582850?ie=UTF8&qid=1410725454&sr=8-1-fkmr0&keywords=Hinari+HIN172+2500+W+Digital+Steam+Generator+BOXED

The image can be found in the data-a-dynamic-image attribute of the img tag inside the imgTagWrapperId div.

The final image should be returned as:

http://ecx.images-amazon.com/images/I/81Vi7ECR9hL.jpg

That is, _SX522_ should be removed from the original image URL:

http://ecx.images-amazon.com/images/I/81Vi7ECR9hL._SX522_.jpg

I only need one image returned from the source.

1 Answer:

Answer 0 (score: 0)

$html=file_get_contents('http://www.amazon.co.uk/The-Classics-3xCD-Box-Set/dp/B000W3Q4X2/ref=sr_1_fkmr0_1/277-3029293-0823745?ie=UTF8&qid=1410727619&sr=8-1-fkmr0&keywords=Classic+Euphoria+3xCD+Box+Chicane+Hybrid+++P%26P');
$html = preg_replace('/\s{2,}/', ' ', $html); // replace all instances of more than one whitespace with a single space
preg_match('/\{\&quot\;(https?\:\/\/[\S]+)\&quot\;/', $html, $matches); // can be either http or https potentially?
print_r($matches);
  

Array
(
    [0] => {"http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg"
    [1] => http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg
)

The same regex works in JavaScript:

document.getElementById('imgTagWrapperId').outerHTML.match(/\{\&quot\;(https?\:\/\/[\S]+)\&quot\;/);
["{"http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg"", "http://ecx.images-amazon.com/images/I/41pi9o3crTL.jpg"]
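The regex above returns the URL as it appears in the attribute, so a size modifier such as _SX522_ would still be present. A minimal sketch of the remaining step follows, under some assumptions: the helper names firstImageUrl and stripSizeModifier are hypothetical, the modifier is assumed to sit between the image ID and the file extension in the form ._XXX_, and the quantifier is made lazy here so the match stops at the first closing &quot; when the attribute lists several URLs.

```javascript
// Hypothetical helper: return the first image URL found in HTML whose
// data-a-dynamic-image attribute is serialized with &quot; entities,
// or null if nothing matches.
function firstImageUrl(html) {
  // \S+? is lazy, so the capture ends at the first closing &quot;
  const match = html.match(/\{&quot;(https?:\/\/\S+?)&quot;/);
  return match ? match[1] : null;
}

// Hypothetical helper: drop a size modifier such as "._SX522_" that sits
// directly before the file extension (assumed format, not confirmed).
function stripSizeModifier(url) {
  return url.replace(/\._[^./]+_(?=\.[A-Za-z0-9]+$)/, '');
}

// Sample markup; the attribute layout here is an assumption for illustration.
const sampleHtml =
  '<img data-a-dynamic-image="{&quot;http://ecx.images-amazon.com/images/I/' +
  '81Vi7ECR9hL._SX522_.jpg&quot;:[522,522],&quot;http://ecx.images-amazon.com/' +
  'images/I/81Vi7ECR9hL._SX342_.jpg&quot;:[342,342]}">';

const rawUrl = firstImageUrl(sampleHtml);
const cleanUrl = rawUrl === null ? null : stripSizeModifier(rawUrl);
console.log(cleanUrl);
```

In a browser, the string passed to firstImageUrl could be document.getElementById('imgTagWrapperId').outerHTML, as in the answer. Check for null before calling stripSizeModifier, since the page may not contain the attribute.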