I can't scrape the product images. I'm using ajax. My ajax file is test.html, and this is my code:
$("#click_me").click(function () {
    $.ajax({
        url: "test.php",
        async: false, // was misspelled "asyn", which jQuery silently ignores
        success: function (result) {
            console.log(result);
        }
    });
});
Code in test.php:
$url="http://www.kohls.com/catalog/bedroom-mattresses-accessories-furniture.jsp?CN=Room:Bedroom+Category:Mattresses%20%26%20Accessories+Department:Furniture&cc=bed_bath-TN3.0-S-mattresses";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 ");
$out = curl_exec($ch);
curl_close($ch);
$out = str_replace("\n", '', $out);
echo $out;
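Note: please check $url. The images are populated dynamically, so we are unable to scrape them. Please point me in the right direction quickly; I have also tried pythonjs to scrape them, but it didn't work!!!
Thanks!!!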
Answer 0 (score: 0)
You need to parse the images out of the HTML. DOMDocument is a good choice for that.
Sample code (UNTESTED, but it should work in theory)
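$url="http://www.kohls.com/catalog/bedroom-mattresses-accessories-furniture.jsp?CN=Room:Bedroom+Category:Mattresses%20%26%20Accessories+Department:Furniture&cc=bed_bath-TN3.0-S-mattresses";
$html=file_get_contents($url);
// loadHTML() is an instance method; calling it statically fails on PHP 8
$domd=new DOMDocument();
@$domd->loadHTML($html); // @ hides the warnings that malformed markup triggers
foreach($domd->getElementsByTagName("img") as $img){
    $src=$img->getAttribute("src");
    if(empty($src)){continue;}
    // only prefix relative paths; absolute URLs are left untouched
    if(strpos($src,'//')===0){$src='http:'.$src;}
    elseif(strpos($src,'/')===0){$src='http://www.kohls.com'.$src;}
    // take the file name from the URL path so the query string is not included
    $filename=basename((string)parse_url($src,PHP_URL_PATH));
    if($filename===''){continue;}
    echo "downloading ".$filename.PHP_EOL;
    file_put_contents($filename,file_get_contents($src));
}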
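If you would rather use curl, just replace file_get_contents with your curl function. (The code above is also memory hungry, because each image is downloaded into RAM in full, no matter how large it is. With curl you can optimize this by setting CURLOPT_FILE so the response is written directly to a file. That can save a lot of RAM if you are downloading large images, from NASA for example.)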
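As a rough sketch of the CURLOPT_FILE idea, the helper below streams a response straight to disk instead of buffering it in memory. The function name download_to_file and its error handling are assumptions made for illustration; they are not part of the original answer, and the code has not been run against kohls.com.

```php
<?php
// Hypothetical helper (not from the original answer): stream a URL to a
// file on disk so the whole body never has to sit in RAM at once.
function download_to_file(string $url, string $path): bool
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body to $fp as it arrives
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, as in the question's code
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure

    $ok = curl_exec($ch);  // returns true/false because CURLOPT_RETURNTRANSFER is not set
    unset($ch);            // release the handle (curl_close() is a no-op on PHP 8+)
    fclose($fp);

    if ($ok === false) {
        unlink($path);     // do not leave an empty or partial file behind
        return false;
    }
    return true;
}
```

Inside the foreach loop you could then call download_to_file($src, $filename) in place of the file_put_contents(...) line and check the boolean result before moving on to the next image.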