I'm scraping data from SamsClub.com with PHP:
$res = file_get_contents('http://www.samsclub.com/sams/bath-towel-apple-gr-100-cotton/prod10450797.ip');
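I wrote a helper based on PHP's explode() to pull the data out:
function getData($content, $start, $end) {
    // Keep everything after the first occurrence of $start...
    $parts = explode($start, $content, 2);
    if (count($parts) < 2) {
        return ''; // $start not found in $content
    }
    // ...and cut it off at the first occurrence of $end.
    return explode($end, $parts[1], 2)[0];
}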
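For example, a minimal sketch of how such a helper is called. The HTML fragment is made up for illustration (it is not Sam's Club's real markup), and the "Item # " / "Model # " start markers are assumptions chosen to match that fragment.

```php
<?php
// explode()-based helper with a guard for a missing start marker
function getData($content, $start, $end) {
    $parts = explode($start, $content, 2);
    if (count($parts) < 2) {
        return ''; // start marker not found
    }
    return explode($end, $parts[1], 2)[0];
}

// Hypothetical markup, for illustration only
$content = '<span class="itemNumber">Item # 252414</span><span class="model">Model # SAMA-B</span>';

$itemNumber  = getData($content, 'Item # ', '</span>');
$modelNumber = getData($content, 'Model # ', '</span>');

echo $itemNumber, "\n";  // 252414
echo $modelNumber, "\n"; // SAMA-B
```

Passing a limit of 2 to explode() splits only at the first occurrence, so later matches of the same marker are left inside the remainder instead of being discarded.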
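I can fetch all the required data except for one thing: the product's variations. As you can see in the screenshot, the product is available in several other colours.
When another colour is selected, the Item # and Model # change as well, as shown in the screenshot below.
I'd like to fetch the Item #, Model # and similar details for the other colours too.
Looking forward to your answers.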
Answer 0 (score: 2)
To do this you need a library (PHP Simple HTML DOM Parser). Just upload simple_html_dom.php somewhere you can include it from (in my code it sits in the same folder as the script).
<?php
$url = 'http://www.samsclub.com/sams/bath-towel-apple-gr-100-cotton/prod10450797.ip';
include('simple_html_dom.php'); // PHP Simple HTML DOM Parser, same folder as this script
$html = file_get_html($url);
$colour = array(); $item = array(); $model = array();

// The colour variants are defined in an inline <script> inside <div id="variance">
$script = $html->find('div[id=variance] script', 0)->innertext;
$script = preg_replace('/\s+/', ' ', $script); // collapse whitespace
$scripts = explode(";", $script);
$script = $scripts[2]; // statement with skuJson.skuVariantJson
$id = $scripts[4];     // statement with the product ID
$type = $scripts[5];   // statement with the product type

// Strip the JavaScript wrapper, leaving the JSON object (colour => SKU ID)
$script = str_replace("skuJson.skuVariantJson = $.parseJSON('", "", $script);
$script = str_replace("')", "", $script);
$colours = json_decode($script);
preg_match("/'([a-z0-9]*)'/", $type, $types); $type = $types[1];
preg_match("/'([a-z0-9]*)'/", $id, $ids); $id = $ids[1];

// The first statement of the last <script> holds the value sent as the "_" parameter
$script = $html->find('script', -1)->innertext;
$scripts = explode(";", $script);
$time = $scripts[0];
preg_match('/"([0-9]*)"/', $time, $times); $time = $times[1];

// One AJAX request per colour: the same call the page makes when a colour is clicked
foreach ($colours as $key => $value) {
    $url = 'http://www.samsclub.com/sams/shop/product/ajax/ajaxSkuVariant.jsp?skuId=' . $value
         . '&productId=' . $id . '&productType=' . $type . '&_=' . $time;
    $response = file_get_html($url);
    preg_match('/"legacyItemNumber":"([0-9]*)"/', $response, $match); $item[] = $match[1] ?? null;
    preg_match('/"model":"([a-z0-9-]*)"/i', $response, $match); $model[] = $match[1] ?? null;
    $colour[] = substr($key, 0, -1); // colour name = JSON key minus its last character
}

//Print results
echo "<pre>"; print_r($colour); echo "</pre>";
echo "<pre>"; print_r($item); echo "</pre>";
echo "<pre>"; print_r($model); echo "</pre>";
?>
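Splitting the script on ";" and picking statements by index breaks as soon as the page adds or reorders a statement. A possibly sturdier variant, sketched here under assumptions, captures the JSON passed to $.parseJSON('...') with one regular expression. The script text below is hypothetical, and the pattern assumes the JSON itself never contains the sequence "')".

```php
<?php
// Pull the JSON object out of "skuVariantJson = $.parseJSON('...')".
// Returns an associative array, or [] when the statement is missing or invalid.
function extractVariantJson(string $script): array {
    $pattern = '/skuVariantJson\s*=\s*\$\.parseJSON\(\'(.*?)\'\)/';
    if (!preg_match($pattern, $script, $m)) {
        return [];
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : [];
}

// Hypothetical inline script, for illustration only
$script = <<<'JS'
var skuJson = {}; skuJson.skuVariantJson = $.parseJSON('{"White":"1234","Sage":"5678"}'); skuJson.productId = 'prod10450797';
JS;

print_r(extractVariantJson($script));
```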
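The only thing you need to change is the $url variable at the top. Why all this code, you might ask? Because the data you're looking for isn't on the product page itself: it is loaded via AJAX every time a colour is clicked, so we essentially have to make many requests (one per colour). Here is the output: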
Array
(
[0] => White
[1] => Burgundy
[2] => Apple Green
[3] => Lilac
[4] => Chocolate
[5] => Sage
[6] => Grey
[7] => PckBlue
[8] => Linen
[9] => null
[10] => Plum
[11] => Clay
[12] => Light Blue
)
Array
(
[0] => 252368
[1] => 252505
[2] => 252414
[3] => 433076
[4] => 252389
[5] => 117268
[6] => 252438
[7] => 613317
[8] => 252382
[9] => 433083
[10] => 252541
[11] => 117175
[12] => 252400
)
Array
(
[0] => SAMW-B
[1] => SAMB-B
[2] => SAMA-B
[3] => SAMLC-B
[4] => SAMCH-B
[5] => SAMSS-B
[6] => SAMGR-B
[7] => SAMPB-B
[8] => SAMLI-B
[9] => SAMDR-B
[10] => SAMP-B
[11] => SAMTC-B
[12] => SAMLB-B
)
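If the AJAX response is valid JSON (an assumption; the answer only matches it with regular expressions), the per-colour step could be sketched without regexes. http_build_query() URL-encodes each parameter, and a recursive search finds a key wherever it is nested. The response body, the SKU ID and the "normal" product type below are hypothetical placeholders.

```php
<?php
// Build the per-colour AJAX URL; http_build_query() URL-encodes each value
// (with the default RFC 1738 encoding, a space becomes "+").
function buildVariantUrl(string $skuId, string $productId, string $productType, string $time): string {
    $base = 'http://www.samsclub.com/sams/shop/product/ajax/ajaxSkuVariant.jsp';
    return $base . '?' . http_build_query([
        'skuId'       => $skuId,
        'productId'   => $productId,
        'productType' => $productType,
        '_'           => $time,
    ]);
}

// Recursively return the first value stored under $key in decoded JSON, or null.
function findJsonValue($data, string $key) {
    if (!is_array($data)) {
        return null;
    }
    if (array_key_exists($key, $data)) {
        return $data[$key];
    }
    foreach ($data as $child) {
        $found = findJsonValue($child, $key);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Hypothetical response body, for illustration only
$body = '{"sku":{"legacyItemNumber":"252414","model":"SAMA-B"}}';
$data = json_decode($body, true);

echo buildVariantUrl('sku123', 'prod10450797', 'normal', '1400000000000'), "\n";
echo findJsonValue($data, 'legacyItemNumber'), "\n"; // 252414
echo findJsonValue($data, 'model'), "\n";            // SAMA-B
```

Decoding the JSON avoids regexes that silently miss values whose characters fall outside the pattern, such as digits in a model number.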
Answer 1 (score: -1)
I'd suggest scraping with .NET and a browser class instead. That way a bot can click each colour and then read the values you need.