Getting product details from samsclub.com

Time: 2014-09-05 08:13:42

Tags: javascript php ajax scrape

I am using PHP to scrape data from SamsClub.com:

    $res = file_get_contents('http://www.samsclub.com/sams/bath-towel-apple-gr-100-cotton/prod10450797.ip');

I wrote a function based on PHP's explode to pull out the data:

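// Return the text between the first $start marker and the next $end marker.
function getData($content,$start,$end){
    $str = explode($start,$content);
    if (!isset($str[1])) {
        return null; // $start marker not found
    }
    $str = explode($end,$str[1]);
    return $str[0];
}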
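
A marker-based extractor like this can be tried on a small inline string before pointing it at the live page. The sketch below is a variant that uses strpos and substr and returns null when either marker is missing. The function name `extractBetween` and the sample markup are made up for illustration and are not taken from the Sam's Club page.

```php
<?php
// Hypothetical helper: return the text between the first $start marker and
// the next $end marker after it, or null when either marker is missing.
function extractBetween(string $content, string $start, string $end): ?string
{
    $from = strpos($content, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start);
    $to = strpos($content, $end, $from);
    if ($to === false) {
        return null; // end marker not found
    }
    return substr($content, $from, $to - $from);
}

// Made-up sample markup, only to show the call shape.
$sample = '<span class="item">Item #: 252414</span>';
var_dump(extractBetween($sample, 'Item #: ', '</span>'));  // string(6) "252414"
var_dump(extractBetween($sample, 'Model #: ', '</span>')); // NULL
```

Returning null for a missing marker lets the caller tell "not on the page" apart from an empty value.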

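I have managed to get all the required data, but one thing remains: the product variations. As you can see in the screenshot, the product is available in several other colors.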


When another color is selected, the item # and model # change as well, as shown in the screenshot below.


I also want to fetch the "item # & model #" information for the other colors.

Looking forward to your responses.

2 Answers:

Answer 0: (score: 2)

To do this you need a library (PHP Simple HTML DOM Parser). Just upload simple_html_dom.php to a location you can include it from (in my code it sits in the same folder as the script).

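<?php

// Product page to scrape; this is the only value you need to change.
$url = 'http://www.samsclub.com/sams/bath-towel-apple-gr-100-cotton/prod10450797.ip';

// PHP Simple HTML DOM Parser, located in the same folder as this script.
include('simple_html_dom.php');

$html = file_get_html($url);
$colour = array();
$item = array();
$model = array();

// The inline script inside div#variance holds the colour-to-SKU map.
$script = $html->find('div[id=variance] script', 0)->innertext;
$script = preg_replace('/\s+/', ' ', $script);
$scripts = explode (";", $script);

// Statement positions depend on how the page's script is laid out.
$script = $scripts[2]; // skuJson.skuVariantJson assignment
$id = $scripts[4];     // statement containing the product id
$type = $scripts[5];   // statement containing the product type

// Strip the JavaScript wrapper so only the JSON text remains.
$script = str_replace("skuJson.skuVariantJson = $.parseJSON('", "", $script);
$script = str_replace("')", "", $script);

$colours = json_decode($script);

// Pull the quoted values out of the remaining statements.
preg_match("/'([a-z0-9]*)'/", $type, $types);
$type = $types[1];
preg_match("/'([a-z0-9]*)'/", $id, $ids);
$id = $ids[1];

// The last script tag on the page supplies the value for the "_" parameter.
$script = $html->find('script', -1)->innertext;
$scripts = explode (";", $script);

$time = $scripts[0];
preg_match('/"([0-9]*)"/', $time, $times);
$time = $times[1];

// One AJAX request per colour, the same call the page makes on click.
foreach ($colours as $key => $value) {
    $url = 'http://www.samsclub.com/sams/shop/product/ajax/ajaxSkuVariant.jsp?skuId='. $value .'&productId='. $id .'&productType='. $type .'&_='. $time;
    $html = file_get_html($url);
    preg_match('/"legacyItemNumber":"([0-9]*)"/', $html, $match);
    $item[] = $match[1];
    preg_match('/"model":"([a-z-]*)"/i', $html, $match);
    $model[] = $match[1];
    $colour[] = substr($key, 0, -1); // drop the trailing character of the key
}

// Print results
echo "<pre>"; print_r($colour); echo "</pre>";
echo "<pre>"; print_r($item);   echo "</pre>";
echo "<pre>"; print_r($model);  echo "</pre>";

?>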

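The only thing you need to change is the $url variable at the top. Why all this code, you may ask? Because the data you are looking for is not on the product page itself. It is loaded through an AJAX call every time a color is clicked, so the script has to make one request per color. Here is the output: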

Array
(
    [0] => White
    [1] => Burgundy
    [2] => Apple Green
    [3] => Lilac
    [4] => Chocolate
    [5] => Sage
    [6] => Grey
    [7] => PckBlue
    [8] => Linen
    [9] => null
    [10] => Plum
    [11] => Clay
    [12] => Light Blue
)

Array
(
    [0] => 252368
    [1] => 252505
    [2] => 252414
    [3] => 433076
    [4] => 252389
    [5] => 117268
    [6] => 252438
    [7] => 613317
    [8] => 252382
    [9] => 433083
    [10] => 252541
    [11] => 117175
    [12] => 252400
)

Array
(
    [0] => SAMW-B
    [1] => SAMB-B
    [2] => SAMA-B
    [3] => SAMLC-B
    [4] => SAMCH-B
    [5] => SAMSS-B
    [6] => SAMGR-B
    [7] => SAMPB-B
    [8] => SAMLI-B
    [9] => SAMDR-B
    [10] => SAMP-B
    [11] => SAMTC-B
    [12] => SAMLB-B
)
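
The per-color request in the loop above can be factored into two small helpers, which makes each step easier to check on its own. This is a minimal sketch rather than the answerer's code: the names `buildVariantUrl` and `parseVariant` are hypothetical, the endpoint and query parameters are copied from the answer's code, and the sample response body is an assumed shape built from values in the output above. The `sku123` and `type1` arguments are placeholders.

```php
<?php
// Build the AJAX URL for one color variant. The endpoint and parameter names
// are taken from the answer's code; http_build_query URL-encodes each value.
function buildVariantUrl(string $skuId, string $productId, string $productType, string $timestamp): string
{
    $query = http_build_query([
        'skuId'       => $skuId,
        'productId'   => $productId,
        'productType' => $productType,
        '_'           => $timestamp,
    ]);
    return 'http://www.samsclub.com/sams/shop/product/ajax/ajaxSkuVariant.jsp?' . $query;
}

// Pull the item and model numbers out of a response body with the same
// patterns the answer uses; a field that does not match is returned as null.
function parseVariant(string $body): array
{
    $item  = preg_match('/"legacyItemNumber":"([0-9]*)"/', $body, $m) ? $m[1] : null;
    $model = preg_match('/"model":"([a-z-]*)"/i', $body, $m) ? $m[1] : null;
    return ['item' => $item, 'model' => $model];
}

// Placeholder arguments and an assumed response shape, only to show the calls.
echo buildVariantUrl('sku123', 'prod10450797', 'type1', '1409904822'), "\n";
print_r(parseVariant('{"legacyItemNumber":"252414","model":"SAMA-B"}'));
```

In a real run the body would come from something like file_get_contents(buildVariantUrl(...)). Checking each field for null before storing it avoids the undefined-index notices the inline version raises when a pattern does not match.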

Answer 1: (score: -1)

I would suggest using .NET and a browser class for the scraping. That way you can have the bot click each color and then read the values you need.