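PHP: scrape website content and display the results comma-separated

Time: 2016-02-15 03:51:04

Tags: php screen-scraping

I have this script that scrapes a website with PHP. All I want is to display the results separated by commas. The listing is also paginated across many pages, and I would like to show those results as well.

My code is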

$ch = curl_init('http://www.qatarliving.com/v3/classifieds/search/category/mobile-devices');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

/* 
 * XXX: This is not a "fix" for your problem, this is a work-around.  You 
 * should fix your local CAs 
 */
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

/* Set a browser UA so that we aren't told to update */
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36');

$res = curl_exec($ch);

if ($res === false) {
    die('error: ' . curl_error($ch));
}

curl_close($ch);

$d = new DOMDocument();
@$d->loadHTML($res);

$output = array(
    'class' => '',
);

$x = new DOMXPath($d);



$myspan = $x->query('//span[@class="b-card b-card-mod-h item  "]');
if($myspan->length > 0){
    foreach($myspan as $row){
        echo $row->nodeValue . "<br/>";
    }
}

and the result is

2,000 QAR Mobile phones, Al Gharrafa iPhone 6 128 like new By professional76
75 QAR Mobile phones, Other Virtual Reality Cardboard [NEW] By 1StopGulf

...

2 answers:

Answer 0 (score: 1)

Try the solution below; $data_array will contain the required output array:

<?php
$ch = curl_init('http://www.qatarliving.com/v3/classifieds/search/category/mobile-devices');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

/*
 * XXX: This is not a "fix" for your problem, this is a work-around.  You
 * should fix your local CAs
 */
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

/* Set a browser UA so that we aren't told to update */
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36');

$res = curl_exec($ch);

if ($res === false) {
    die('error: ' . curl_error($ch));
}

curl_close($ch);

$d = new DOMDocument();
@$d->loadHTML($res);

$output = array(
    'class' => '',
);

$x = new DOMXPath($d);

$myspan = $x->query('//span[@class="b-card b-card-mod-h item  "]');
$data_array = array();
if ($myspan->length > 0) {
    foreach ($myspan as $row) {
        $data = $row->getElementsByTagName('p');
        $array = array();
        foreach ($data as $dt) {
            $tag = '';
            $class = $dt->getAttribute('class');
            $value = $dt->nodeValue;
            if ($class == 'b-card--el-deposit-val') {
                $tag = 'price';
            } else if ($class == 'b-card--el-deposit-time') {
                $tag = 'deposittime';
            } else if ($class == 'b-ad-excerpt b-par-mod-clear b-line-mod-thin--mix-item') {
                $tag = 'category';
            } else if ($class == 'b-card--el-description') {
                $tag = 'name';
            }
            if ($tag) {
                $array[$tag] = $value;
            }
        }

        $data = $row->getElementsByTagName('a');
        foreach ($data as $dt) {
            $tag = '';
            $class = $dt->getAttribute('class');
            $value = $dt->nodeValue;
            if (trim($class) == 'b-card--el-agency-title') {
                $tag = 'addedby';
            }
            if ($tag) {
                $array[$tag] = $value;
            }
        }
        $data_array[] = $array;
    }
    echo '<pre>';
    print_r($data_array);
}

Output:

Array
(
    [0] => Array
        (
            [price] => 1,200
            [deposittime] => QAR
            [category] => Tablets, West Bay
            [name] => iPad Air 64gb Silver with Leather cover
            [addedby] => rocknrolla
        )

    [1] => Array
        (
            [price] => 2,500
            [deposittime] => QAR
            [category] => Mobile phones, Fereej Al Ameer / Muraikh
            [name] => iPhone 6 Plus 64gb
            [addedby] => nabbool
        )

    [2] => Array
        (
            [price] => 2,300
            [deposittime] => QAR
            [category] => Mobile phones, Al Sadd
            [name] => lady use 6 plus gold 64 gb
            [addedby] => nijumok
        )

    [3] => Array
        (
            [price] => 2,050
            [deposittime] => QAR
            [category] => Mobile phones, Old Airport
            [name] => LG v10 blue for sale
            [addedby] => ramsah92
        )

    [4] => Array
        (
            [price] => 1,750
            [deposittime] => QAR
            [category] => Mobile phones, Industrial Area
            [name] => Neat and cleaned iPhone 6 16gb
            [addedby] => ali murtza
        )

    [5] => Array
        (
            [price] => 1,350
            [deposittime] => QAR
            [category] => Mobile phones, Ain Khaled
            [name] => Brand new honour 7 sell or sawp...4g with 1 year warenty
            [addedby] => makbool_khan
        )

    [6] => Array
        (
            [price] => 250
            [deposittime] => QAR
            [category] => Mobile phones, Al Sadd
            [name] => NOTE 3 ACCESSORIES
            [addedby] => MRS70
        )

    [7] => Array
        (
            [price] => 0
            [deposittime] => QAR
            [category] => Tablets, West Bay
            [name] => Hi, I'm looking for a Sony Xperia Z4 Tablet
            [addedby] => carl_albrecht
        )

    [8] => Array
        (
            [price] => 50
            [deposittime] => QAR
            [category] => Mobile phones, Doha
            [name] => SN0009 -Luxury Ultra-thin Shockproof Armor Back Case Cover for Apple iPhone 6S 
            [addedby] => Qesale
        )

    [9] => Array
        (
            [price] => 75
            [deposittime] => QAR
            [category] => Mobile phones, Doha
            [name] => SN0003 - Dual Fast Adaptive USB Car Charger Adapter + Lightning Cable for iPhone Samsung
            [addedby] => Qesale
        )

    [10] => Array
        (
            [price] => 2,000
            [deposittime] => QAR
            [category] => Mobile phones, Al Gharrafa
            [name] => iPhone 6 128 like new
            [addedby] => professional76
        )

    [11] => Array
        (
            [price] => 75
            [deposittime] => QAR
            [category] => Mobile phones, Other
            [name] => Virtual Reality Cardboard [NEW]
            [addedby] => 1StopGulf
        )

)
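To get the comma-separated lines the question asks for, each record in $data_array can be joined with implode(). The following is a minimal sketch, assuming the record layout shown in the output above; the helper name format_rows is hypothetical, and the sample rows are copied from that output. Note that values such as "2,000" and "Mobile phones, Al Gharrafa" contain commas themselves, so fputcsv() would probably be the safer choice if the result has to be parsed as real CSV.

```php
<?php
// Hypothetical helper: turn the scraped records into comma-separated lines.
// Assumes each record uses the keys produced by the answer above.
function format_rows(array $rows)
{
    $fields = array('price', 'deposittime', 'category', 'name', 'addedby');
    $lines = array();
    foreach ($rows as $row) {
        $values = array();
        foreach ($fields as $field) {
            // A missing key becomes an empty column instead of a notice.
            $values[] = isset($row[$field]) ? trim($row[$field]) : '';
        }
        $lines[] = implode(', ', $values);
    }
    return $lines;
}

// Sample data copied from the output above.
$data_array = array(
    array(
        'price' => '2,000',
        'deposittime' => 'QAR',
        'category' => 'Mobile phones, Al Gharrafa',
        'name' => 'iPhone 6 128 like new',
        'addedby' => 'professional76',
    ),
    array(
        'price' => '75',
        'deposittime' => 'QAR',
        'category' => 'Mobile phones, Other',
        'name' => 'Virtual Reality Cardboard [NEW]',
        'addedby' => '1StopGulf',
    ),
);

$lines = format_rows($data_array);
echo implode("\n", $lines), "\n";
```

Each element of $lines is one listing, so the same array can be echoed with "<br/>" between entries when the output goes to a browser.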

Answer 1 (score: 0)

Try this code

$url = 'http://www.qatarliving.com/v3/classifieds/search/category/mobile-devices';
$file_contents = file_get_contents($url);

// The markup that originally surrounded (.*?) in each pattern is not preserved in this copy.
$value = preg_match_all('/(.*?)/s', $file_contents, $title_data);
$value = preg_match_all('/(.*?)/s', $file_contents, $price_data);
$value = preg_match_all('/(.*?)/s', $file_contents, $label_data);

for ($i = 0; $i < count($title_data[0]); $i++) {
    echo strip_tags($title_data[0][$i] . ', ' . $price_data[0][$i] . ', ' . $label_data[0][$i]) . "\n";
}

which produces the following output

iPhone 6 128 like new, 2,000, QAR

Virtual Reality Cardboard [NEW], 75, QAR

Multifunction Leather Wallet for iPhone 6 / 6S / 6S Plus [NEW], 140, QAR
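
Neither answer covers the pagination mentioned in the question. One possible approach is to request each page in turn until a page yields no listings. The sketch below assumes the site selects pages with a page query parameter, which the source does not confirm; the function name scrape_pages, the $fetch callable, the page limit, and the stop condition are all hypothetical. Passing the fetcher in as a callable keeps the loop testable without network access, and the XPath matches the b-card class token rather than the exact attribute string with its trailing spaces.

```php
<?php
// Hypothetical: collect listing text from consecutive result pages.
// $fetch is any callable that takes a URL and returns HTML (or false).
function scrape_pages($baseUrl, $fetch, $maxPages = 50)
{
    $all = array();
    for ($page = 0; $page < $maxPages; $page++) {
        // ASSUMPTION: pages are selected with ?page=N; check the site's real links.
        $html = call_user_func($fetch, $baseUrl . '?page=' . $page);
        if ($html === false || $html === '') {
            break;
        }
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        $xpath = new DOMXPath($doc);
        // Match the "b-card" class token instead of the exact attribute string.
        $cards = $xpath->query(
            '//span[contains(concat(" ", normalize-space(@class), " "), " b-card ")]'
        );
        if ($cards->length === 0) {
            break; // an empty page is taken to mean the end of the results
        }
        foreach ($cards as $card) {
            // Collapse runs of whitespace so each listing is a single line.
            $all[] = trim(preg_replace('/\s+/', ' ', $card->nodeValue));
        }
    }
    return $all;
}

// Stand-in fetcher for demonstration: two pages of fake markup, then nothing.
$fakePages = array(
    '<html><body><span class="b-card b-card-mod-h item  ">A</span></body></html>',
    '<html><body><span class="b-card b-card-mod-h item  ">B</span><span class="b-card">C</span></body></html>',
);
$fetch = function ($url) use ($fakePages) {
    parse_str(parse_url($url, PHP_URL_QUERY), $query);
    $page = (int) $query['page'];
    return isset($fakePages[$page]) ? $fakePages[$page] : '';
};

$listings = scrape_pages('http://example.com/search', $fetch);
echo implode("\n", $listings), "\n";
```

For real use, $fetch would wrap the curl_init() / curl_exec() calls from the question, and the page parameter would need to be checked against the links the site actually renders in its pager.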