Multi cURL returns empty data

Time: 2017-04-19 09:00:00

Tags: php multithreading curl

I am trying to fetch data with the cURL multi functions, but unfortunately the response data is empty for most websites. My code is below:

$responses = multi([
    'blocket' => ['url' => 'http://blocket.se','opts' => [ CURLOPT_RETURNTRANSFER => true]]
]);
print_r($responses);


function multi(array $requests, array $opts = []) {
    // create array for curl handles
    $chs = [];
    // create array for the optional per-request 'scraper' values
    $scraper = [];
    // merge general curl options args with defaults
    $opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => 1];
    // create array for responses
    $responses = [];
    // init curl multi handle
    $mh = curl_multi_init();
    // create running flag
    $running = null;
    // cycle through requests and set up
    foreach ($requests as $key => $request) {
        // init individual curl handle
        $chs[$key] = curl_init();
        // set url
        curl_setopt($chs[$key], CURLOPT_URL, $request['url']);
        // 'scraper' is optional; avoid an undefined-index notice when it is missing
        $scraper[$key] = isset($request['scraper']) ? $request['scraper'] : null;
        // check for post data and handle if present
        if (isset($request['post_data'])) {
            curl_setopt($chs[$key], CURLOPT_POST, 1);
            curl_setopt($chs[$key], CURLOPT_POSTFIELDS, $request['post_data']);
        }
        // set opts (per-request opts take precedence over the defaults)
        curl_setopt_array($chs[$key], (isset($request['opts']) ? $request['opts'] + $opts : $opts));
        curl_multi_add_handle($mh, $chs[$key]);
    }
    do {
        // execute curl requests
        curl_multi_exec($mh, $running);
        // block to avoid needless cycling until change in status
        curl_multi_select($mh);
    // check flag to see if we're done
    } while ($running > 0);
    // cycle through requests
    foreach ($chs as $key => $ch) {
        // handle error
        if (curl_errno($ch)) {
            $responses[$key] = ['data' => null, 'info' => null, 'error' => curl_error($ch), 'scraper' => $scraper[$key]];
        } else {
            // save successful response
            $responses[$key] = ['data' => curl_multi_getcontent($ch), 'info' => curl_getinfo($ch), 'error' => null, 'scraper' => $scraper[$key]];
        }
        // detach and close individual handle
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    // close multi handle
    curl_multi_close($mh);
    // return responses
    return $responses;
}

Result

Array ( [blocket] => Array ( [data] => [info] => Array ( [url] => http://blocket.se/ [content_type] => [http_code] => 302 [header_size] => 119 [request_size] => 49 [filetime] => -1 [ssl_verify_result] => 0 [redirect_count] => 0 [total_time] => 0.328 [namelookup_time] => 0 [connect_time] => 0.172 [pretransfer_time] => 0.172 [size_upload] => 0 [size_download] => 0 [speed_download] => 0 [speed_upload] => 0 [download_content_length] => 0 [upload_content_length] => -1 [starttransfer_time] => 0.328 [redirect_time] => 0 [redirect_url] => https://www.blocket.se [primary_ip] => 185.49.132.3 [certinfo] => Array ( ) [primary_port] => 80 [local_ip] => 192.168.0.135 [local_port] => 58357 ) [error] =>) )

As you can see in the result, [data] is empty.

Update

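After discussing it with @Sahil, it turned out that the code above works for sites without SSL, but fails for sites that use it. So I tried CURLOPT_SSL_VERIFYPEER and CURLOPT_SSL_VERIFYHOST as well as CURLOPT_FOLLOWLOCATION, but none of them has helped so far.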
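When an HTTPS transfer "fails" with nothing but an empty body, it can help to look at each transfer's libcurl result code. A minimal sketch, not the asker's code: `multiWithResults` is a hypothetical variant of `multi()` that reads every handle's `CURLE_*` result with `curl_multi_info_read()` as transfers finish and turns non-zero codes into messages with `curl_strerror()`. For an HTTPS site with no CA bundle configured, it would be expected to report a certificate-related error code rather than `CURLE_OK`.

```php
<?php
// Hypothetical variant of multi(): records each transfer's CURLE_* result
// code via curl_multi_info_read() as the transfers complete.
function multiWithResults(array $urls, array $opts = [])
{
    // Same defaults as multi(); options passed by the caller take precedence.
    $opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => true];
    $mh = curl_multi_init();
    $chs = [];
    foreach ($urls as $key => $url) {
        $chs[$key] = curl_init($url);
        curl_setopt_array($chs[$key], $opts);
        curl_multi_add_handle($mh, $chs[$key]);
    }

    $results = [];
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        // Drain completion messages; 'result' is a CURLE_* code (CURLE_OK = 0).
        while (($msg = curl_multi_info_read($mh)) !== false) {
            $key = array_search($msg['handle'], $chs, true);
            $results[$key] = $msg['result'];
        }
        // Wait for activity; select() can return -1 immediately, so back off briefly.
        if ($running > 0 && curl_multi_select($mh) === -1) {
            usleep(1000);
        }
    } while ($running > 0);

    $out = [];
    foreach ($chs as $key => $ch) {
        $code = isset($results[$key]) ? $results[$key] : null;
        $out[$key] = [
            'data'   => $code === CURLE_OK ? curl_multi_getcontent($ch) : null,
            'result' => $code,
            'error'  => ($code === null || $code === CURLE_OK) ? null : curl_strerror($code),
            'info'   => curl_getinfo($ch),
        ];
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $out;
}
```

For example, `print_r(multiWithResults(['blocket' => 'https://www.blocket.se/']));` shows the `result` and `error` fields next to the usual `info` array.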

2 Answers:

Answer 0 (score: 1)

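@Sahil pointed out that the code is fine. Basically, the problem was that cURL did not work with HTTPS sites, because no CA root certificate bundle was configured in php.ini.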

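If you run into a similar problem, go to http://curl.haxx.se/docs/caextract.html and download the certificate bundle. Save it wherever you like and set the absolute path to this file in php.ini, for example: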

curl.cainfo = c:\wamp\cacert.pem

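As mentioned on various sites, the alternative workaround of setting CURLOPT_SSL_VERIFYPEER = false leaves your site vulnerable to attacks such as man-in-the-middle, because the server's certificate is no longer checked.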
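If editing php.ini is not possible (shared hosting, for instance), the same bundle can be supplied per handle with `CURLOPT_CAINFO` while keeping verification on. A minimal sketch under that assumption: `makeVerifiedHandle` is a hypothetical helper, and the bundle path is whatever location you saved the downloaded file to.

```php
<?php
// Hypothetical helper: create a cURL handle that verifies TLS certificates
// against an explicit CA bundle instead of relying on curl.cainfo in php.ini.
function makeVerifiedHandle($url, $caFile)
{
    if (!is_readable($caFile)) {
        // Fail early rather than quietly falling back to disabled verification.
        throw new RuntimeException("CA bundle not readable: $caFile");
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CAINFO         => $caFile, // absolute path to the CA bundle
        CURLOPT_SSL_VERIFYPEER => true,    // check the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,       // check the certificate matches the host name
    ]);
    return $ch;
}

// Usage (the path is an assumption, matching the php.ini example above):
// $ch = makeVerifiedHandle('https://www.blocket.se/', 'c:\wamp\cacert.pem');
```

With `multi()`, the equivalent is to pass `CURLOPT_CAINFO => '...'` in a request's `opts` array.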

Answer 1 (score: 0)

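Everything works; the only problem with your code is the URL. Your current URL answers with a 302 redirect, and because CURLOPT_FOLLOWLOCATION is not set, cURL returns the empty body of that 302 response (see http_code and redirect_url in your info array) instead of following it. Try this.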

Change your URL from:

http://www.blocket.se/

to:

https://www.blocket.se/

PHP code:

<?php

ini_set('display_errors', 1);
$responses = multi([
    'blocket' => ['url' => 'https://www.blocket.se/', 'opts' => [ CURLOPT_RETURNTRANSFER => true]]
        ]);
print_r($responses);

function multi(array $requests, array $opts = [])
{

    $chs = [];

    $opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => 1];

    $responses = [];

    $mh = curl_multi_init();

    $running = null;

    foreach ($requests as $key => $request)
    {
        $chs[$key] = curl_init();
        curl_setopt($chs[$key], CURLOPT_URL, $request['url']);
        $scraper[$key] = isset($request['scraper']) ? $request['scraper'] : null;
        if (isset($request['post_data']))
        {
            curl_setopt($chs[$key], CURLOPT_POST, 1);
            curl_setopt($chs[$key], CURLOPT_POSTFIELDS, $request['post_data']);
        }
        curl_setopt_array($chs[$key], (isset($request['opts']) ? $request['opts'] + $opts : $opts));
        curl_multi_add_handle($mh, $chs[$key]);
    }
    do
    {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
    } while ($running > 0);
    foreach ($chs as $key => $ch)
    {
        if (curl_errno($ch))
        {
            $responses[$key] = ['data' => null, 'info' => null, 'error' => curl_error($ch), 'scraper' => $scraper[$key]];
        } else
        {
            $responses[$key] = ['data' => curl_multi_getcontent($ch), 'info' => curl_getinfo($ch), 'error' => null, 'scraper' => $scraper[$key]];
        }
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $responses;
}
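Instead of hard-coding the final URL, an unfollowed redirect can also be detected from the `info` array that `multi()` already returns: a 3xx `http_code`, an empty `data`, and a non-empty `redirect_url`. A minimal sketch; `isUnfollowedRedirect` is a hypothetical helper, and the sample array reuses values from the question's `print_r` output, trimmed to the relevant keys.

```php
<?php
// Hypothetical helper: true when a response from multi() is an empty 3xx
// reply, i.e. the server redirected but cURL did not follow the redirect.
function isUnfollowedRedirect(array $response)
{
    if ($response['info'] === null) {
        return false; // transport error: no HTTP status to inspect
    }
    $code = (int) $response['info']['http_code'];
    return $code >= 300 && $code < 400
        && (string) $response['data'] === ''
        && !empty($response['info']['redirect_url']);
}

// Sample taken from the question's output (only the relevant keys).
$blocket = [
    'data'  => '',
    'info'  => ['http_code' => 302, 'redirect_url' => 'https://www.blocket.se'],
    'error' => null,
];

if (isUnfollowedRedirect($blocket)) {
    // Either request redirect_url directly or set CURLOPT_FOLLOWLOCATION => true.
    echo 'Redirected to ' . $blocket['info']['redirect_url'] . PHP_EOL;
}
```

Setting `CURLOPT_FOLLOWLOCATION => true` in a request's `opts` lets cURL follow the redirect itself. Since the target here is an HTTPS URL, that only succeeds once certificate verification works, as described in the other answer.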