PHP curl does not return the correct result

Time: 2016-02-16 16:58:45

Tags: php curl web-crawler bots

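I am currently building a web crawler, something like a search bot. I found that curl can fetch the content of several URLs in one go (the curl_multi functions). Here is my code: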

protected function multiRequest($data, $options = array()) {

    // array of curl handles
    $curly = array();
    // data to be returned
    $result = array();

    // multi handle
    $mh = curl_multi_init();

    // loop through $data and create curl handles
    // then add them to the multi-handle
    foreach ($data as $id => $d) {

        $curly[$id] = curl_init();

        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);

        curl_multi_add_handle($mh, $curly[$id]);
    }

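    // execute the handles until all transfers have finished
    $running = null;
    do {
        $mrc = curl_multi_exec($mh, $running);
        // wait for activity on a handle instead of busy-looping
        if ($running > 0) {
            curl_multi_select($mh);
        }
    } while ($running > 0 && $mrc == CURLM_OK);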

    // get content and remove handles
    foreach ($curly as $id => $c) {
        $result[$id] = curl_multi_getcontent($c);
        curl_multi_remove_handle($mh, $c);
    }

    // all done
    curl_multi_close($mh);

    return $result;
}

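This works when I pass an array containing a single link: it returns an array containing one big string (the HTML content). But when I call it a second time with a larger array (~30 links), it returns an array of the same size filled with empty strings, as if the server simply did not want to answer all of those requests. Is there something wrong with my code?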

Thanks for your help

Erik Brendel

1 answer:

Answer 0: (score: 1)

OK, I found it. It just took a little time: "Why does cURL return an empty string?"

It seems these 5 lines actually do the magic:

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
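
The lines above use a single handle named $ch, while the question stores its handles in $curly[$id]. The following is a minimal sketch, not the original author's code, of how those options might be applied to every handle of a multi request. The names crawlerCurlOptions and multiRequestWithOptions are hypothetical, and the per-transfer error reporting via curl_multi_info_read is an addition for diagnosing empty responses. Which of the five options actually cures the empty strings is not stated in the answer; the user agent and redirect following are plausible candidates, but that is an assumption.

```php
<?php
// Hypothetical helper: bundles the options from the answer into one array
// so they can be applied to each handle with curl_setopt_array().
function crawlerCurlOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_VERBOSE        => true, // debug output goes to STDERR
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17',
    ];
}

// Hypothetical variant of multiRequest(): takes id => url pairs and returns,
// per id, the body, the HTTP status code and a curl error message (if any).
function multiRequestWithOptions(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $id => $url) {
        $handles[$id] = curl_init();
        curl_setopt_array($handles[$id], crawlerCurlOptions($url));
        curl_multi_add_handle($mh, $handles[$id]);
    }

    // Drive all transfers; wait for activity instead of busy-looping.
    $running = null;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the result code of every finished transfer.
    $codes = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        $id = array_search($info['handle'], $handles, true);
        if ($id !== false) {
            $codes[$id] = $info['result'];
        }
    }

    $result = [];
    foreach ($handles as $id => $ch) {
        $failed = isset($codes[$id]) && $codes[$id] !== CURLE_OK;
        $result[$id] = [
            'body'  => curl_multi_getcontent($ch),
            'http'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'error' => $failed ? curl_strerror($codes[$id]) : '',
        ];
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }

    curl_multi_close($mh);
    return $result;
}
```

Calling multiRequestWithOptions(array('a' => 'http://example.com/')) would return one entry keyed 'a'. If a body is still empty, the 'http' and 'error' fields should give a hint as to whether the server redirected, refused the request, or the transfer itself failed.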