php - curl_multi "Illegal characters found in URL"

Date: 2016-04-08 12:16:23

Tags: php .htaccess curl

I have a PHP script for a URL status-checker tool. It checks the given URLs and reports the ones that return a 404 error.

The StatusCheckerRequest input contains the URLs separated by "\n".
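If the URLs come from an HTML form textarea, browsers submit its line breaks as "\r\n", so splitting on "\n" alone leaves a "\r" at the end of every line except the last. A minimal sketch that makes the hidden character visible, assuming CRLF input (the sample string is hypothetical, modelled on the URLs listed further down):

```php
<?php
// Hypothetical textarea value as a browser would submit it (CRLF line breaks).
$source = "stackoverfloww.com\r\nwww.laravel2.com\r\nhttp://stackoverflow.com\r\nhttp://laravel.com";

// Split on "\n" only, as PostStatusChecker does.
$parts = explode("\n", $source);

foreach ($parts as $i => $url) {
    // json_encode() escapes control characters, so a trailing \r is printed as "\r".
    echo $i, ': ', json_encode($url, JSON_UNESCAPED_SLASHES), PHP_EOL;
}
```

Every element except the last ends in "\r", which lines up with the symptom described below: only the last URL gets a real HTTP status.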

public function PostStatusChecker(StatusCheckerRequest $request){
    $urls = $request->source;
    $seperateURLs = explode("\n", $urls);
    // -- create all the individual cURL handles and set their options
    $curl_handles = array();
    foreach ($seperateURLs as $url) {
        $curl_handles[$url] = curl_init();
        curl_setopt($curl_handles[$url], CURLOPT_URL, $url);
        curl_setopt($curl_handles[$url], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_handles[$url], CURLOPT_CONNECTTIMEOUT, 20);
        curl_setopt($curl_handles[$url], CURLOPT_SSL_VERIFYPEER, false);
    }
    // -- start going through the cURL handles and running them
    $curl_multi_handle = curl_multi_init();
    $i = 0; // count where we are in the list so we can break up the runs into smaller blocks
    $block = array(); // to accumulate the curl_handles for each group we'll run simultaneously
    $results = array();
    $curlErrors = array();
    foreach ($curl_handles as $a_curl_handle) {
        $i++; // increment the position-counter

        // add the handle to the curl_multi_handle and to our tracking "block"
        curl_multi_add_handle($curl_multi_handle, $a_curl_handle);
        $block[] = $a_curl_handle;

        // -- check to see if we've got a "full block" to run or if we're at the end of our list of handles
        if (($i % BLOCK_SIZE == 0) or ($i == count($curl_handles))) {
            // -- run the block
            $running = NULL;
            do {
                // track the previous loop's number of handles still running so we can tell if it changes
                $running_before = $running;

                // run the block or check on the running block and get the number of sites still running in $running
                curl_multi_exec($curl_multi_handle, $running);
                print_r (curl_multi_info_read($curl_multi_handle));
            } while ($running > 0);


            // -- once the number still running is 0, curl_multi_ is done, so check the results
            foreach ($block as $handle) {
                // HTTP response code
                $code = curl_getinfo($handle,  CURLINFO_HTTP_CODE);
                $results['httpCode'][] = $code;

                // cURL error number
                $curl_errno = curl_errno($handle);
                $results['curlErrorNo'][] = $curl_errno;

                // cURL error message
                $curl_error = curl_error($handle);
                $results['curlErrorMessage'][] = $curl_error;        

                // remove the (used) handle from the curl_multi_handle
                curl_multi_remove_handle($curl_multi_handle, $handle);
            }

            // reset the block to empty, since we've run its curl_handles
            $block = array();
        }
    }
    // close the curl_multi_handle once we're done
    curl_multi_close($curl_multi_handle);
    print_r($results);
    die();
}

I based this on a curl_multi_exec example from Stack Overflow. When I check the results with these URLs:

Array
(
    [0] => stackoverfloww.com
    [1] => www.laravel2.com
    [2] => http://stackoverflow.com
    [3] => http://laravel.com
)

the output is:

[httpCode] => Array
(
    [0] => 0
    [1] => 0
    [2] => 0
    [3] => 301
)

[curlErrorMessage] => Array
(
    [0] => Illegal characters found in URL
    [1] => Illegal characters found in URL
    [2] => Illegal characters found in URL
    [3] => 
)

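I tried different inputs and the result is always the same: the last URL returns 200 or 301 and all the others return 0. I also checked what curl_multi_info_read returns: every URL except the last has a result of 3 ("Illegal characters found in URL"), and the last one has 0.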
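curl_multi_info_read() returns one message per finished transfer and false when the queue is empty, so it has to be called in a loop, and each message's "handle" must be matched back to its URL. A result of 3 is CURLE_URL_MALFORMAT, the error code behind "Illegal characters found in URL". A minimal sketch of that pattern; run_multi() is a hypothetical helper, and unlike the question's loop it calls curl_multi_select() to wait for activity instead of spinning:

```php
<?php
/**
 * Hypothetical helper: run all URLs in one multi handle and return
 * [url => ['result' => int, 'error' => string, 'http_code' => int]].
 */
function run_multi(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    $results = [];
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            // Block for up to 1 second waiting for socket activity.
            curl_multi_select($mh, 1.0);
        }
        // Drain every "transfer done" message queued so far.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch = $info['handle'];
            // Strict search matches PHP 7 resources and PHP 8 CurlHandle objects alike.
            $url = array_search($ch, $handles, true);
            $results[$url] = [
                'result'    => $info['result'],   // CURLE_* code, 0 means OK
                'error'     => curl_error($ch),
                'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            ];
            curl_multi_remove_handle($mh, $ch);
        }
    } while ($running > 0 && $status === CURLM_OK);

    curl_multi_close($mh);
    return $results;
}
```

With the question's input, each URL that still carries a trailing "\r" would be reported with result 3 and http_code 0.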

Can you help me solve this? Thank you very much.

1 Answer

Answer 0 (score: 2)

A quick search of the cURL source code shows that this error is raised when the URL given to CURLOPT_URL contains the character \r and/or \n.

From lib/url.c:

  /* We might pass the entire URL into the request so we need to make sure
   * there are no bad characters in there.*/
  if(strpbrk(data->change.url, "\r\n")) {
    failf(data, "Illegal characters found in URL");
    return CURLE_URL_MALFORMAT;
  }
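The same test can be run on the PHP side before a URL ever reaches curl_setopt(), which makes it easy to report bad input per line. A minimal sketch; has_illegal_url_chars() is a hypothetical helper that mirrors the strpbrk(..., "\r\n") check quoted above:

```php
<?php
// Hypothetical helper mirroring libcurl's strpbrk(url, "\r\n") check.
function has_illegal_url_chars(string $url): bool
{
    // PHP's strpbrk() returns false when none of the listed characters occur.
    return strpbrk($url, "\r\n") !== false;
}

var_dump(has_illegal_url_chars("http://stackoverflow.com\r")); // bool(true)
var_dump(has_illegal_url_chars("http://laravel.com"));        // bool(false)
```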

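You should run each URL through $url = trim($url);, because there is most likely a leftover \r or \n at the end of the URLs. If the input comes from a form textarea, its line breaks are submitted as \r\n, so after explode("\n", ...) every line except the last still ends in \r. That is why only the last URL returns a real status code.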
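Applied to the question's code, the fix is to clean every line before a handle is created. A minimal sketch, assuming the input is the raw textarea string; parse_url_list() is a hypothetical helper name. preg_split('/\R/', ...) splits on any common line break (\r\n, \n or \r), and blank lines, such as one left by a trailing newline, are dropped so they don't turn into empty cURL requests:

```php
<?php
// Hypothetical helper: turn raw textarea input into a clean list of URLs.
function parse_url_list(string $source): array
{
    // \R matches any line break sequence, including \r\n, \n and \r.
    $lines = preg_split('/\R/', $source);

    $urls = [];
    foreach ($lines as $line) {
        $url = trim($line); // strips any leftover \r plus surrounding spaces/tabs
        if ($url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

print_r(parse_url_list("stackoverfloww.com\r\nwww.laravel2.com\r\n\r\nhttp://laravel.com\r\n"));
```

In PostStatusChecker, $seperateURLs = parse_url_list($request->source); would take the place of the explode() call.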