PHP cURL fails to load response data

Date: 2016-07-04 05:01:28

Tags: php curl https

I am trying to scrape data with PHP, but the URL I need to access requires POST data.

<?php 

//set POST variables
$url = 'https://www.ncaa.org/';
//$url = 'https://web3.ncaa.org/hsportal/exec/hsAction?hsActionSubmit=searchHighSchool';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// This sets the number of fields to post
curl_setopt($curl,CURLOPT_POST, sizeof($data_to_post));

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);

?>

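When I try to access the second $url, which hosts the actual information, it returns "Failed to load response data", but it does let me access the NCAA home page. Is there a reason I cannot load the response data even though I am sending the correct form data?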

2 answers:

Answer 0 (score: 1)

The site apparently checks for a recognized user agent. By default, PHP curl does not send a User-Agent header. Add

curl_setopt($curl, CURLOPT_USERAGENT, 'curl/7.21.4');

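and the script returns a response. In this case, however, the response says that the site requires a newer browser than the one you have. So you should copy the user agent string from a real browser, for example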

curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

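In addition, the site requires the parameters to be sent in application/x-www-form-urlencoded format. When you pass an array as the CURLOPT_POSTFIELDS argument, curl sends multipart/form-data instead. So change that line to: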

curl_setopt($curl,CURLOPT_POSTFIELDS, http_build_query($data_to_post));

which converts the array to a URL-encoded string.
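
To make the difference concrete, here is a minimal sketch of what http_build_query() produces for the form fields from the question. It only builds the request body and sends nothing over the network.

```php
<?php
// The same form fields used in the question.
$data_to_post = [
    'hsCode'         => '332680',
    'state'          => '',
    'city'           => '',
    'name'           => '',
    'hsActionSubmit' => 'Search',
];

// http_build_query() URL-encodes keys and values and joins them with "&".
// Empty strings are kept as "key="; only null values are dropped.
$body = http_build_query($data_to_post);

echo $body, PHP_EOL;
// hsCode=332680&state=&city=&name=&hsActionSubmit=Search
```

Passing this string to CURLOPT_POSTFIELDS makes curl send it with a Content-Type of application/x-www-form-urlencoded, whereas passing the array itself results in a multipart/form-data body.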

In the URL, leave out ?hsActionSubmit=searchHighSchool, because that parameter is sent in the POST fields.

The final working script looks like this:

<?php
//set POST variables
//$url = 'https://www.ncaa.org/';
$url = 'https://web3.ncaa.org/hsportal/exec/hsAction';

// This is the data to POST to the form. The KEY of the array is the name of the field. The value is the value posted.
$data_to_post = array();
$data_to_post['hsCode'] = '332680';
$data_to_post['state'] = '';
$data_to_post['city'] = '';
$data_to_post['name'] = '';
$data_to_post['hsActionSubmit'] = 'Search';

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// Send the request as a POST (CURLOPT_POST is a boolean switch, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// The fields to post, URL-encoded into a single string
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($data_to_post));

// Send the user agent string of a real browser
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);
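
The script above prints the response directly, because CURLOPT_RETURNTRANSFER is not set, so $result only holds a boolean. If you want the body in a variable and a readable error when the request fails, something like the following sketch may help. The helper names build_form_post_options() and post_form() are hypothetical and not part of the original answer, and treating any HTTP status of 400 or above as a failure is an assumption.

```php
<?php
/**
 * Build the cURL options for a URL-encoded form POST.
 * Hypothetical helper, introduced here only for illustration.
 */
function build_form_post_options(string $url, array $fields, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,                      // boolean switch, not a field count
        CURLOPT_POSTFIELDS     => http_build_query($fields), // application/x-www-form-urlencoded
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true,                      // return the body instead of printing it
    ];
}

/**
 * Send the POST and return the response body.
 * Throws with cURL's own error message if the transfer fails.
 */
function post_form(string $url, array $fields, string $userAgent): string
{
    $curl = curl_init();
    curl_setopt_array($curl, build_form_post_options($url, $fields, $userAgent));

    $body = curl_exec($curl);
    if ($body === false) {
        $message = curl_error($curl);
        curl_close($curl);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    // Assumption: any 4xx/5xx status counts as a failure for this scraper.
    if ($status >= 400) {
        throw new RuntimeException('Server responded with HTTP ' . $status);
    }
    return $body;
}
```

With these helpers, the request from the answer would be sent as post_form($url, $data_to_post, $userAgent), where $userAgent holds the browser string shown earlier.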

Answer 1 (score: 0)

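For the HTTPS connection, curl needs one specific option turned off: CURLOPT_SSL_VERIFYPEER. Setting it to FALSE makes curl skip verification of the server's certificate.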

// Initialize cURL
$curl = curl_init();

// Set the options
curl_setopt($curl,CURLOPT_URL, $url);

// This option MUST BE FALSE
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

// Send the request as a POST (CURLOPT_POST is a boolean switch, not a field count)
curl_setopt($curl, CURLOPT_POST, true);

// This is the fields to post in the form of an array.
curl_setopt($curl,CURLOPT_POSTFIELDS, $data_to_post);

//execute the post
$result = curl_exec($curl);

//close the connection
curl_close($curl);
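
The question does not show that certificate verification is what actually fails. If it is, one alternative to switching verification off is to leave it on and tell curl where to find a CA bundle. The sketch below assumes you have a PEM bundle on disk; the path /path/to/cacert.pem and the helper name build_verified_tls_options() are hypothetical.

```php
<?php
/**
 * Options that keep TLS verification enabled and point cURL at a CA bundle.
 * Hypothetical helper, introduced here only for illustration.
 */
function build_verified_tls_options(string $caBundlePath): array
{
    return [
        CURLOPT_SSL_VERIFYPEER => true,          // verify the server certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,             // verify the certificate matches the host name
        CURLOPT_CAINFO         => $caBundlePath, // PEM file of trusted CA certificates
    ];
}

// Apply the options to a handle; nothing is sent until curl_exec() is called.
$curl = curl_init('https://web3.ncaa.org/hsportal/exec/hsAction');
curl_setopt_array($curl, build_verified_tls_options('/path/to/cacert.pem')); // hypothetical path
```

If curl_exec() then returns false, curl_error($curl) reports whether the failure is related to the certificate at all.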