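I am trying to scrape a website that has several tabs. Clicking a tab sends an AJAX request to their server, which returns the data to display for that tab.
Since I need this data, I inspected the HTTP requests and used "hurl.it" (a website) to manipulate the headers and check the response. I received the correct result there, but when I set up my cURL session with the same headers, the response is not the same and not readable.
Using the Live HTTP Headers add-on, I was able to extract the AJAX URL.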
Headers
Content-Type: application/xml
X-Requested-With: XMLHttpRequest
Referer: http://xxxx.xxx.xx/Organisation/Details/41283
Response via hurl.it
200 OK 646 bytes 547 ms
Headers
Cache-Control: private
Content-Encoding: gzip
Content-Length: 382
Content-Type: application/json; charset=utf-8
Date: Fri, 29 Jan 2016 01:36:42 GMT
Server: Microsoft-IIS/7.5
Set-Cookie: .ASPXANONYMOUS=fsbx3gX1CykkKL2OIvPFH9GcPj97KEPkK-6WVTA24eI87k0F3gjpt0fyVA2P90S8heeaoqjUps9-UFtzgm8mRAiPqnbS50kytk_NY5K4yHPwa-5l0kCqNzPAo0yjBsPmbisbg3N7P7h6Oz5EdRaN8Fkr0y3G6wdIILI8yMQBj1S1X4GULf9rpQ8IvvSo13KB0; expires=Fri, 29-Jan-2016 03:36:42 GMT; path=/; HttpOnly
X-AspNet-Version: 4.0.30319
X-AspNetMvc-Version: 3.0
X-Powered-By: ASP.NET
Body
{"data":[{"Id":"9fe29051-31e6-4bfa-a2f1-194d70c0aab9","NrtId":"930ec525-2199-44a9-bc27-c1b28524c9bf","RtoId":"0e69a479-63e3-4d64-9340-f2e9cc8d84df","TrainingComponentType":2,"Code":"TLI41210","Title":"Certificate IV in Transport and Logistics (Road Transport - Car Driving Instruction)","IsImplicit":false,"ExtentId":"01","Extent":"Deliver and assess","StartDate":new Date(2011,11,7,0,0,0),"EndDate":new Date(2016,11,6,0,0,0),"DeliveryNsw":true,"DeliveryVic":true,"DeliveryQld":true,"DeliverySa":true,"DeliveryWa":true,"DeliveryTas":true,"DeliveryNt":true,"DeliveryAct":true,"ScopeDecisionType":0,"ScopeDecision":"Deliver and assess"}],"total":1}
**Response from cURL - var_dump()**
string(382)“ m j 0 _E蔀蔀| + = B Kz(= q8 ICȻWζiq t { 年;rD@PtǙ.ZZaX;的Nz〜([ !Jor7FH1hE〜AJ#'䭮>MgVrǙȊK SA&放大器;݇LevuSl3;?ᱴd] 4PR] 1 @`X typ8 1 R= t(S 6 [ +- Vr9 # f 4 2# Ew їѯ ѯ r FGZ O \ 。䲰䲰7 f^ W [ ;Z “
Is this a charset problem, or did I set my cURL options wrong?
CURL
$url = 'http://xxxx.xxx.xx/Organisation/AjaxDetailsLoadScope/e11d03e7-37e7-49e8-be54-0bed8eb1c247?_=1454029562507&tabIndex=3';
$header = array(
    'Accept: */*',
    'Accept-Encoding: gzip, deflate',
    'Content-Length: 0',
    'Content-Type: application/xml',
    'X-Requested-With: XMLHttpRequest',
    "Referer: http://xxxx.xxx.xx/Organisation/Details/$this->code"
);
//..
//$header and $url are saved in arrays and then passed to curlMulti()
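function curlMulti($urls, $headers = false) {
    $mh = curl_multi_init();
    $ch = array();      // cURL handles, keyed like $urls
    $results = array(); // response bodies, keyed like $urls
    // Create and configure one handle per URL
    foreach ($urls as $id => $d) {
        $ch[$id] = curl_init();
        // An entry is either a plain URL string or an array with a 'url' key
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        // Custom headers are optional; when given, the request is sent as POST
        if (is_array($headers) && !empty($headers[$id])) {
            curl_setopt($ch[$id], CURLOPT_POST, 1);
            curl_setopt($ch[$id], CURLOPT_HTTPHEADER, $headers[$id]);
        }
        curl_setopt($ch[$id], CURLOPT_URL, $url);
        curl_setopt($ch[$id], CURLOPT_RETURNTRANSFER, TRUE);
        curl_multi_add_handle($mh, $ch[$id]);
    }
    // Run all transfers until none is still active
    $running = NULL;
    do {
        curl_multi_exec($mh, $running);
        if ($running > 0) {
            // Wait for socket activity instead of spinning in a tight loop
            curl_multi_select($mh);
        }
    } while ($running > 0);
    // Collect the bodies and release the handles
    foreach ($ch as $id => $handle) {
        $results[$id] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($mh, $handle);
        curl_close($handle);
    }
    curl_multi_close($mh);
    return $results;
}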
Answer 0 (score: 0)
I was playing around with the headers and now have it working.
I had to remove 'Accept: */*' and 'Accept-Encoding: gzip, deflate' from the headers:
$header = array(
    'Content-Length: 0',
    'Content-Type: application/xml',
    'X-Requested-With: XMLHttpRequest',
    "Referer: http://xxxx.xxx.xx/Organisation/Details/$this->code"
);
Works like a charm:
stdClass Object
(
    [data] => Array
        (
            [0] => stdClass Object
                (
                    [Id] => 9fe29051-31e6-4bfa-a2f1-194d70c0aab9
                    [NrtId] => 930ec525-2199-44a9-bc27-c1b28524c9bf
                    [RtoId] => 0e69a479-63e3-4d64-9340-f2e9cc8d84df
                    [TrainingComponentType] => 2
                    [Code] => TLI41210
                    [Title] => Certificate IV in Transport and Logistics (Road Transport - Car Driving Instruction)
                    [IsImplicit] =>
                    [ExtentId] => 01
                    [Extent] => Deliver and assess
                    [DeliveryNsw] => 1
                    [DeliveryVic] => 1
                    [DeliveryQld] => 1
                    [DeliverySa] => 1
                    [DeliveryWa] => 1
                    [DeliveryTas] => 1
                    [DeliveryNt] => 1
                    [DeliveryAct] => 1
                    [ScopeDecisionType] => 0
                    [ScopeDecision] => Deliver and assess
                )
        )
    [total] => 1
)
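A note on why removing the header helps, plus an alternative that keeps compression. The hurl.it response carries Content-Encoding: gzip and Content-Length: 382, and var_dump() reported string(382), so the unreadable string was most likely the gzip-compressed body rather than a charset problem. When Accept-Encoding is set by hand through CURLOPT_HTTPHEADER, cURL sends it but does not decompress the reply. The following is only a sketch, not the original poster's code: the helper names decodeIfGzipped() and makeHandle() are made up for illustration. It shows two options: let cURL handle compression itself via CURLOPT_ENCODING, or decode an already fetched body with gzdecode().

```php
<?php
// Hypothetical helper (not from the original post): decode a response body
// only when it starts with the gzip magic bytes 0x1f 0x8b.
function decodeIfGzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        // Fall back to the raw body if decoding fails
        return $decoded === false ? $body : $decoded;
    }
    return $body;
}

// Hypothetical helper: configure a handle so that cURL sends its own
// Accept-Encoding header and transparently decodes the response.
// $headers must NOT contain a manual 'Accept-Encoding: ...' line.
function makeHandle(string $url, array $headers)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // An empty string means "all encodings this cURL build supports"
    curl_setopt($ch, CURLOPT_ENCODING, '');
    return $ch;
}

// Offline demonstration of the manual fallback, no network needed
$plain      = '{"total":1}';
$compressed = gzencode($plain);
var_dump(decodeIfGzipped($compressed) === $plain); // bool(true)
var_dump(decodeIfGzipped($plain) === $plain);      // bool(true)
```

In curlMulti() the same effect should be achievable by adding the CURLOPT_ENCODING line next to CURLOPT_RETURNTRANSFER and dropping 'Accept-Encoding' from $header. The gzdecode() route assumes the zlib extension is available.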