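I am new to screen scraping and cURL. I plan to build a site similar to what http://www.skyscanner.com.my/ does: it takes an origin, a destination, and dates from the user and fetches the matching data from http://airasia.com. The site then returns the flight schedules and fares to the user. Here is my code so far: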
<?php
$post_data['Origin']=$_POST['origin'];
$post_data['Destination']=$_POST['destination'];
$post_data['From']=$_POST['departDate'];
$post_data['To']=$_POST['returnDate'];
foreach ($post_data as $key => $value)
{
$post_items[] = $key . '=' . $value;
}
$post_string = implode ('&', $post_items);
$curl_connection = curl_init('https://booking.airasia.com/search.aspx');
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, False);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);
$result = curl_exec($curl_connection);
print_r(curl_getinfo($curl_connection));
echo curl_errno($curl_connection) . '-' .
curl_error($curl_connection);
curl_close($curl_connection);
echo $result;
?>
The code above does not return any results from AirAsia, so I need some guidance to continue with this task. Thanks.
Answer 0 (score: 0)
Your query string $post_string is correct, but you forgot to prepend ? to it before sending it with curl. Try the following:
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, "?".$post_string);
Answer 1 (score: 0)
This works:
$request = array();
$request[] = "Host: mobile.airasia.com";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";
$post = 'hash=61582ddd1b6ab8782ad63f1a6c6c1e46&trip-type=round-trip&origin=PEK&destination=SGN&date-depart-d=25&date-depart-my=2015-04&date-return-d=30&date-return-my=2015-04&passenger-count=1&child-count=0&infant-count=0&currency=MYR&depart-sellkey=&return-sellkey=&depart-details-index=&return-details-index=&depart-faretype=&return-faretype=&action=search&btnSearch=Search';
$url = 'https://mobile.airasia.com/en/search';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING, "");          // accept every encoding curl supports
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);     // keep the request headers for curl_getinfo()
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in $data
$data = curl_exec($ch);
if (curl_errno($ch)){
  $data .= 'Retrieve Base Page Error: ' . curl_error($ch);
}
else {
  $info = rawurldecode(var_export(curl_getinfo($ch),true));
  // Split the response headers (they hold the Set-Cookie lines) from the body:
  $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
  $responseHeader = substr($data,0,$skip);
  $data = substr($data,$skip);
  echo $data;
}
Request headers:
POST /en/search HTTP/1.1
Accept-Encoding: deflate, gzip
Host: mobile.airasia.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 366
Content-Type: application/x-www-form-urlencoded
Response headers:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Cache-control: no-cache="set-cookie"
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Mon, 20 Apr 2015 04:02:14 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
Server: redishot
Set-Cookie: locale=en; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure
Set-Cookie: currency=MYR; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure
Set-Cookie: PHPSESSID=p8mjtiiga4615pnhuu6vl1htiqkqsn7v; path=/; HttpOnly
Set-Cookie: AWSELB=CDFDE3A70C862943856FF6079178A94249700C674BDFF1E117C02BF52443FE13448AB71BEA2EA3F41C01293A39C3579A0A03905034DA565F71B4820BD1807C5558B22ED5E0;PATH=/;MAX-AGE=1800
Vary: Accept-Encoding
transfer-encoding: chunked
Connection: keep-alive
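The Set-Cookie lines in the response headers above can be pulled out of the raw header block with a few lines of string handling. This is only a minimal sketch: `extract_cookies` is a hypothetical helper name, and it assumes the header block is the string obtained by cutting the response at `CURLINFO_HEADER_SIZE`.

```php
<?php
// Hypothetical helper: collect cookie name => value pairs from a raw
// response header block.
function extract_cookies(string $headerBlock): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $headerBlock) as $line) {
        // Header names are case-insensitive, so match "Set-Cookie:" loosely.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the first "name=value" pair; attributes such as
        // path, expires and HttpOnly follow the first semicolon.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2 && $parts[0] !== '') {
            $cookies[$parts[0]] = $parts[1];
        }
    }
    return $cookies;
}

// Sample taken from the response headers shown above.
$headers = "HTTP/1.1 200 OK\r\n"
    . "Set-Cookie: locale=en; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure\r\n"
    . "Set-Cookie: currency=MYR; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure\r\n"
    . "Set-Cookie: PHPSESSID=p8mjtiiga4615pnhuu6vl1htiqkqsn7v; path=/; HttpOnly\r\n";

print_r(extract_cookies($headers));
```

In practice curl's own cookie engine usually makes this unnecessary; parsing by hand is mainly useful for checking what the server actually sent.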
This is the data you want from the returned HTML:
Results","position":1}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div><div class=flight-time>02:15<br>12:10</div><div class=flight-info><div class=box><div class=total-price>MYR 2,091.42</div>
Results","position":2}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div><div class=flight-time>02:15<br>12:55</div><div class=flight-info><div class=box><div class=total-price>MYR 2,106.26</div>
Results","position":3}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7 317</div><div class=flight-time>02:15<br>15:50</div><div class=flight-info><div class=box><div class=total-price>MYR 2,483.82</div>
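Rows like the ones above could be read with DOMDocument and DOMXPath, for example. The sketch below is an assumption-laden illustration: `parse_fares` is a hypothetical helper, the sample markup is a trimmed copy of the first row with its closing tags added, and the class names (`farelist`, `flight-no`, `flight-time`, `total-price`) are taken from the excerpt and may change whenever the site does.

```php
<?php
// Hypothetical helper: pull flight number, times and total price out of
// markup shaped like the excerpt above.
function parse_fares(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fares = [];
    // Match any div whose class list contains the word "farelist".
    $rows = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " farelist ")]');
    foreach ($rows as $row) {
        // The two times are separate text nodes split by a <br>.
        $times = [];
        foreach ($xpath->query('.//div[@class="flight-time"]/text()', $row) as $textNode) {
            $time = trim($textNode->nodeValue);
            if ($time !== '') {
                $times[] = $time;
            }
        }
        $fares[] = [
            'flight' => trim($xpath->evaluate('string(.//div[@class="flight-no"])', $row)),
            'times'  => $times,
            'price'  => trim($xpath->evaluate('string(.//div[@class="total-price"])', $row)),
        ];
    }
    return $fares;
}

$sample = '<div class="smallFont farelist no-discount ">'
    . '<div class=flight-no>D7 317</div>'
    . '<div class=flight-time>02:15<br>12:10</div>'
    . '<div class=flight-info><div class=box>'
    . '<div class=total-price>MYR 2,091.42</div>'
    . '</div></div></div>';

print_r(parse_fares($sample));
```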
End of update
Do not add a "?" to the post data.
This is the format given in the PHP manual:
$post = 'key1=value1&key2=value2&key3=value3';
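You cannot curl http://booking.airasia.com/search.aspx because it requires JavaScript.
You must use the mobile site. When you inspect the HTTP request and response headers in the browser, do so with JavaScript disabled.
Use:
https://mobile.airasia.com/en/search
The problem is that the mobile site is currently not working and says to try again later, so I could not get any further.
About the POST
This is what the search form posts: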
Content-Type: application/x-www-form-urlencoded
Content-Length: 366
hash=26edce4024c5611451a2a95a74e2bf01
&trip-type=round-trip
&origin=KUL
&destination=OOL&date-depart-d=20
&date-depart-my=2015-04&date-return-d=25
&date-return-my=2015-04
&passenger-count=1
&child-count=0&infant-count=0
&currency=MYR
&depart-sellkey=
&return-sellkey=
&depart-details-index=
&return-details-index=
&depart-faretype=
&return-faretype=
&action=search
&btnSearch=Search
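Because their form is application/x-www-form-urlencoded, you are almost building $post_string correctly. You can also pass an array as the post data, but if the value is an array, the Content-Type header will be set to multipart/form-data, which should be OK.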
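One way to get an application/x-www-form-urlencoded body from an array is PHP's `http_build_query()`. The sketch below reuses the field names from the question; the date format is a placeholder, since the source does not say what the form expects.

```php
<?php
// Field names follow the question's form; the values are placeholders.
$post_data = [
    'Origin'      => 'KUL',
    'Destination' => 'OOL',
    'From'        => '20 Apr 2015',
    'To'          => '25 Apr 2015',
];

// http_build_query() percent-encodes every key and value and joins the
// pairs with the given separator, producing a urlencoded form body.
$post_string = http_build_query($post_data, '', '&');
echo $post_string, PHP_EOL;
// prints: Origin=KUL&Destination=OOL&From=20+Apr+2015&To=25+Apr+2015

// Passing the string keeps the request urlencoded; passing $post_data
// itself would make curl send multipart/form-data instead.
// curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
```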
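Because it is application/x-www-form-urlencoded, the keys and values in $post_string must be URL-encoded. Encode each one separately; running urlencode() over the whole string would also encode the & and = separators:
$post_string = http_build_query($post_data);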
To get the cookies you do not need, and will probably never need:
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, False);
Remove it.
You will get redirected and may need a cookie jar:
curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie.txt");
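If the cookies should also persist between separate requests, `CURLOPT_COOKIEJAR` can be pointed at the same file so that curl writes them back out. A minimal sketch, assuming a temporary file is acceptable; `cookie_options` is a hypothetical helper, and the request itself is left commented out so the example does not touch the network.

```php
<?php
// Hypothetical helper: options that make a handle read and write cookies
// through one file, so they survive redirects and later requests.
function cookie_options(string $cookieFile): array
{
    return [
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here; also enables cookie handling
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is released
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

$cookieFile = tempnam(sys_get_temp_dir(), 'cookie');
$ch = curl_init('https://mobile.airasia.com/en/search');
$ok = curl_setopt_array($ch, cookie_options($cookieFile));

// $html = curl_exec($ch);   // the actual request, omitted in this sketch

unset($ch); // releasing the handle lets curl flush cookies to the jar
```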
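You may need to set the request headers to match the browser's request:
Create an array to hold the request header lines.
Fill the array with the values from the request headers you captured.
Example: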
$request = array();
$request[] = "Host: www.example.com";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";
Add it to curl:
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
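Inspecting the headers in the browser
I use the FireFox Inspector or the Chrome Developer Tools.
I go to the Network tab.
In FireFox, I go to Settings and check "Enable persistent logs".
In Chrome, I check "Preserve log" on the Network tab.
Then I use the browser to go wherever I want to curl.
Now I can see every request and response, including redirects, and compare them with the saved headers.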
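The comparison can also be done in code. With `CURLINFO_HEADER_OUT` enabled, `curl_getinfo($ch, CURLINFO_HEADER_OUT)` returns the request headers curl actually sent, and those can be diffed against a header block copied from the browser. The two helpers below are hypothetical names for a rough sketch, and the sample strings are abbreviated.

```php
<?php
// Hypothetical helper: turn a raw header block into a name => value map.
function parse_headers(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n|\r/', trim($raw)) as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip the request line, e.g. "POST /en/search HTTP/1.1"
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so normalise them.
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

// Hypothetical helper: headers the browser sent that curl either did not
// send or sent with a different value.
function header_diff(string $browserRaw, string $curlRaw): array
{
    return array_diff_assoc(parse_headers($browserRaw), parse_headers($curlRaw));
}

$browser = "POST /en/search HTTP/1.1\r\n"
    . "Host: mobile.airasia.com\r\n"
    . "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0\r\n"
    . "Accept-Language: en-US,en;q=0.5\r\n";
$curl = "POST /en/search HTTP/1.1\r\n"
    . "Host: mobile.airasia.com\r\n"
    . "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n";

print_r(header_diff($browser, $curl));
```

Anything this reports is a candidate for the `$request` array passed to `CURLOPT_HTTPHEADER`.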
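Checking, step by step
I use FireFox with the User Agent Switcher set to an old Motorola user agent to retrieve the headers and the HTML. Then I use the same user agent in curl's HTTPHEADER:
$request[] = 'User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0';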
When I try
, the above is unlikely to be what causes an error.