PHP - How to automatically post a form to another website and parse the results

Date: 2015-04-16 07:37:53

Tags: php curl

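I'm new to screen scraping and cURL. I'm planning to build a site similar to http://www.skyscanner.com.my/: users enter an origin, a destination and dates, the site fetches the matching results from http://airasia.com, and it then returns the flight schedules and fares to the user. Here is my code so far: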

Code:

<?php

$post_data['Origin']=$_POST['origin'];
$post_data['Destination']=$_POST['destination'];
$post_data['From']=$_POST['departDate'];
$post_data['To']=$_POST['returnDate'];


foreach ($post_data as $key => $value)
{ 
    $post_items[] = $key . '=' . $value;
}
$post_string = implode ('&', $post_items);


$curl_connection = curl_init('https://booking.airasia.com/search.aspx');

curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, False);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);

curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);

$result = curl_exec($curl_connection);

print_r(curl_getinfo($curl_connection));
echo curl_errno($curl_connection) . '-' . 
                curl_error($curl_connection);
curl_close($curl_connection);
echo $result;
?>

The code above does not return any results from AirAsia, so I need some guidance to continue. Thanks.

2 answers:

Answer 0 (score: 0)

Your query string $post_string is correct, but you are missing a ? in front of it before sending it with curl. Try this:

curl_setopt($curl_connection, CURLOPT_POSTFIELDS, "?".$post_string);

Answer 1 (score: 0)

Update

This works:

$request = array();
$request[] = "Host: mobile.airasia.com";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";

$post = 'hash=61582ddd1b6ab8782ad63f1a6c6c1e46&trip-type=round-trip&origin=PEK&destination=SGN&date-depart-d=25&date-depart-my=2015-04&date-return-d=30&date-return-my=2015-04&passenger-count=1&child-count=0&infant-count=0&currency=MYR&depart-sellkey=&return-sellkey=&depart-details-index=&return-details-index=&depart-faretype=&return-faretype=&action=search&btnSearch=Search';
$url = 'https://mobile.airasia.com/en/search';
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING, "");       // "": accept every encoding curl supports and decode it

curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FAILONERROR, true);   // treat HTTP status >= 400 as an error

curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);   // keep the sent request headers for curl_getinfo()
curl_setopt($ch, CURLOPT_HEADER, true);        // prepend the response headers to the output

$data = curl_exec($ch);
if (curl_errno($ch)) {
    $data .= 'Retrieve Base Page Error: ' . curl_error($ch);
}
else {
    $info = rawurldecode(var_export(curl_getinfo($ch), true));
    // Split the response headers (which carry the Set-Cookie lines) from the HTML body:
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $responseHeader = substr($data, 0, $skip);
    $data = substr($data, $skip);
    echo $data;
}
curl_close($ch);

Request headers:

POST /en/search HTTP/1.1
Accept-Encoding: deflate, gzip
Host: mobile.airasia.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 366
Content-Type: application/x-www-form-urlencoded

Response headers:

HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Cache-control: no-cache="set-cookie"
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Mon, 20 Apr 2015 04:02:14 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
Server: redishot
Set-Cookie: locale=en; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure
Set-Cookie: currency=MYR; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure
Set-Cookie: PHPSESSID=p8mjtiiga4615pnhuu6vl1htiqkqsn7v; path=/; HttpOnly
Set-Cookie: AWSELB=CDFDE3A70C862943856FF6079178A94249700C674BDFF1E117C02BF52443FE13448AB71BEA2EA3F41C01293A39C3579A0A03905034DA565F71B4820BD1807C5558B22ED5E0;PATH=/;MAX-AGE=1800
Vary: Accept-Encoding
transfer-encoding: chunked
Connection: keep-alive
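The cookies the site sets (`locale`, `currency`, `PHPSESSID`, `AWSELB` above) can be read from the raw response header block that curl returns in front of the body when CURLOPT_HEADER is on. A minimal sketch, assuming you already have that header block as a string; the function name `parseSetCookies` is hypothetical, and it keeps only each cookie's name and value, ignoring attributes such as `path` or `expires`.

```php
<?php
// Hypothetical helper: collect name => value pairs from the Set-Cookie
// lines of a raw HTTP response header block (cookie attributes are ignored).
function parseSetCookies(string $rawHeaders): array
{
    $cookies = [];
    // One match per "Set-Cookie: name=value; ..." line, case-insensitive,
    // so redirect responses in the same block are covered as well.
    preg_match_all('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi', $rawHeaders, $m, PREG_SET_ORDER);
    foreach ($m as $match) {
        $cookies[$match[1]] = $match[2];
    }
    return $cookies;
}

// Sample header block copied from the response headers above.
$headers = "HTTP/1.1 200 OK\r\n"
         . "Set-Cookie: locale=en; expires=Mon, 20-Apr-2015 05:02:11 GMT; path=/; secure\r\n"
         . "Set-Cookie: PHPSESSID=p8mjtiiga4615pnhuu6vl1htiqkqsn7v; path=/; HttpOnly\r\n"
         . "Vary: Accept-Encoding\r\n\r\n";
print_r(parseSetCookies($headers));
```

For simply replaying the session, a cookie jar (CURLOPT_COOKIEJAR / CURLOPT_COOKIEFILE) is usually less work; parsing by hand is mainly useful for inspecting what the server sent.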

This is the HTML you get back (excerpt):

The data you need:
  • Flight number
  • Flight times
  • Fare

Results","position":1}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7  317</div><div class=flight-time>02:15<br>12:10</div><div class=flight-info><div class=box><div class=total-price>MYR 2,091.42</div>

Results","position":2}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7  317</div><div class=flight-time>02:15<br>12:55</div><div class=flight-info><div class=box><div class=total-price>MYR 2,106.26</div>

Results","position":3}]}}' data-disabled><div class="smallFont farelist no-discount "><div class=flight-no>D7  317</div><div class=flight-time>02:15<br>15:50</div><div class=flight-info><div class=box><div class=total-price>MYR 2,483.82</div>
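One way to pull the three fields out of that HTML is PHP's DOMDocument with DOMXPath. This is a sketch, not a robust scraper: the class names `farelist`, `flight-no`, `flight-time` and `total-price` are taken from the excerpt above and are an assumption about the live page, which can change at any time; the function name `parseFares` is hypothetical.

```php
<?php
// Hypothetical helper: extract flight number, departure/arrival time and
// total price from AirAsia mobile results HTML (class names from the excerpt).
function parseFares(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid; silence warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($doc);

    $fares = [];
    // One "farelist" div per flight option.
    $rows = $xp->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' farelist ')]");
    foreach ($rows as $row) {
        // item(0) = first match in document order, i.e. the one belonging to this row.
        $no    = $xp->query(".//div[@class='flight-no']", $row)->item(0);
        $time  = $xp->query(".//div[@class='flight-time']", $row)->item(0);
        $price = $xp->query(".//div[@class='total-price']", $row)->item(0);

        // flight-time is "depart<br>arrive": read its text nodes separately.
        $times = [];
        if ($time !== null) {
            foreach ($xp->query('./text()', $time) as $t) {
                $v = trim($t->nodeValue);
                if ($v !== '') {
                    $times[] = $v;
                }
            }
        }

        $fares[] = [
            'flight' => $no ? preg_replace('/\s+/', ' ', trim($no->textContent)) : null,
            'depart' => $times[0] ?? null,
            'arrive' => $times[1] ?? null,
            'price'  => $price ? trim($price->textContent) : null,
        ];
    }
    return $fares;
}

// Sample built from the first result in the excerpt (closing tags added).
$html = '<div class="smallFont farelist no-discount "><div class=flight-no>D7  317</div>'
      . '<div class=flight-time>02:15<br>12:10</div><div class=flight-info><div class=box>'
      . '<div class=total-price>MYR 2,091.42</div></div></div></div>';
print_r(parseFares($html));
```

XPath is used instead of regular expressions because the markup uses unquoted attributes and line breaks inside cells, which an HTML parser tolerates and a regex easily gets wrong.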

End of update


Do not add a "?" to the post data.

This is the format from the PHP manual:

 $post = 'key1=value1&key2=value2&key3=value3';
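A POST body is decoded as-is, so a leading "?" does not act as a separator; it becomes part of the first field name. A small demonstration with PHP's own `parse_str()`, which decodes a string the same way PHP decodes an `application/x-www-form-urlencoded` body. The AirAsia server may not run PHP, but other standard form decoders are likely to keep the "?" in the name as well.

```php
<?php
// Compare how a form decoder sees the body with and without the leading "?".
$good = 'Origin=KUL&Destination=OOL';
$bad  = '?' . $good;

parse_str($good, $goodFields);
parse_str($bad, $badFields);

// Keys of $goodFields: Origin, Destination
// Keys of $badFields:  ?Origin, Destination  (there is no "Origin" field any more)
print_r(array_keys($goodFields));
print_r(array_keys($badFields));
```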

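You cannot curl http://booking.airasia.com/search.aspx because it requires JavaScript.

You have to use the mobile site. When you use your browser to view the HTTP request and response headers, do it with JavaScript disabled in the browser.

Use:

https://mobile.airasia.com/en/search

The problem is that the mobile site is currently not working and says to try again later, so I could not get any further.

About the POST

This is what the search form posts (the body is sent as one line; it is split at each & here for readability):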

Content-Type: application/x-www-form-urlencoded
Content-Length: 366

hash=26edce4024c5611451a2a95a74e2bf01
&trip-type=round-trip
&origin=KUL
&destination=OOL
&date-depart-d=20
&date-depart-my=2015-04
&date-return-d=25
&date-return-my=2015-04
&passenger-count=1
&child-count=0
&infant-count=0
&currency=MYR
&depart-sellkey=
&return-sellkey=
&depart-details-index=
&return-details-index=
&depart-faretype=
&return-faretype=
&action=search
&btnSearch=Search
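A sketch of turning user input into this exact body with `http_build_query()`, which URL-encodes each key and value and joins the pairs with `&` (it skips `null` values but keeps empty strings as `key=`). The field names are copied from the captured POST; the function name `buildSearchBody`, the `Y-m-d` input date format, the two-digit day format, and passing `hash` in from a previously fetched page are all assumptions.

```php
<?php
// Hypothetical helper: build the mobile search POST body from user input.
// Dates are assumed to arrive as "Y-m-d" strings (e.g. from <input type="date">).
function buildSearchBody(string $hash, string $origin, string $destination,
                         string $departDate, string $returnDate): string
{
    $depart = DateTime::createFromFormat('!Y-m-d', $departDate);
    $return = DateTime::createFromFormat('!Y-m-d', $returnDate);
    if ($depart === false || $return === false) {
        throw new InvalidArgumentException('Dates must be in Y-m-d format');
    }

    $fields = [
        'hash'                 => $hash,
        'trip-type'            => 'round-trip',
        'origin'               => strtoupper($origin),
        'destination'          => strtoupper($destination),
        'date-depart-d'        => $depart->format('d'),   // day; zero-padding is an assumption
        'date-depart-my'       => $depart->format('Y-m'),
        'date-return-d'        => $return->format('d'),
        'date-return-my'       => $return->format('Y-m'),
        'passenger-count'      => 1,
        'child-count'          => 0,
        'infant-count'         => 0,
        'currency'             => 'MYR',
        'depart-sellkey'       => '',
        'return-sellkey'       => '',
        'depart-details-index' => '',
        'return-details-index' => '',
        'depart-faretype'      => '',
        'return-faretype'      => '',
        'action'               => 'search',
        'btnSearch'            => 'Search',
    ];
    // Encodes every key and value, then joins the pairs with "&".
    return http_build_query($fields);
}

$body = buildSearchBody('26edce4024c5611451a2a95a74e2bf01', 'KUL', 'OOL', '2015-04-20', '2015-04-25');
echo $body;
```

With these inputs the result matches the captured body exactly, 366 bytes, the same as the captured Content-Length.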

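Because their form is application/x-www-form-urlencoded, you are almost building $post_string correctly. You can also pass an array as the post data, but if the value given to CURLOPT_POSTFIELDS is an array, the Content-Type header will be set to multipart/form-data, which should also be OK.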

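Because it is application/x-www-form-urlencoded, you must urlencode each key and value (not the joined string, which would also encode the & and = separators):

$post_items[] = urlencode($key) . '=' . urlencode($value);   // inside the foreach
$post_string = implode('&', $post_items);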
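To see why the encoding has to be applied to each key and value rather than to the joined string, compare the two results for a small sample (the field names and date format here are made up for illustration):

```php
<?php
// Sample data; "20/04/2015" contains "/" so the encoding is visible.
$post_data = ['Origin' => 'KUL', 'From' => '20/04/2015'];

// Encoding the whole joined string also encodes "=" and "&",
// so a standard form decoder sees one field whose name is the whole string.
$post_items = [];
foreach ($post_data as $key => $value) {
    $post_items[] = $key . '=' . $value;
}
$wholeEncoded = urlencode(implode('&', $post_items));
echo $wholeEncoded, "\n";    // Origin%3DKUL%26From%3D20%2F04%2F2015

// Encoding each key and value keeps the separators intact.
$perItem = [];
foreach ($post_data as $key => $value) {
    $perItem[] = urlencode($key) . '=' . urlencode($value);
}
$perValueEncoded = implode('&', $perItem);
echo $perValueEncoded, "\n"; // Origin=KUL&From=20%2F04%2F2015

// http_build_query() performs the same per-item encoding in one call.
var_dump($perValueEncoded === http_build_query($post_data)); // bool(true)
```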

You do not need, and probably never will need, this line:

curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, False);

Remove it.

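You will get redirects and may need a cookie jar:

curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookie.txt");   // save the cookies the server sets
curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie.txt");  // send them back on later requests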
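The two captured requests use different `hash` values, which suggests (an inference, not something the site documents) that the value is generated per page load or per session. Under that assumption, a sketch of a two-step flow: first GET the search page with a cookie jar, then read the hidden field from its HTML before posting. It assumes the value sits in an `<input name="hash">` element; both function names are hypothetical.

```php
<?php
// Hypothetical helper: read the value of a named <input> (e.g. the hidden
// "hash" field) from a page's HTML. Assumes the field is an <input> element.
function extractInputValue(string $html, string $name): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate invalid real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === $name) {
            return $input->getAttribute('value');
        }
    }
    return null;
}

// Hypothetical helper: GET a page while storing its cookies in $cookieFile,
// so a following POST that uses the same file sends them back.
function fetchWithCookies(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);   // written when the handle is closed/freed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);  // cookies read from here are sent
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? '' : $html;
}

// Usage (not run here: it needs network access and the live site):
// $cookieFile = sys_get_temp_dir() . '/airasia-cookies.txt';
// $page = fetchWithCookies('https://mobile.airasia.com/en/search', $cookieFile);
// $hash = extractInputValue($page, 'hash');

$sample = '<form><input type="hidden" name="hash" value="26edce4024c5611451a2a95a74e2bf01"></form>';
var_dump(extractInputValue($sample, 'hash'));
```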

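You may need to set the request headers to match the browser's request:

  • Create an array to hold the request header lines.
  • Fill it with the values from the request headers you posted.

Example: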

$request = array();
$request[] = "Host: www.example.com";
$request[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$request[] = "User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0";
$request[] = "Accept-Language: en-US,en;q=0.5";
$request[] = "Connection: keep-alive";
$request[] = "Cache-Control: no-cache";
$request[] = "Pragma: no-cache";

Add it to curl:

curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
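Filling the array by hand is error-prone, so here is a sketch that converts a header block copied from the browser's Network tab into the list of `Name: value` lines CURLOPT_HTTPHEADER expects. The function name is hypothetical. As a design choice it drops the request line and two headers that curl should compute itself: Content-Length (derived from the POST body; a stale copied value would not match) and Accept-Encoding (set CURLOPT_ENCODING to "" instead so curl also decompresses the response).

```php
<?php
// Hypothetical helper: turn a request-header block copied from the browser's
// Network tab into the list of "Name: value" lines CURLOPT_HTTPHEADER expects.
function headersFromBrowserDump(string $dump): array
{
    // Headers curl should compute itself: Content-Length from the POST body,
    // Accept-Encoding from CURLOPT_ENCODING.
    $skip = ['content-length', 'accept-encoding'];
    $request = [];
    foreach (preg_split('/\r\n|\r|\n/', $dump) as $line) {
        $line = trim($line);
        // Skip blank lines and the request line, e.g. "POST /en/search HTTP/1.1".
        if ($line === '' || preg_match('#^[A-Z]+ \S+ HTTP/#', $line)) {
            continue;
        }
        $colon = strpos($line, ':');
        if ($colon === false) {
            continue;   // not a header line
        }
        $name = strtolower(trim(substr($line, 0, $colon)));
        if (!in_array($name, $skip, true)) {
            $request[] = $line;
        }
    }
    return $request;
}

$dump = <<<'EOT'
POST /en/search HTTP/1.1
Accept-Encoding: deflate, gzip
Host: mobile.airasia.com
Accept-Language: en-US,en;q=0.5
Content-Length: 366
Content-Type: application/x-www-form-urlencoded
EOT;

$request = headersFromBrowserDump($dump);
print_r($request);
// then: curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
```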

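Checking the headers in the browser

I use the Firefox Inspector or Chrome DevTools.

I go to the Network tab.

In Firefox, I open the settings and check "Enable persistent logs".
In Chrome, I check "Preserve log" on the Network tab.

Then I use the browser to go wherever I want to curl.

Now I can see every request and response, including redirects, and compare them with the saved headers.

Inspecting: step by step

  • Right-click and select Inspect Element
  • Select the Network tab
  • Refresh the page
  • Select Document (Chrome) or HTML (Firefox)
  • Clear the list
  • Submit your form
  • Select the form's request in the request list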

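I used Firefox with the User Agent Switcher add-on, set to an old Motorola user agent, to retrieve the headers and HTML. Then I used the same user agent in curl's HTTPHEADER:

$request[] = 'User-Agent: MOT-V9mm/00.62 UP.Browser/6.2.3.4.c.1.123 (GUI) MMP/2.0';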

The above may have been the (unlikely) cause of the error I got when I tried.