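Fetching HTML in PHP after logging in as a user with curl

Time: 2017-06-13 09:06:45

Tags: php curl

I want to fetch the content of a web page, but users see different information depending on whether they are logged in. I want to send headers with curl to simulate a user logging in. I checked the network tab, and these are the response headers: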

Cache-Control:no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection:close
Content-Type:text/html
Date:Tue, 13 Jun 2017 08:08:52 GMT
Expires:Thu, 19 Nov 1981 08:52:00 GMT
Location:http://dims-92.com/ClientNewsPage
Pragma:no-cache
Server:Apache/2.2.3 (CentOS)
Transfer-Encoding:chunked
X-Powered-By:PHP/5.5.30

And this is the request payload:

------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="SubmitControlId"

Auto_CAuthenticate_LogIn_LogIn_Standart
------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="ParameterInfo"

undefined
------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="FC_CEShop_SearchControl_SearchInput"


------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="FC_CAuthenticate_LogIn_UsernameInput"

user
------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="FC_CAuthenticate_LogIn_PasswordInput"

password
------WebKitFormBoundaryaSWkHLJeD9EymCJb--

I tried this:

$url = "http://dims-92.com/ClientDisplayProductFolder?param=4553686f703a434e493d3935343b434e494c3d3b5649443d3b543d42473b";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryaSWkHLJeD9EymCJb',
    'Content-Length: 671',
    '------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="SubmitControlId"

Auto_CAuthenticate_LogIn_LogIn_Standart
------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="ParameterInfo"

undefined
------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="FC_CEShop_SearchControl_SearchInput"


------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="FC_CAuthenticate_LogIn_UsernameInput"

user
------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="FC_CAuthenticate_LogIn_PasswordInput"

password
------WebKitFormBoundaryaSWkHLJeD9EymCJb--'
));
$content = curl_exec($ch);
echo $content;

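But the page I get back just says: Bad Request

Your browser sent a request that this server could not understand. Request header field is missing ':' separator. ------WebKitFormBoundaryaSWkHLJeD9EymCJb

2 answers:

Answer 0: (score: 1)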

You can't post headers like that. They have to be an array with one header per element, like this:

curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: multipart/form-data',
    'Content-Length: 671',
    'Content-Disposition: form-data',
    ....
));

The problem is that you posted the whole payload as one header, which is not valid.

Answer 1: (score: 1)

Your code confuses the HTTP headers of the request with the HTTP body.

Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Length: 671

These are part of the HTTP request headers, and they do indeed go in CURLOPT_HTTPHEADER.

Content-Disposition: form-data; name="SubmitControlId"

Auto_CAuthenticate_LogIn_LogIn_Standart
------WebKitFormBoundaryaSWkHLJeD9EymCJb
Content-Disposition: form-data; name="ParameterInfo"

undefined

This is part of the HTTP request body, and the body does not go in CURLOPT_HTTPHEADER.
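To make the split concrete, here is a minimal sketch that keeps the two apart: the Content-Type line goes in CURLOPT_HTTPHEADER and the multipart text goes in CURLOPT_POSTFIELDS. The helper names `build_multipart_body` and `send_multipart` are hypothetical, and the target URL is whatever address the login form really posts to, which the question does not state.

```php
<?php
// Hypothetical helper: hand-builds a multipart/form-data body.
// Every line ends in CRLF, and the last boundary carries a trailing "--".
function build_multipart_body(array $fields, $boundary)
{
    $body = '';
    foreach ($fields as $name => $value) {
        $body .= '--' . $boundary . "\r\n"
               . 'Content-Disposition: form-data; name="' . $name . '"' . "\r\n"
               . "\r\n"
               . $value . "\r\n";
    }
    return $body . '--' . $boundary . "--\r\n";
}

// Hypothetical helper: sends a prebuilt body. $url is an assumption.
function send_multipart($url, $body, $boundary)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        // Header lines only: one "Name: value" string per element.
        CURLOPT_HTTPHEADER     => array(
            'Content-Type: multipart/form-data; boundary=' . $boundary,
        ),
        // The body is passed as a string; curl computes Content-Length itself.
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_RETURNTRANSFER => true,
    ));
    return curl_exec($ch);
}

$boundary = '----WebKitFormBoundaryaSWkHLJeD9EymCJb';
$body = build_multipart_body(array(
    'SubmitControlId'                      => 'Auto_CAuthenticate_LogIn_LogIn_Standart',
    'ParameterInfo'                        => 'undefined',
    'FC_CEShop_SearchControl_SearchInput'  => '',
    'FC_CAuthenticate_LogIn_UsernameInput' => 'user',
    'FC_CAuthenticate_LogIn_PasswordInput' => 'password',
), $boundary);
```

Note that the boundary lines in the body have two more leading dashes than the boundary value declared in the Content-Type header.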

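Now, contrary to what Julien Lachal says in https://stackoverflow.com/a/44517070/1067003, you actually can encode the entire request body yourself (using CURLOPT_POST and CURLOPT_INFILE), but when using the multipart/form-data or application/x-www-form-urlencoded encodings, it is easier, safer, and less error-prone to let curl encode it for you. (A common reason to encode it yourself is when POSTing to a JSON API that requires content-type: application/json, because curl has no support for automatically encoding to JSON.)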
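As an illustration of the JSON case, a minimal sketch: the body is encoded with json_encode, passed to CURLOPT_POSTFIELDS as a string, and the content type is set by hand. The payload keys and the `post_json` helper are hypothetical and not part of the site in the question.

```php
<?php
// Hypothetical payload; a real JSON API defines its own field names.
$json = json_encode(array('username' => 'user', 'password' => 'password'));

// Hypothetical helper: POSTs an already-encoded JSON string to $url.
function post_json($url, $json)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        // A string body is sent as-is; curl does not re-encode it.
        CURLOPT_POSTFIELDS     => $json,
        // Replaces the default application/x-www-form-urlencoded content type.
        CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
        CURLOPT_RETURNTRANSFER => true,
    ));
    return curl_exec($ch);
}
```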

To have curl do the encoding for you, just use CURLOPT_POST and CURLOPT_POSTFIELDS, like this:

curl_setopt_array ( $ch, array (
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => array (
                'SubmitControlId' => 'Auto_CAuthenticate_LogIn_LogIn_Standart',
                'ParameterInfo' => 'undefined',
                'FC_CEShop_SearchControl_SearchInput' => '',
                'FC_CAuthenticate_LogIn_UsernameInput' => 'user',
                'FC_CAuthenticate_LogIn_PasswordInput' => 'password' 
        ) 
) );

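Now libcurl will automatically encode the fields as multipart/form-data, set the correct content-type header, and set the correct content-length header. The actual HTTP request will look like this:

HTTP request headers: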

POST / HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*
Content-Length: 686
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------b6890d3827808ee1

HTTP request body:

--------------------------b6890d3827808ee1
Content-Disposition: form-data; name="SubmitControlId"

Auto_CAuthenticate_LogIn_LogIn_Standart
--------------------------b6890d3827808ee1
Content-Disposition: form-data; name="ParameterInfo"

undefined
--------------------------b6890d3827808ee1
Content-Disposition: form-data; name="FC_CEShop_SearchControl_SearchInput"


--------------------------b6890d3827808ee1
Content-Disposition: form-data; name="FC_CAuthenticate_LogIn_UsernameInput"

user
--------------------------b6890d3827808ee1
Content-Disposition: form-data; name="FC_CAuthenticate_LogIn_PasswordInput"

password
--------------------------b6890d3827808ee1--

But note that many websites don't support multipart/form-data and/or prefer the application/x-www-form-urlencoded encoding. To use that, run the data for CURLOPT_POSTFIELDS through http_build_query, like this:

curl_setopt_array ( $ch, array (
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query ( array (
                'SubmitControlId' => 'Auto_CAuthenticate_LogIn_LogIn_Standart',
                'ParameterInfo' => 'undefined',
                'FC_CEShop_SearchControl_SearchInput' => '',
                'FC_CAuthenticate_LogIn_UsernameInput' => 'user',
                'FC_CAuthenticate_LogIn_PasswordInput' => 'password' 
        ) ),
        CURLOPT_URL => 'http://127.0.0.1:8080' 
) );

Now the actual HTTP request looks like this:

HTTP request headers:

POST / HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*
Content-Length: 204
Content-Type: application/x-www-form-urlencoded

HTTP request body:

SubmitControlId=Auto_CAuthenticate_LogIn_LogIn_Standart&ParameterInfo=undefined&FC_CEShop_SearchControl_SearchInput=&FC_CAuthenticate_LogIn_UsernameInput=user&FC_CAuthenticate_LogIn_PasswordInput=password
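Putting this together for the original goal (fetching a page as a logged-in user), here is a rough sketch. It assumes the site tracks the login through a session cookie, which the question does not confirm, and the login URL is a guess, since the question never says where the form posts. The function names are hypothetical. The fields are sent urlencoded; passing the array directly instead of http_build_query would send multipart/form-data, as the browser did.

```php
<?php
// Field names are taken from the request payload shown in the question.
function login_fields($user, $password)
{
    return array(
        'SubmitControlId'                      => 'Auto_CAuthenticate_LogIn_LogIn_Standart',
        'ParameterInfo'                        => 'undefined',
        'FC_CEShop_SearchControl_SearchInput'  => '',
        'FC_CAuthenticate_LogIn_UsernameInput' => $user,
        'FC_CAuthenticate_LogIn_PasswordInput' => $password,
    );
}

// Hypothetical helper. $loginUrl is an assumption: the question does not
// state which URL the login form posts to.
function fetch_after_login($loginUrl, $pageUrl, array $fields)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // Empty string: enable cookie handling, kept in memory on this handle.
        CURLOPT_COOKIEFILE     => '',
        // The login response in the question carries a Location header.
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ));
    if (curl_exec($ch) === false) {
        return false; // the login request itself failed
    }

    // Reuse the same handle so the cookies from the login are sent again,
    // and switch back from POST to GET for the page request.
    curl_setopt_array($ch, array(
        CURLOPT_HTTPGET => true,
        CURLOPT_URL     => $pageUrl,
    ));
    return curl_exec($ch);
}
```

A call might look like `fetch_after_login($loginUrl, $url, login_fields('user', 'password'))`, where `$url` is the product page address from the question and `$loginUrl` has to be read from the browser's network tab.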