php curl + post + multipart/form-data + ASP.NET session

Asked: 2017-09-05 17:35:14

Tags: php curl post web-scraping multipartform-data

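I am trying to scrape this website, https://www.machinemart.co.uk/, and I need to add products to the cart to get certain data. The site uses a POST request to add a product to the cart.

URL of the product I want to add: https://www.machinemart.co.uk/p/clarke-amf-panel-for-kc6-and-kc10/

Here is an example of the request headers and body: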

Request Headers:
Host: www.machinemart.co.uk
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.machinemart.co.uk/p/clarke-amf-panel-for-kc6-and-kc10/
Content-Type: multipart/form-data; boundary=---------------------------209892343219764726031397980914
Content-Length: 1224
Cookie: ASP.NET_SessionId=f3p1y0ib1za3vch4exbue4l3; 
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
-------------------------------------
Request Body
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="__RequestVerificationToken"

2MpklOgIiis94EbGlIIoF5N_bzOrFegTpV_YEHTlZysKZrGxeAwBaFg5S4xtnGi8Jth5CEGRn9ETlK_g55jb6k9DcHGO-RR-LXug2roZEQg1
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="ProductId"

4aaad9a5-ad65-4842-a24a-5f455b263933
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="ProductSku"

010629550
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="Quantity"

1
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="SubmitButton"

Home delivery
-----------------------------209892343219764726031397980914
Content-Disposition: form-data; name="ufprt"

DA81438F81A7BE767B068EED46F4A4CAC24A05FC23BEAFE8B1A4B536FA6EC79AA5C17510979DB132CAC8C33C62E03A07E766C55C45DAE114A63B816F7CADEE9AB165197FBCF088E0FEBAAD9E6D8145291AB9984B8764A82C56C33D9D20394A22D1E148BF3EF97DC02EC48E5C4C491B3368B66B0A750BA6815B049A13F590BC8D6A6F05D3B96F81E0308742BD37D92E81
-----------------------------209892343219764726031397980914--

I am able to get the data required for each product (id, sku, token, ufprt), and I also try to obtain the session ID. Here is the code:

           $data = '---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"__RequestVerificationToken\"\r\n\r\n'.$request_dataToken.'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"ProductId\"\r\n\r\n'.$request_dataProductId.'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"ProductSku\"\r\n\r\n'.$request_dataProductSku.'\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"Quantity\"\r\n\r\n1\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"SubmitButton\"\r\n\r\nHome delivery\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"ufprt\"\r\n\r\n'.$request_dataUFPRT.'\r\n---------------------------17064761399835087311752471201';

// Get the session id
$session_id = '';
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $request_url);
curl_setopt($curl_handle, CURLOPT_POST, TRUE);
curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl_handle, CURLOPT_HEADER, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curl_handle);

// get cookies
$cookies = array();
preg_match_all('/Set-Cookie:(\s{0,}.*)$/im', $response, $cookies);

curl_close($curl_handle);

foreach ($cookies[1] as $cookie) {
    if (preg_match('/Id\=([a-z0-9]+)\;/', $cookie, $out)) {
        $session_id = $out[1];
    }
}

// Try to add the product with the session id
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_COOKIE, 'ASP.NET_SessionId='.$session_id);
curl_setopt($curl_handle, CURLOPT_URL, $request_url);
curl_setopt($curl_handle, CURLOPT_POST, TRUE);
curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
// curl_setopt($curl_handle, CURLOPT_HEADER, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curl_handle);
curl_close($curl_handle);

// Get the cart page -- currently comes back empty
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_COOKIE, 'ASP.NET_SessionId='.$session_id);
curl_setopt($curl_handle, CURLOPT_URL, $request_urlCart);
//curl_setopt($curl_handle, CURLOPT_POST, TRUE);
//curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
//curl_setopt($curl_handle, CURLOPT_HEADER, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curl_handle);
curl_close($curl_handle);

print $response;

Here is an example of the output of the $data variable:

---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"__RequestVerificationToken\"\r\n\r\nx9RTuiKcC0IJR0OwNFicu6XxPXoOt5dtgaXVEdQxpwRGGv52fdv7IJ9zcjkz1HnYaLX5yaz2e0pXCZFLi_judYwQLT-vCg2_xUMzsaT5Rc1\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"ProductId\"\r\n\r\n4aaad9a5-ad65-4842-a24a-5f455b263933\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"ProductSku\"\r\n\r\n010629550\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"Quantity\"\r\n\r\n1\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"SubmitButton\"\r\n\r\nHome delivery\r\n---------------------------17064761399835087311752471201\r\nContent-Disposition: form-data; name=\"ufprt\"\r\n\r\nF5CA6BAC0C5C12E3B885CE69FE5E0D24480EA23E895AD3DA72BFDF6832B56CD8A70F1183BFA03F61AD353FA86DCDD71CA105A86A0274A27152E68A66449191BD8167B6E06A2982B326BBC1E47C7C9AB3984A7BB17ECB9E153496542F7DE8B00D97FEFAE8B6120A6C3B87CAA74E875E68BE894586468FD0704B11346A6E1BC902BC538D64CA23DD87068DCA52CC5AC19F\r\n---------------------------17064761399835087311752471201

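I have tried adding headers to the request and sending the parameters as an array, but it still does not work. What am I doing wrong?

1 answer:

Answer 0 (score: 0)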

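Here is one solution, although I have to say the code needs some extra work to be more robust (I have marked the error-prone parts with comments in the code).

1. Fetch any product page URL (not a product listing page). A solution could also be built from the product listing, but this example works with a product page.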

// One cookie file shared by every request; the same path must be reused in step 3.
// Error-prone: $_COOKIE['PHPSESSID'] only exists when a PHP session cookie was sent.
$cookieFile = $_SERVER['DOCUMENT_ROOT'].'/extra path here/'.$_COOKIE['PHPSESSID'].'.txt';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.machinemart.co.uk/p/clarke-ctj2qlp-2-tonne-quick-lift-low-profile/"); // type your url here as explained above
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // error-prone: disables certificate verification
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // received cookies are saved to this file
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // stored cookies are sent from this file
$data = curl_exec($ch);
curl_close($ch);

The snippet above fetches the URL and stores the cookies the site sets.
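
The cookie file name in step 1 depends on `$_COOKIE['PHPSESSID']`, which is not set when the script runs from the command line or without a PHP session. As a rough sketch of one way around that, the helper below builds a single path that both requests can share. `cookieJarPath` and the `no-session` fallback name are my own assumptions, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original answer): builds one cookie-jar
// path that the GET in step 1 and the POST in step 3 can both use.
function cookieJarPath(string $dir, ?string $sessionId): string
{
    // Keep only characters that are safe in a file name, and fall back to a
    // fixed name when no PHP session cookie is available (for example on the CLI).
    $safeId = preg_replace('/[^A-Za-z0-9_-]/', '', (string) $sessionId);
    if ($safeId === '') {
        $safeId = 'no-session';
    }
    return rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $safeId . '.txt';
}

// Usage: compute the path once and pass it to both CURLOPT_COOKIEJAR and
// CURLOPT_COOKIEFILE on every handle.
$cookieJar = cookieJarPath(sys_get_temp_dir(), $_COOKIE['PHPSESSID'] ?? null);
```

Using the system temp directory here is only a convenience for the sketch; any writable directory works as long as every request points at the same file.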

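2. Next, we use the PHP DOM library to pull the data out of the form; every field the add-to-cart function needs is inside the form element. In my example I assume the second form element is the product form, but you must double-check which one is correct.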

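libxml_use_internal_errors(true); // suppress warnings caused by invalid HTML
$siteData = new DOMDocument();
$siteData->loadHTML($data);

$forms = $siteData->getElementsByTagName("form");
// Error-prone: assumes the second form (index 1) is the product form; verify this for your page.
$productForm = $forms->item(1);
if ($productForm === null) {
    die("Product form not found");
}

$search = array();
foreach ($productForm->getElementsByTagName("input") as $input) {
    // Skip the inputs that carry the "greyBtn" class
    if ($input->getAttribute("class") != "greyBtn") {
        $search[$input->getAttribute("name")] = $input->getAttribute("value");
    }
}

// Error-prone: assumes the form action is a site-relative path.
$submitURL = "https://www.machinemart.co.uk".$productForm->getAttribute("action");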

The code collects every input name and value into an array called $search, and builds another variable, $submitURL, which is the URL the curl call in step 3 posts to.
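
Picking the form by position breaks as soon as the page layout changes. A possibly sturdier variant, sketched below, looks for the first form that contains an input with a known name such as `ProductId` (a field name taken from the request in the question). The function name `extractFormFields`, the sample HTML and the `/basket/add` action are made up for illustration. Unlike step 2, this sketch does not skip the `greyBtn` inputs.

```php
<?php
// Hypothetical helper: returns the action and the name => value pairs of the
// first form that contains an input named $markerField, or null if none does.
function extractFormFields(string $html, string $markerField): ?array
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    if ($html === '' || !$doc->loadHTML($html)) {
        return null;
    }
    foreach ($doc->getElementsByTagName('form') as $form) {
        $fields = [];
        foreach ($form->getElementsByTagName('input') as $input) {
            $name = $input->getAttribute('name');
            if ($name !== '') {
                $fields[$name] = $input->getAttribute('value');
            }
        }
        if (array_key_exists($markerField, $fields)) {
            return ['action' => $form->getAttribute('action'), 'fields' => $fields];
        }
    }
    return null;
}

// Made-up page with a search form followed by a product form.
$html = '<html><body>
<form action="/search"><input name="q" value=""></form>
<form action="/basket/add">
<input type="hidden" name="ProductId" value="4aaad9a5-ad65-4842-a24a-5f455b263933">
<input type="hidden" name="Quantity" value="1">
</form></body></html>';

$form = extractFormFields($html, 'ProductId');
```

With a result like this, `$form['fields']` would take the place of `$search` and `$form['action']` would be appended to the site's base URL.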

3. We call curl again, using $submitURL as the target URL and the $search array as the POST parameters.

// Must be the same cookie file that step 1 wrote, otherwise the ASP.NET session is lost.
$cookieFile = $_SERVER['DOCUMENT_ROOT'].'/extra path here/'.$_COOKIE['PHPSESSID'].'.txt';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $submitURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($search)); // URL-encoded string instead of a multipart body
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // error-prone: disables certificate verification
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);

$data = curl_exec($ch);
curl_close($ch);

echo $data;

The $data variable holds the returned page; if you echo it (ugly, because the CSS is missing) you will see the product in the cart.
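
As a side note on the hand-built body in the question: PHP does not expand `\r\n` inside single-quoted strings, which is why the printed $data contains literal backslashes. Each multipart delimiter line is also the boundary prefixed with two dashes, and the last one needs two more dashes at the end. If you do want to send multipart/form-data rather than the URL-encoded form above, a minimal sketch could look like this. `buildMultipartBody` and the boundary value are my own names, and it only handles plain text fields.

```php
<?php
// Hypothetical helper: builds a multipart/form-data body for simple text fields.
function buildMultipartBody(array $fields, string $boundary): string
{
    $body = '';
    foreach ($fields as $name => $value) {
        // Double quotes are needed so that \r\n becomes a real CRLF.
        $body .= "--{$boundary}\r\n";
        $body .= "Content-Disposition: form-data; name=\"{$name}\"\r\n\r\n";
        $body .= $value . "\r\n";
    }
    // The closing delimiter carries two extra trailing dashes.
    return $body . "--{$boundary}--\r\n";
}

// Arbitrary boundary for the sketch; it must not occur inside any field value.
$boundary = '----sketch' . bin2hex(random_bytes(8));
$body = buildMultipartBody(['ProductSku' => '010629550', 'Quantity' => '1'], $boundary);

// The same boundary has to be announced in the request header.
$headers = ['Content-Type: multipart/form-data; boundary=' . $boundary];
// Pass $body to CURLOPT_POSTFIELDS and $headers to CURLOPT_HTTPHEADER.
```

A simpler route is to pass a PHP array directly to CURLOPT_POSTFIELDS, in which case cURL encodes the request as multipart/form-data and sets the boundary header itself.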