Guru.com刮板

时间:2012-08-21 21:45:01

标签: php curl web-crawler

我正在尝试取消Guru.com (http://www.guru.com/pro/search.aspx?#&&page=1&budget=1000-30000&cid=800&sort=PostedDateDescending)的所有工作。如果您研究此链接,您会注意到它正在执行ajax请求以获取作业列表并且只是在加载时填充虚拟html。它是一个ASP.NET服务器,我试图模仿这个ajax请求来获取作业列表。这是我的代码

$file_contents = parent::fetchPage($url);
        $string = tidy_repair_string($file_contents);
        $xmldoc = new DOMDocument();
        $xmldoc->loadHTML($string);
        $xpathvar = new Domxpath($xmldoc);

        $titleResult = $xpathvar->query('//input[@name="__VIEWSTATE"]');
        $postvars = 'ctl00%24scriptMgr=ctl00%24scriptMgr%7Cctl00%24scriptMgr&__EVENTTARGET=ctl00%24scriptMgr&__EVENTARGUMENT=page%3D1%26budget%3D1000-30000%26cid%3D800%26sort%3DPostedDateDescending&ctl00_scriptMgr_HiddenField=&__LASTFOCUS=&ctl00%24procnt%24hdnSearchId=&ctl00%24procnt%24ddlBudgetMin=0&ctl00%24procnt%24ddlBudgetMax=0&ctl00%24procnt%24Location=rbworldwide&ctl00%24procnt%24ddlCountries=Select%20One&ctl00%24procnt%24ddlCities=Select%20One&ctl00%24procnt%24txtKeyWords=Keywords%20or%20Project%20ID&ctl00%24procnt%24hdnKeyWords=&ctl00%24procnt%24ucLeadResults%24hdnSrc=&ctl00%24procnt%24ucLeadResults%24hdnPageNo=&ctl00%24procnt%24ucLeadResults%24hdnSearchResults=&ctl00%24procnt%24ucLeadResults%24hdnProjectID=&ctl00%24procnt%24txtSearchName=&ctl00%24procnt%24txtSearchName_reqd=true&ctl00%24procnt%24hdnSelectedTab=0&ctl00%24procnt%24hdnSelectSvdSearch=&ctl00%24hdnGuid=GUID&__ASYNCPOST=true&';
        $postvars .= '__VIEWSTATE='.urlencode($titleResult->item(0)->attributes->getNamedItem('value')->textContent);
        parse_str($postvars, $query);

        print_r($query);

        $ch = curl_init("http://www.guru.com/pro/search.aspx");
        curl_setopt($ch, CURLOPT_POST      ,1);
        curl_setopt($ch, CURLOPT_POSTFIELDS    ,$query);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION  ,1); 
        curl_setopt($ch, CURLOPT_HEADER      ,1);  // DO NOT RETURN HTTP HEADERS 
        curl_setopt($ch, CURLOPT_RETURNTRANSFER  ,1);  // RETURN THE CONTENTS OF THE CALL
        curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
        curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: www.guru.com',
            'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1',
            'Accept: text/plain, text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-us,en;q=0.5',
            'Accept-Encoding: gzip, deflate',
            'Connection: keep-alive',
            'X-MicrosoftAjax: Delta=true',
            'Cache-Control: no-cache, no-cache',
            'Content-Type: application/x-www-form-urlencoded; charset=utf-8',
            'Referer: http://www.guru.com/pro/search.aspx?',
            'Content-Length: 15818',
            'Pragma: no-cache'
        ));
        echo $Rec_Data = curl_exec($ch);
        $Rec_Data = gzinflate(substr($Rec_Data, 10, -8));
        echo $Rec_Data;

我正在发送相同的标头,因为我在Live HTTP Headers firefox插件上看到它们,这是为ajax请求发送的。我收到以下回复标题。

HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Set-Cookie: ASP.NET_SessionId=eip5l545vkyk3aqhq5kliijj; path=/; HttpOnly
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Tue, 21 Aug 2012 21:28:57 GMT
Content-Length: 44051

而我在guru.com完成ajax时收到的Live HTTP Headers插件中看到的响应头是

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/plain; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Set-Cookie: ASP.NET_SessionId=bnlurvq3crot1r55juykpt3y; path=/; HttpOnly
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Tue, 21 Aug 2012 19:25:19 GMT
Content-Length: 43290

Guru.com正在接收工作列表,而我收到的虚拟html不包含任何工作。任何人都可以告诉我我的代码有什么问题。以下是Guru.com发送到其服务器的请求标头,用于ajax调用。

POST /pro/search.aspx HTTP/1.1
Host: www.guru.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
X-MicrosoftAjax: Delta=true
Cache-Control: no-cache, no-cache
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Referer: http://www.guru.com/pro/search.aspx?
Content-Length: 15818
Pragma: no-cache
ctl00%24scriptMgr=ctl00%24scriptMgr%7Cctl00%24scriptMgr&__EVENTTARGET=ctl00%24scriptMgr&__EVENTARGUMENT=page%3D1%26budget%3D1000-30000%26cid%3D800%26sort%3DPostedDateDescending&ctl00_scriptMgr_HiddenField=&__LASTFOCUS=&__VIEWSTATE=%2FwEPDwULLTE0NDg0NjI5MzIPZBYCZg9kFgQCBQ8PFgIeB1Zpc2libGVoZGQCBw9kFgICBw9kFhICBA9kFgJmD2QWAgIBD2QWAgIBDzwrAAkAZAIHDxYCHg5Qb3N0QmFja1NjcmlwdAUsX19kb1Bvc3RCYWNrKCdjdGwwMCRwcm9jbnQkbGlua0NuZnJtRGVsJywnJylkAgsPZBYCZg9kFgICAw88KwAJAGQCDA9kFgJmD2QWBAIBDxBkZBYBZmQCAw8QZGQWAWZkAg0PZBYCZg9kFgICAg8WBB4FY2xhc3MFOXBhZGRpbmdCb3R0b202IG1hcmdpblRvcDUgYm9yZGVyR3JleUNDQyBiYWNrTHRHcmF5IHR4dDk5OR4Fc3R5bGUFDmNvbG9yOiNjY2NjY2M7FgYCAQ8WAh8CBRJwYWRkaW5nVG9wNSB0eHQ2NjZkAgMPFgIfAgUScGFkZGluZ1RvcDUgdHh0NjY2ZAIFDxYCHwIFEnBhZGRpbmdUb3A1IHR4dDY2NmQCDg9kFgJmD2QWCmYPEA9kFgIeB29uY2xpY2sFH2phdmFzY3JpcHQ6Q2hlY2tMaXN0Q291bnQodGhpcylkZGQCAQ8QD2QWAh8EBR9qYXZhc2NyaXB0OkNoZWNrTGlzdENvdW50KHRoaXMpZGRkAgIPEA8WBh4NRGF0YVRleHRGaWVsZAUETmFtZR4ORGF0YVZhbHVlRmllbGQFAklEHgtfIURhdGFCb3VuZGdkEBXIAQpTZWxlY3QgT25lC0FmZ2hhbmlzdGFuB0FsYmFuaWEHQWxnZXJpYQdBbmRvcnJhBkFuZ29sYQhBbmd1aWxsYRNBbnRpZ3VhIGFuZCBCYXJidWRhCEFudGlsbGVzCUFyZ2VudGluYQdBcm1lbmlhBUFydWJhCUF1c3RyYWxpYQdBdXN0cmlhCkF6ZXJiYWlqYW4HQmFoYW1hcwdCYWhyYWluCkJhbmdsYWRlc2gIQmFyYmFkb3MHQmVsZ2l1bQZCZWxpemUFQmVuaW4HQmVybXVkYQZCaHV0YW4HQm9saXZpYRdCb3NuaWEgQW5kIEhlcnplZ292aW5hIAhCb3Rzd2FuYQ1Cb3V2ZXQgSXNsYW5kBkJyYXppbBZCcml0aXNoIFZpcmdpbiBJc2xhbmRzBkJydW5laQhCdWxnYXJpYQxCdXJraW5hIEZhc28HQnVydW5kaQhDYW1ib2RpYQhDYW1lcm9vbgZDYW5hZGEKQ2FwZSBWZXJkZQ5DYXltYW4gSXNsYW5kcxhDZW50cmFsIEFmcmljYW4gUmVwdWJsaWMEQ2hhZAVDaGlsZQVDaGluYQhDb2xvbWJpYQdDb21vcm9zDENvb2sgSXNsYW5kcwpDb3N0YSBSaWNhB0Nyb2F0aWEGQ3lwcnVzDkN6ZWNoIFJlcHVibGljB0Rlbm1hcmsIRGppYm91dGkIRG9taW5pY2ESRG9taW5pY2FuIFJlcHVibGljCkVhc3QgVGltb3IHRWN1YWRvcgVFZ3lwdAtFbCBTYWx2YWRvchFFcXVhdG9yaWFsIEd1aW5lYQdFcml0cmVhB0VzdG9uaWEIRXRoaW9waWEQRmFsa2xhbmQgSXNsYW5kcw1GYXJvZSBJc2xhbmRzBEZpamkHRmlubGFuZAZGcmFuY2UQRnJlbmNoIFBvbHluZXNpYQVHYWJvbgZHYW1iaWEHR2VvcmdpYQdHZXJtYW55BUdoYW5hCUdpYnJhbHRhcgZHcmVlY2UJR3JlZW5sYW5kB0dyZW5hZGEJR3VhdGVtYWxhCEd1ZXJuc2V5Bkd1aW5lYQ1HdWluZWEtQmlzc2F1Bkd1eWFuYQVIYWl0aQhIb2x5IFNlZQhIb25kdXJhcwlIb25nIEtvbmcHSHVuZ2FyeQdJY2VsYW5kBUluZGlhCUluZG9uZXNpYQdJcmVsYW5kBklzcmFlbAVJdGFseQdKYW1haWNhBUphcGFuBkpvcmRhbglLYXpha3N0YW4FS2VueWEIS2lyaWJhdGkGS29zb3ZvBkt1d2FpdApLeXJneXpzdGFuBExhb3MGTGF0dmlhB0xlc290aG8NTGllY2h0ZW5zdGVpbglMaXRodWFuaWEKTHV4ZW1ib3VyZwVNYWNhdQlNYWNlZG9uaWEKTWFkYWdhc2NhcgZNYWxhd2kITWFsYXlzaWEITWFsZGl2ZXMETWFsaQVNYWx0YQpNYXVyaXRhbmlhCU1hdXJpdGl1cwZNZXhpY28GTW9uYWNvCE1vbmdvbGlhCk1vbnRlbmVncm8KTW9udHNlcnJhdAdNb3JvY2NvCk1vemFtYmlxdWUHTXlhbm1hcgdOYW1pYmlhBU5hdXJ1BU5lcGFsC05ldGhlcmxhbmRzC05ldyBaZWFsYW5kCU5pY2FyYWd1YQVOaWdlcgdOaWdlcmlhBk5vcndheQRPbWFuCFBha2lzdGFuBlBhbmFtYRBQYXB1YSBOZXcgR3VpbmVhCFBhcmFndWF5BFBlcnULUGhpbGlwcGluZXMGUG9sYW5kCFBvcnR1Z2FsBVFhdGFyE1JlcHVibGljIG9mIE1vbGRvdmEHUm9tYW5pYRJSdXNzaWFuIEZlZGVyYXRpb24GUndhbmRhBVNhbW9hClNhbiBNYXJpbm8IU2FvIFRvbWUMU2F1ZGkgQXJhYmlhB1NlbmVnYWwGU2VyYmlhClNleWNoZWxsZXMMU2llcnJhIExlb25lCVNpbmdhcG9yZQhTbG92YWtpYQhTbG92ZW5pYQ9Tb2xvbW9uIElzbGFuZHMHU29tYWxpYQxTb3V0aCBBZnJpY2ENU291dGggR2VvcmdpYQtTb3V0aCBLb3JlYQVTcGFpbglTcmkgTGFua2EKU3QuIEhlbGVuYQlTdC4gS2l0dHMJU3QuIEx1Y2lhC1N0LiBWaW5jZW50CFN1cmluYW1lCVN3YXppbGFuZAZTd2VkZW4LU3dpdHplcmxhbmQGVGFpd2FuClRhamlraXN0YW4IVGFuemFuaWEIVGhhaWxhbmQEVG9nbwVUb25nYRNUcmluaWRhZCBhbmQgVG9iYWdvB1R1bmlzaWEGVHVya2V5DFR1cmttZW5pc3RhbhhUdXJrcyBhbmQgQ2FpY29zIElzbGFuZHMGVHV2YWx1BlVnYW5kYQdVa3JhaW5lFFVuaXRlZCBBcmFiIEVtaXJhdGVzDlVuaXRlZCBLaW5nZG9tDVVuaXRlZCBTdGF0ZXMHVXJ1Z3VheQpVemJla2lzdGFuB1ZhbnVhdHUJVmVuZXp1ZWxhB1ZpZXRuYW0OV2VzdGVybiBTYWhhcmEFWWVtZW4GWmFtYmlhFcgBClNlbGVjdCBPbmUBMgEzATQBNgE3ATgBOQIxMAIxMQIxMgIxMwIxNAIxNQIxNgIxNwIxOAIxOQIyMAIyMgIyMwIyNAIyNQIyNgIyNwMyMTgCMjkCMzACMzECMzICMzMCMzQCMzUCMzYCMzcCMzgCMzkCNDACNDECNDICNDMCNDQCNDUCNDYCNDcCNDkCNTACNTECNTMCNTQCNTYCNTcCNTgCNTkCNjACNjECNjICNjMCNjQCNjUCNjYCNjcCNjgCNjkCNzACNzECNzIDMjI0AjczAjc0Ajc1Ajc2Ajc3Ajc4Ajc5AjgwAjgxAjgyAzIyMwI4MwI4NAI4NQI4NgI4NwI4OAI4OQI5MAI5MQI5MgI5MwI5NgI5NwI5OAI5OQMxMDADMTAxAzEwMgMxMDMDMTA0AzIyMQMxMDUDMTA2AzEwNwMxMDgDMTEwAzExMwMxMTQDMTE1AzExNgMxMTcDMTE4AzExOQMxMjADMTIxAzEyMgMxMjMDMTI1AzEyNgMxMjcDMTI5AzEzMAMyMjIDMTMxAzEzMgMxMzMDMTM0AzEzNwMxMzgDMTM5AzE0MAMxNDEDMTQyAzE0MwMxNDQDMTQ1AzE0NgMxNDcDMTQ5AzE1MAMxNTEDMTUyAzE1MwMxNTQDMTU1AzE1NgMxNTcDMTU4AzE1OQMxNjADMTY0AzE2NQMxNjYDMTY3AzE2OAMyMjADMTY5AzE3MAMxNzEDMTcyAzE3MwMxNzQDMTc1AzE3NgMxNzcDMTYxAzE3OAMxNzkDMTYyAzE4MAMxNjMDMTgxAzE4MwMxODQDMTg1AzE4NgMxODgDMTg5AzE5MAMxOTEDMTkyAzE5MwMxOTQDMTk1AzE5NgMxOTcDMTk4AzE5OQMyMDIDMjAzAzIwMAMyMDEBMQMyMDQDMjA1AzIwNgMyMDcDMjA4AzIwOQMyMTADMjEzFCsDyAFnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZxYBZmQCAw8QD2QWAh8EBR9qYXZhc2NyaXB0OkNoZWNrTGlzdENvdW50KHRoaXMpZGRkAgQPEA8WBh8FBQROYW1lHwYFAklEHwdnZBAV1wIKU2VsZWN0IE9uZQtBYmlsZW5lLCBUWAlBa3JvbiwgT0gPQWxiYW55IEFyZWEsIE5ZCkFsYmFueSwgR0EPQWxidXF1ZXJxdWUsIE5NDkFsZXhhbmRyaWEsIExBF0FsbGVudG93bi1CZXRobGVoZW0sIFBBC0FsdG9vbmEsIFBBDEFtYXJpbGxvLCBUWBJBbWVyaWNhbiBTYW1vYSwgQVMVQW5haGVpbS1TYW50YSBBbmEsIENBDUFuY2hvcmFnZSwgQUsMQW5kZXJzb24sIElODEFuZGVyc29uLCBTQw1Bbm4gQXJib3IsIE1JDEFubmlzdG9uLCBBTAxBcHBsZXRvbiwgV0kNQXNoZXZpbGxlLCBOQwpBdGhlbnMsIEdBC0F0bGFudGEsIEdBEUF0bGFudGljIENpdHksIE5KC0F1Z3VzdGEsIEdBEEF1cm9yYS1FbGdpbiwgSUwKQXVzdGluLCBUWA9CYWtlcnNmaWVsZCwgQ0ENQmFsdGltb3JlLCBNRApCYW5nb3IsIE1FD0JhdG9uIFJvdWdlLCBMQRBCYXR0bGUgQ3JlZWssIE1JGEJlYXVtb250LVBvcnQgQXJ0aHVyLCBUWBFCZWF2ZXIgQ291bnR5LCBQQQ5CZWxsaW5naGFtLCBXQRFCZW50b24gSGFyYm9yLCBNSRJCZXJnZW4tUGFzc2FpYywgTkoMQmlsbGluZ3MsIE1UE0JpbG94aS1HdWxmcG9ydCwgTVMOQmluZ2hhbXRvbiwgTlkOQmlybWluZ2hhbSwgQUwMQmlzbWFyY2ssIE5ED0Jsb29taW5ndG9uLCBJThZCbG9vbWluZ3Rvbi1Ob3JtYWwsIElMCUJvaXNlLCBJRApCb3N0b24sIE1BFEJvdWxkZXItTG9uZ21vbnQsIENPDUJyYWRlbnRvbiwgRkwMQnJhem9yaWEsIFRYDUJyZW1lcnRvbiwgV0EWQnJpZGdlcG9ydC1NaWxmb3JkLCBDVAtCcmlzdG9sLCBDVAxCcm9ja3RvbiwgTUEZQnJvd25zdmlsbGUtSGFybGluZ2VuLCBUWBlCcnlhbi1Db2xsZWdlIFN0YXRpb24sIFRYC0J1ZmZhbG8sIE5ZDkJ1cmxpbmd0b24sIE5DDkJ1cmxpbmd0b24sIFZUCkNhbnRvbiwgT0gKQ2FzcGVyLCBXWRBDZWRhciBSYXBpZHMsIElBDkNoYXJsZXN0b24sIFNDDkNoYXJsZXN0b24sIFdWDUNoYXJsb3R0ZSwgTkMTQ2hhcmxvdHRlc3ZpbGxlLCBWQQ9DaGF0dGFub29nYSwgVE4MQ2hleWVubmUsIFdZC0NoaWNhZ28sIElMCUNoaWNvLCBDQQ5DaW5jaW5uYXRpLCBPSA9DbGFya3N2aWxsZSwgVE4NQ2xldmVsYW5kLCBPSBRDb2xvcmFkbyBTcHJpbmdzLCBDTwxDb2x1bWJpYSwgTU8MQ29sdW1iaWEsIFNDDENvbHVtYnVzLCBHQQxDb2x1bWJ1cywgT0gSQ29ycHVzIENocmlzdGksIFRYDkN1bWJlcmxhbmQsIE1ECkRhbGxhcywgVFgLRGFuYnVyeSwgQ1QMRGFudmlsbGUsIFZBDURhdmVucG9ydCwgSUEWRGF5dG9uLVNwcmluZ2ZpZWxkLCBPSBFEYXl0b25hIEJlYWNoLCBGTAtEZWNhdHVyLCBBTAtEZWNhdHVyLCBJTApEZW52ZXIsIENPDkRlcyBNb2luZXMsIElBC0RldHJvaXQsIE1JCkRvdGhhbiwgQUwLRHVidXF1ZSwgSUEKRHVsdXRoLCBNTg5FYXUgQ2xhaXJlLCBXSQtFbCBQYXNvLCBUWBJFbGtoYXJ0LUdvc2hlbiwgSU4KRWxtaXJhLCBOWQhFbmlkLCBPSwhFcmllLCBQQRZFdWdlbmUtU3ByaW5nZmllbGQsIE9SDkV2YW5zdmlsbGUsIElODkZhbGwgUml2ZXIsIE1BEkZhcmdvLU1vb3JoZWFkLCBORBBGYXlldHRldmlsbGUsIE5DIkZlZGVyYXRlZCBTdGF0ZXMgb2YgTWljcm9uZXNpYSwgRk0YRml0Y2hidXJnLUxlb21pbnN0ZXIsIE1BDUZsYWdzdGFmZiwgQVoJRmxpbnQsIE1JDEZsb3JlbmNlLCBBTAxGbG9yZW5jZSwgU0MZRm9ydCBDb2xsaW5zLUxvdmVsYW5kLCBDTxlGb3J0IE15ZXJzLUNhcGUgQ29yYWwsIEZMD0ZvcnQgUGllcmNlLCBGTA5Gb3J0IFNtaXRoLCBBUhVGb3J0IFdhbHRvbiBCZWFjaCwgRkwORm9ydCBXYXluZSwgSU4YRm9ydCBXb3J0aC1Bcmxpbmd0b24sIFRYCkZyZXNubywgQ0EXRnQuIExhdWRlcmRhbGUgQXJlYSwgRkwLR2Fkc2RlbiwgQUwPR2FpbmVzdmlsbGUsIEZMGEdhbHZlc3Rvbi1UZXhhcyBDaXR5LCBUWBBHYXJ5LUhhbW1vbmQsIElOD0dsZW5zIEZhbGxzLCBOWQ9HcmFuZCBGb3JrcywgTkQQR3JhbmQgUmFwaWRzLCBNSQ9HcmVhdCBGYWxscywgTVQLR3JlZWxleSwgQ08NR3JlZW4gQmF5LCBXSRNHcmVlbnNib3JvIEFyZWEsIE5DGkdyZWVudmlsbGUtU3BhcnRhbmJ1cmcsIFNDCEd1YW0sIEdVDkhhZ2Vyc3Rvd24sIE1EF0hhbWlsdG9uLU1pZGRsZXRvd24sIE9IE0hhcnJpc2J1cmcgQXJlYSwgUEEMSGFydGZvcmQsIENUC0hpY2tvcnksIE5DDEhvbm9sdWx1LCBISRNIb3VtYS1UaGlib2RhdXgsIExBC0hvdXN0b24sIFRYFkh1bnRpbmd0b24tQXNobGFuZCwgV1YOSHVudHN2aWxsZSwgQUwQSW5kaWFuYXBvbGlzLCBJTg1Jb3dhIENpdHksIElBC0phY2tzb24sIE1JC0phY2tzb24sIE1TC0phY2tzb24sIFROEEphY2tzb252aWxsZSwgRkwQSmFja3NvbnZpbGxlLCBOQxVKYW5lc3ZpbGxlLUJlbG9pdCwgV0kPSmVyc2V5IENpdHksIE5KEEpvaG5zb24gQ2l0eSwgVE4NSm9obnN0b3duLCBQQQpKb2xpZXQsIElMCkpvcGxpbiwgTU8NS2FsYW1hem9vLCBNSQxLYW5rYWtlZSwgSUwPS2Fuc2FzIENpdHksIE1PC0tlbm9zaGEsIFdJEktpbGxlZW4tVGVtcGxlLCBUWA1Lbm94dmlsbGUsIFROCktva29tbywgSU4NTGEgQ3Jvc3NlLCBXSQ1MYWZheWV0dGUsIExBGUxhZmF5ZXR0ZS1XLkxhZmF5ZXR0ZSwgSU4QTGFrZSBDaGFybGVzLCBMQQ9MYWtlIENvdW50eSwgSUwYTGFrZWxhbmQtV2ludGVySGF2ZW4sIEZMDUxhbmNhc3RlciwgUEEVTGFuc2luZy1FLkxhbnNpbmcsIE1JCkxhcmVkbywgVFgOTGFzIENydWNlcywgTk0NTGFzIFZlZ2FzLCBOVgxMYXdyZW5jZSwgS1MWTGF3cmVuY2UtSGF2ZXJoaWxsLCBNQQpMYXd0b24sIE9LE0xld2lzdG9uLUF1YnVybiwgTUUVTGV4aW5ndG9uLUZheWV0dGUsIEtZCExpbWEsIE9IC0xpbmNvbG4sIE5FD0xpdHRsZSBSb2NrLCBBUhVMb25ndmlldy1NYXJzaGFsbCwgVFgQTG9yYWluLUVscmlhLCBPSBpMb3MgQW5nZWxlcy1Mb25nIEJlYWNoLCBDQQ5Mb3Vpc3ZpbGxlLCBLWQpMb3dlbGwsIE1BC0x1YmJvY2ssIFRYDUx5bmNoYnVyZywgVkEYTWFjb24tV2FybmVyIFJvYmJpbnMsIEdBC01hZGlzb24sIFdJDk1hbmNoZXN0ZXIsIE5IDU1hbnNmaWVsZCwgT0gUTWFyc2hhbGwgSXNsYW5kcywgTUgLTWNBbGxlbiwgVFgLTWVkZm9yZCwgT1ILTWVtcGhpcywgVE4KTWVyY2VkLCBDQRFNaWFtaS1IaWFsZWFoLCBGTA5NaWRkbGV0b3duLCBDVAtNaWRsYW5kLCBUWA1NaWx3YXVrZWUsIFdJGE1pbm5lYXBvbGlzLVN0LiBQYXVsLCBNTgpNb2JpbGUsIEFMC01vZGVzdG8sIENBEk1vbm1vdXRoLU9jZWFuLCBOSgpNb25yb2UsIExBDk1vbnRnb21lcnksIEFMCk11bmNpZSwgSU4MTXVza2Vnb24sIE1JCk5hcGxlcywgRkwKTmFzaHVhLCBOSA1OYXNodmlsbGUsIFROEk5hc3NhdS1TdWZmb2xrLCBOWQ9OZXcgQmVkZm9yZCwgTUEPTmV3IEJyaXRhaW4sIENUFU5ldyBIYXZlbi1NZXJpZGVuLCBDVBZOZXcgTG9uZG9uLU5vcndpY2gsIENUD05ldyBPcmxlYW5zLCBMQQxOZXcgWW9yaywgTlkKTmV3YXJrLCBOShBOaWFncmEgRmFsbHMsIE5ZEE5vcmZvbGsgQXJlYSwgVkEcTm9ydGhlcm4gTWFyaWFuYSBJc2xhbmRzLCBNUAtOb3J3YWxrLCBDVAtPYWtsYW5kLCBDQQlPY2FsYSwgRkwKT2Rlc3NhLCBUWBFPa2xhaG9tYSBDaXR5LCBPSwtPbHltcGlhLCBXQQlPbWFoYSwgTkURT3JhbmdlIENvdW50eSwgTlkLT3JsYW5kbywgRkwNT3dlbnNib3JvLCBLWRJPeG5hcmQtVmVudHVyYSwgQ0EJUGFsYXUsIFBXEVBhbG0gQmF5IEFyZWEsIEZMD1BhbmFtYSBDaXR5LCBGTBhQYXJrZXJzYnVyZy1NYXJpZXR0YSwgV1YOUGFzY2Fnb3VsYSwgTVMNUGF3dHVja2V0LCBSSQ1QZW5zYWNvbGEsIEZMClBlb3JpYSwgSUwQUGhpbGFkZWxwaGlhLCBQQQtQaG9lbml4LCBBWg5QaW5lIEJsdWZmLCBBUg5QaXR0c2J1cmdoLCBQQQ5QaXR0c2ZpZWxkLCBNQQxQb3J0bGFuZCwgTUUMUG9ydGxhbmQsIE9SDlBvcnRzbW91dGgsIE5IEFBvdWdoa2VlcHNpZSwgTlkOUHJvdmlkZW5jZSwgUkkOUHJvdm8tT3JlbSwgVVQKUHVlYmxvLCBDTwpSYWNpbmUsIFdJElJhbGVpZ2gtRHVyaGFtLCBOQw5SYXBpZCBDaXR5LCBTRAtSZWFkaW5nLCBQQQtSZWRkaW5nLCBDQQhSZW5vLCBOVgxSaWNobGFuZCwgV0EXUmljaG1vbmQtUGV0ZXJzYnVyZywgVkELUm9hbm9rZSwgVkENUm9jaGVzdGVyLCBNTg1Sb2NoZXN0ZXIsIE5ZDFJvY2tmb3JkLCBJTA5TYWNyYW1lbnRvLCBDQRBTYWdpbmF3IEFyZWEsIE1JCVNhbGVtLCBPUhRTYWxlbS1HbG91Y2VzdGVyLCBNQRRTYWxpbmFzLU1vbnRlcmV5LCBDQRhTYWx0IExha2UgQ2l0eS1PZ2RlbiwgVVQOU2FuIEFuZ2VsbywgVFgPU2FuIEFudG9uaW8sIFRYElNhbiBCZXJuYXJkaW5vLCBDQQ1TYW4gRGllZ28sIENBEVNhbiBGcmFuY2lzY28sIENBDFNhbiBKb3NlLCBDQQxTYW4gSnVhbiwgUFIRU2FudGEgQmFyYmFyYSwgQ0EOU2FudGEgQ3J1eiwgQ0EMU2FudGEgRmUsIE5NF1NhbnRhIFJvc2EtUGV0YWx1bWEsIENBDFNhcmFzb3RhLCBGTAxTYXZhbm5haCwgR0EZU2NyYW50b24tV2lsa2VzIEJhcnJlLCBQQQtTZWF0dGxlLCBXQQpTaGFyb24sIFBBDVNoZWJveWdhbiwgV0kTU2hlcm1hbi1EZW5pc29uLCBUWA5TaHJldmVwb3J0LCBMQQ5TaW91eCBDaXR5LCBJQQ9TaW91eCBGYWxscywgU0QMU29tZXJzZXQsIE5KGFNvdXRoIEJlbmQtTWlzaGF3YWthLCBJTgtTcG9rYW5lLCBXQQ9TcHJpbmdmaWVsZCwgSUwPU3ByaW5nZmllbGQsIE1BD1NwcmluZ2ZpZWxkLCBNTw9TcHJpbmdmaWVsZCwgT1INU3QuIENsb3VkLCBNTg5TdC4gSm9zZXBoLCBNTw1TdC4gTG91aXMsIE1PDFN0YW1mb3JkLCBDVBFTdGF0ZSBDb2xsZWdlLCBQQRhTdGV1YmVudmlsbGUtV2VpcnRvbiwgT0gMU3RvY2t0b24sIENBDFN5cmFjdXNlLCBOWQpUYWNvbWEsIFdBD1RhbGxhaGFzc2VlLCBGTA5UYW1wYSBBcmVhLCBGTA9UZXJyYSBIYXV0ZSwgSU4NVGV4YXJrYW5hLCBUWApUb2xlZG8sIE9IClRvcGVrYSwgS1MLVHJlbnRvbiwgTkoKVHVjc29uLCBBWglUdWxzYSwgT0sOVHVzY2Fsb29zYSwgQUwJVHlsZXIsIFRYFFVyYmFuYSBDaGFtcGFpZ24sIElMDlV0aWNhLVJvbWUsIE5ZGlZhbGxlam8tRmFpcmZpZWxkLU5hcGEsIENBDVZhbmNvdXZlciwgV0EMVmljdG9yaWEsIFRYDFZpbmVsYW5kLCBOShJWaXJnaW4gSXNsYW5kcywgVkkLVmlzYWxpYSwgQ0EIV2FjbywgVFgOV2FzaGluZ3RvbiwgREMNV2F0ZXJidXJ5LCBDVBhXYXRlcmxvby1DZWRhciBGYWxscywgSUEKV2F1c2F1LCBXSRhXZXN0IFBhbG0gQmVhY2ggQXJlYSwgRkwMV2hlZWxpbmcsIFdWEVdpY2hpdGEgRmFsbHMsIFRYC1dpY2hpdGEsIEtTEFdpbGxpYW1zcG9ydCwgUEEOV2lsbWluZ3RvbiwgREUOV2lsbWluZ3RvbiwgTkMNV29yY2VzdGVyLCBNQQpZYWtpbWEsIFdBCFlvcmssIFBBFVlvdW5nc3Rvd24tV2FycmVuLCBPSA1ZdWJhIENpdHksIENBFdcCClNlbGVjdCBPbmUBMQEyATQBMwE1ATYBNwE4ATkDMzM1AjEwAjExAjEyAjEzAjE0AjE1AjE2AjE3AjE4AjE5AjIwAjIxAjIyAjIzAjI0AjI1AjI2AjI3AjI4AjI5AjMwAjMxAjMyAjMzAjM0AjM1AjM2AjM3AjM4AjM5AjQwAjQxAjQyAjQzAjQ0AjQ1AjQ2AjQ3AjQ4AjQ5AjUwAjUxAjUyAjUzAjU0AjU1AjU2AjU3AjU5AjYwAjYxAjYyAjYzAjY0AjY1AjY2AjY3AjY4AjY5AjcwAjcxAjcyAjczAjc0Ajc1Ajc2Ajc3Ajc4Ajc5AjgwAjgxAjgyAjgzAjg0Ajg1Ajg2Ajg3Ajg4Ajg5AjkwAjkxAjkyAjkzAjk0Ajk1Ajk2Ajk3Ajk4Ajk5AzEwMAMxMDEDMzM2AzEwMwMzNDIDMTA0AzEwNQMxMDYDMTA3AzEwOQMxMTADMTExAzExMgMxMTMDMTE0AzExNQMxMDgDMTE2AzExNwMxMTgDMTE5AzEyMAMxMjEDMTIyAzEyMwMxMjQDMTI1AzEyNgMxMjcDMzM3AzEyOAMxMjkDMTMwAzEzMQMxMzIDMTMzAzEzNAMxMzUDMTM2AzEzNwMxMzgDMTM5AzE0MAMxNDEDMTQyAzE0MwMxNDQDMTQ1AzE0NgMxNDcDMTQ4AzE0OQMxNTADMTUxAzE1MgMxNTMDMTU0AzE1NQMxNTYDMTU3AzE1OAMxNTkDMTYwAzE2MQMxNjIDMTYzAzE2NAMxNjUDMTY2AzE2NwMxNjgDMTY5AzE3MAMxNzEDMTcyAzE3MwMxNzQDMTc1AzE3NgMxNzcDMTc4AzE3OQMxODADMTgxAzE4MgMxODMDMTg0AzE4NQMxODYDMTg3AzMzOAMxODgDMTg5AzE5MQMxOTIDMTkzAzE5NQMxOTYDMTk3AzE5OAMxOTkDMjAwAzIwMQMyMDIDMjAzAzIwNAMyMDUDMjA2AzIwNwMyMDgDMjA5AzIxMAMyMTEDMjEyAzIxMwMyMTQDMjE1AzIxNgMyMTcDMjE4AzMzOQMyMTkDMjIwAzIyMQMyMjIDMjIzAzIyNAMyMjUDMjI2AzIyNwMyMjgDMjI5AzM0MAMxOTADMjMwAzIzMQMyMzIDMjMzAzIzNAMyMzUDMjM2AzIzNwMyMzgDMjM5AzI0MAMyNDEDMjQyAzI0MwMyNDQDMjQ1AzI0NgMyNDcDMjQ4AzI0OQMyNTADMjUxAzI1MgMyNTMDMjU0AzI1NQMyNTcDMjU4AzI1OQMyNjADMjYxAzI2MgMyNjYDMjY3AzI2OAMyNjkDMjcwAzI3MQMyNTYDMjcyAzI3MwMyNzQDMzM0AzI3NQMyNzYDMjc3AzI3OAMyNzkDMjgwAzI4MQMyODIDMjgzAzI4NAMyODUDMjg2AzI4NwMyODgDMTk0AzI4OQMyOTADMjkxAzI5MwMyOTIDMTAyAzI2MwMyNjQDMjY1AzI5NAMyOTUDMjk2AzI5NwMyOTgDMjk5AzMwMAMzMDEDMzAyAzMwMwMzMDQDMzA1AzMwNgMzMDcDMzA4AzMwOQMzMTACNTgDMzExAzMxMgMzMTMDMzE0AzMxNQMzNDEDMzE2AzMxNwMzMTgDMzE5AzMyMAMzMjEDMzIyAzMyMwMzMjUDMzI0AzMyNgMzMjcDMzI4AzMyOQMzMzADMzMxAzMzMgMzMzMUKwPXAmdnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2cWAWZkAhAPZBYCZg9kFgICAQ9kFgQCAg9kFgQCAQ8QZGQWAGQCAw8PFgIeCkJ1dHRvbk5hbWUFBkRlbGV0ZWRkAgYPZBYEAgEPEGRkFgBkAgMPDxYCHwgFBkRlbGV0ZWRkAhMPFgIeCk9uT2tTY3JpcHQFQVZhbGlkYXRlU2F2ZVBvcFVwKCk7X19kb1Bvc3RCYWNrKCdjdGwwMCRwcm9jbnQkYnRuU2F2ZVNlYXJjaCcsJycpZAIUD2QWBAIBDw8WAh4MSGFzVmFsaWRhdG9yZxYCHgZvbkJsdXIFkwFqYXZhc2NyaXB0OmlmKGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdjdGwwMF9wcm9jbnRfbGJsRXJyb1NhdmUnKSE9bnVsbCl7IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdjdGwwMF9wcm9jbnRfbGJsRXJyb1NhdmUnKS5zdHlsZS5kaXNwbGF5PSdub25lJ31kAgcPDxYCHg1PbkNsaWVudENsaWNrBZMBamF2YXNjcmlwdDppZihkb2N1bWVudC5nZXRFbGVtZW50QnlJZCgnY3RsMDBfcHJvY250X2xibEVycm9TYXZlJykhPW51bGwpeyBkb2N1bWVudC5nZXRFbGVtZW50QnlJZCgnY3RsMDBfcHJvY250X2xibEVycm9TYXZlJykuc3R5bGUuZGlzcGxheT0nbm9uZSd9ZGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgcFFmN0bDAwJHByb2NudCRjaGtPblNpdGUFGmN0bDAwJHByb2NudCRjaGtWZW5kb3JPbmx5BRhjdGwwMCRwcm9jbnQkcmJ3b3JsZHdpZGUFFmN0bDAwJHByb2NudCRyYkNvdW50cnkFFmN0bDAwJHByb2NudCRyYkNvdW50cnkFE2N0bDAwJHByb2NudCRyYkNpdHkFE2N0bDAwJHByb2NudCRyYkNpdHn8aeALBWdIWf7e1179CNtoWvy2eg%3D%3D&ctl00%24procnt%24hdnSearchId=&ctl00%24procnt%24ddlBudgetMin=0&ctl00%24procnt%24ddlBudgetMax=0&ctl00%24procnt%24Location=rbworldwide&ctl00%24procnt%24ddlCountries=Select%20One&ctl00%24procnt%24ddlCities=Select%20One&ctl00%24procnt%24txtKeyWords=Keywords%20or%20Project%20ID&ctl00%24procnt%24hdnKeyWords=&ctl00%24procnt%24ucLeadResults%24hdnSrc=&ctl00%24procnt%24ucLeadResults%24hdnPageNo=&ctl00%24procnt%24ucLeadResults%24hdnSearchResults=&ctl00%24procnt%24ucLeadResults%24hdnProjectID=&ctl00%24procnt%24txtSearchName=&ctl00%24procnt%24txtSearchName_reqd=true&ctl00%24procnt%24hdnSelectedTab=0&ctl00%24procnt%24hdnSelectSvdSearch=&ctl00%24hdnGuid=GUID&__ASYNCPOST=true&

这是我收到的回复的屏幕截图。

enter image description here

2 个答案:

答案 0 :(得分:1)

<强>更新

他们显然使用JavaScript作为呈现页面所需的功能,或者防止僵尸程序。您提供的HTTP客户端也必须能够解释JavaScript。

原始回复

您正在为每个请求发送不同的ASP.Net会话cookie。

据推测,Guru.com在返回数据之前会检查有效的,经过身份验证的会话。

您必须首先以编程方式登录网站并获取有效的会话ID,然后在后续的Ajax请求中使用该ID。

答案 1 :(得分:-2)

有问题的网站正在跟踪POST变量的位置。我只是根据firebug(或Live Http Headers)重新安排它,然后砰然一声。这里有数据。我的最终代码是(查看$ postvars变量)

$file_contents = parent::fetchPage($url);
        $string = tidy_repair_string($file_contents);
        $xmldoc = new DOMDocument();
        $xmldoc->loadHTML($string);
        $xpathvar = new Domxpath($xmldoc);

        $titleResult = $xpathvar->query('//input[@name="__VIEWSTATE"]');
        $postvars = 'ctl00%24scriptMgr=ctl00%24scriptMgr%7Cctl00%24scriptMgr&__EVENTTARGET=ctl00%24scriptMgr&__EVENTARGUMENT=page%3D1%26budget%3D1000-30000%26cid%3D800%26sort%3DPostedDateDescending&ctl00_scriptMgr_HiddenField=&__LASTFOCUS=&'.'__VIEWSTATE='.urlencode($titleResult->item(0)->attributes->getNamedItem('value')->textContent)."&".'ctl00%24procnt%24hdnSearchId=&ctl00%24procnt%24ddlBudgetMin=0&ctl00%24procnt%24ddlBudgetMax=0&ctl00%24procnt%24Location=rbworldwide&ctl00%24procnt%24ddlCountries=Select%20One&ctl00%24procnt%24ddlCities=Select%20One&ctl00%24procnt%24txtKeyWords=Keywords%20or%20Project%20ID&ctl00%24procnt%24hdnKeyWords=&ctl00%24procnt%24ucLeadResults%24hdnSrc=&ctl00%24procnt%24ucLeadResults%24hdnPageNo=&ctl00%24procnt%24ucLeadResults%24hdnSearchResults=&ctl00%24procnt%24ucLeadResults%24hdnProjectID=&ctl00%24procnt%24txtSearchName=&ctl00%24procnt%24txtSearchName_reqd=true&ctl00%24procnt%24hdnSelectedTab=0&ctl00%24procnt%24hdnSelectSvdSearch=&ctl00%24hdnGuid=GUID&__ASYNCPOST=true&';
        parse_str($postvars, $query);

        print_r($query);

        $ch = curl_init("http://www.guru.com/pro/search.aspx");
        curl_setopt($ch, CURLOPT_POST      ,1);
        curl_setopt($ch, CURLOPT_POSTFIELDS    ,$postvars);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION  ,1); 
        curl_setopt($ch, CURLOPT_HEADER      ,0);  // DO NOT RETURN HTTP HEADERS 
        curl_setopt($ch, CURLOPT_RETURNTRANSFER  ,1);  // RETURN THE CONTENTS OF THE CALL
        curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
        curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
        curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');         
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: www.guru.com',
            'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1',
            'Accept: text/plain, text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-us,en;q=0.5',
            'Accept-Encoding: gzip, deflate',
            'Connection: keep-alive',
            'X-MicrosoftAjax: Delta=true',
            'Cache-Control: no-cache, no-cache',
            'Content-Type: application/x-www-form-urlencoded; charset=utf-8',
            'Referer: http://www.guru.com/pro/search.aspx?',
            'Content-Length: 15818',
        ));
        echo $Rec_Data = curl_exec($ch);
        $Rec_Data = gzinflate(substr($Rec_Data, 10, -8));