How to scrape this AJAX website and get the results

Time: 2015-08-02 23:17:27

Tags: jquery python ajax python-2.7 scrapy

There is this website:

http://www.dubailand.gov.ae/English/pages/Daily-Transactions.aspx

If you choose a date, click Search, and then go to Daily Unit Transactions, you get the results.

I want to get those results with Scrapy. In other words, I want to make the AJAX call from Scrapy.

This is what I did:

from scrapy.spider import Spider
from scrapy.http import FormRequest
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.shell import inspect_response


class MySpider(Spider):
    name = 'MySpider'
    start_urls = ['http://www.dubailand.gov.ae/English/pages/Daily-Transactions.aspx']

    def start_requests(self):
        for url in self.start_urls:
            yield self.make_requests_from_url(url)

    def make_requests_from_url(self, url):
        return Request(url, callback=self.parse, meta={'cookiejar': 1})

    def parse(self, response):
        return [FormRequest(
            url="http://www.dubailand.gov.ae/English/pages/Daily-Transactions.aspx",
            formdata={
                'ctl00$ctl69$g_6ffada3a_2cbc_43a0_9034_f48a864a8873$txtDate': '30/07/2015',
                'id': 'ctl00_ctl69_g_6ffada3a_2cbc_43a0_9034_f48a864a8873_ntbSearch',
            },
            callback=self.page_parse,
            meta={'cookiejar': response.meta['cookiejar']},
            method='POST',
            headers={
                'X-Requested-With': 'XMLHttpRequest',
                'X-MicrosoftAjax': 'Delta=true',
                'Accept-Encoding': 'gzip, deflate',
            },
        )]

    def page_parse(self, response):
        sel = Selector(response)
        import json
        with open('data.html', 'w') as outfile:
            json.dump(response.body, outfile)
        #inspect_response(response)
        pass

I enabled cookies and made the AJAX call. I inspected the request headers with Chrome's F12 developer tools and added those parameters to my headers.

However, the result I get back is the main page as it looks before the Search button is clicked.

What am I doing wrong?

1 answer:

Answer 0 (score: 2):

I believe you are missing some additional required headers and form parameters.

If you go to Chrome's Network tab, right-click the Daily-Transactions.aspx request, and choose "Copy as cURL", you get the following command (I added line breaks for readability):

curl
    "http://www.dubailand.gov.ae/English/pages/Daily-Transactions.aspx"
    -H "Cookie: _gat=1; ReadSpeakerSettings=enlarge=enlargeoff; _ga=GA1.3.469029184.1438564062; __zlcmid=W2eqfhsJvJwJFB; WSS_FullScreenMode=false"
    -H "Origin: http://www.dubailand.gov.ae"
    -H "Accept-Encoding: gzip, deflate"
    -H "Accept-Language: en-US,en;q=0.8"
    -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    -H "Content-Type: application/x-www-form-urlencoded; charset=UTF-8"
    -H "Accept: */*"
    -H "Cache-Control: no-cache"
    -H "X-Requested-With: XMLHttpRequest"
    -H "Connection: keep-alive"
    -H "X-MicrosoftAjax: Delta=true"
    -H "Referer: http://www.dubailand.gov.ae/English/pages/Daily-Transactions.aspx"
    --data "ctl00"%"24ScriptManager=ctl00"%"24ctl69"%"24g_6ffada3a_2cbc_43a0_9034_f48a864a8873"%"24pnlUpdate"%"7Cctl00"%"24ctl69"%"24g_6ffada3a_2cbc_43a0_9034_f48a864a8873"%"24ntbSearch&_wpcmWpid=&wpcmVal=&MSOWebPartPage_PostbackSource=&MSOTlPn_SelectedWpId=&MSOTlPn_View=0&MSOTlPn_ShowSettings=False&MSOGallery_SelectedLibrary=&MSOGallery_FilterString=&MSOTlPn_Button=none&__EVENTTARGET=&__EVENTARGUMENT=&__REQUESTDIGEST=0x308F37DAA08525D735F224FEC60294E8FAF5A7EA8995A9C50DAAF8BF58C40A1C4DB1B09B26C2ED5C6EB93603D43549D8FD5590FBC4FCCAB1CC3D1AEA5DA431D8"%"2C03"%"20Aug"%"202015"%"2001"%"3A09"%"3A24"%"20-0000&MSOSPWebPartManager_DisplayModeName=Browse&MSOSPWebPartManager_ExitingDesignMode=false&MSOWebPartPage_Shared=&MSOLayout_LayoutChanges=&MSOLayout_InDesignMode=&_wpSelected=&_wzSelected=&MSOSPWebPartManager_OldDisplayModeName=Browse&MSOSPWebPartManager_StartWebPartEditingName=false&MSOSPWebPartManager_EndWebPartEditing=false&__LASTFOCUS=&_maintainWorkspaceScrollPosition=0&__VIEWSTATE="%"2FwEPDwUBMA9kFgJmD2QWAgIBD2QWBAIBD2QWBAIID2QWAmYPZBYCAgMPFgIeE1ByZXZpb3VzQ29udHJvbE1vZGULKYgBTWljcm9zb2Z0LlNoYXJlUG9pbnQuV2ViQ29udHJvbHMuU1BDb250cm9sTW9kZSwgTWljcm9zb2Z0LlNoYXJlUG9pbnQsIFZlcnNpb249MTUuMC4wLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49NzFlOWJjZTExMWU5NDI5YwFkAh8PZBYCAgMPZBYCZg9kFgJmDzwrAAYAZAIJD2QWGAIBD2QWAgIBD2QWBAUmZ185MDIyOTE0ZV84NDBmXzQ3NmNfODNhOF81M2FkNGE1MWI0YzMPZBYCZg9kFgICAQ8UKwACDxYEHgtfIURhdGFCb3VuZGceC18hSXRlbUNvdW50AgFkZBYCZg9kFgRmDxUBDFRyYW5zYWN0aW9uc2QCAQ8UKwACDxYEHwFnHwICA2RkFgZmD2QWAmYPFQJBaHR0cDovL3d3dy5kdWJhaWxhbmQuZ292LmFlL0VuZ2xpc2gvcGFnZXMvRGFpbHktVHJhbnNhY3Rpb25zLmFzcHgSRGFpbHkgVHJhbnNhY3Rpb25zZAIBD2QWAmYPFQJDaHR0cDovL3d3dy5kdWJhaWxhbmQuZ292LmFlL0VuZ2xpc2gvUGFnZXMvTW9udGhseS1UcmFuc2FjdGlvbnMuYXNweBRNb250aGx5IFRyYW5zYWN0aW9uc2QCAg9kFgJmDxUCQmh0dHA6Ly93d3cuZHViYWlsYW5kLmdvdi5hZS9FbmdsaXNoL1BhZ2VzL0FubnVhbC1UcmFuc2FjdGlvbnMuYXNweBNBbm51YWwgVHJhbnNhY3Rpb25zZAUmZ182ZmZhZGEzYV8yY2JjXzQzYTBfOTAzNF9mNDhhODY0YTg4NzMPZBYCAgEPZBYCZg9kFh4CAQ8PFgIeBFRleHQFD1Byb2N
lZHVyZSBUeXBlOmRkAgMPEA8WBh4ORGF0YVZhbHVlRmllbGQFC1Byb2NlZHVyZUlEHg1EYXRhVGV4dEZpZWxkBRRQcm9jZWR1cmVFbmdsaXNoTmFtZR8BZ2QQFQIVTW9ydGdhZ2UgUmVnaXN0cmF0aW9uBFNlbGwVAgIxMwIxMRQrAwJnZxYBAgFkAgUPDxYCHwMFBERhdGVkZAIJDw8WAh4MRXJyb3JNZXNzYWdlBSRJbnZhbGlkIGRhdGUgZm9ybWF0IChpZS4gZGQvbW0veXl5eSlkZAILDw8WAh8DBQZTZWFyY2hkZAIPDw8WAh8DBRJEYWlseSBUcmFuc2FjdGlvbnNkZAIRDw8WAh8DBRdEYWlseSBsYW5kIHRyYW5zYWN0aW9uc2RkAhMPPCsAEQMADxYEHwFnHwJmZAEQFgVmAgECAgIDAgQWBTwrAAUBABYCHgpIZWFkZXJUZXh0BQZSZWdpb248KwAFAQAWAh8HBQRBcmVhPCsABQEAFgIfBwULRGVzY3JpcHRpb248KwAFAQAWAh8HBQ5TcS5NZXRlciBQcmljZTwrAAUBABYCHwcFC1RvdGFsIFdvcnRoFgVmZmZmZgwUKwAAZAIVDw8WBB8DBQtUb3RhbCA6MC4wMB4HVmlzaWJsZWhkZAIXDw8WAh8DBRdEYWlseSBVbml0IHRyYW5zYWN0aW9uc2RkAhkPPCsAEQMADxYEHwFnHwJmZAEQFgRmAgECAgIDFgQ8KwAFAQAWAh8HBQZSZWdpb248KwAFAQAWAh8HBQRBcmVhPCsABQEAFgIfBwULRGVzY3JpcHRpb248KwAFAQAWAh8HBQtUb3RhbCBXb3J0aBYEZmZmZgwUKwAAZAIbDw8WBB8DBQtUb3RhbCA6MC4wMB8IaGRkAh0PDxYCHwMFGERhaWx5IFZpbGxhIHRyYW5zYWN0aW9uc2RkAh8PPCsAEQMADxYEHwFnHwJmZAEQFgRmAgECAgIDFgQ8KwAFAQAWAh8HBQZSZWdpb248KwAFAQAWAh8HBQRBcmVhPCsABQEAFgIfBwULRGVzY3JpcHRpb248KwAFAQAWAh8HBQtUb3RhbCBXb3J0aBYEZmZmZgwUKwAAZAIhDw8WBB8DBQtUb3RhbCA6MC4wMB8IaGRkAhUPZBYCAgEPZBYCZg8PFgIfCGhkZAIXD2QWAgIDDxYCHwhoFgJmD2QWBAICD2QWBgIBDxYCHwhoZAIDDxYCHwhoZAIFDxYCHwhoZAIDDw8WAh4JQWNjZXNzS2V5BQEvZGQCGQ9kFgICAw9kFgICAQ9kFgQCAw9kFgJmDw8WBB4EXyFTQgICHghDc3NDbGFzcwUXbXMtcHJvbW90ZWRBY3Rpb25CdXR0b25kZAIFDw8WBh8IaB8KAgIfCwUXbXMtcHJvbW90ZWRBY3Rpb25CdXR0b25kZAIhD2QWAgIBDw8WBB4LTmF2aWdhdGVVcmwFCC9FbmdsaXNoHgdUb29sVGlwBQRIb21lZGQCJQ9kFgJmD2QWBAIBDw8WBB8DBQVMb2dpbh8MBUBqYXZhc2NyaXB0Ok9wZW5Qb3B1cFdpbmRvdygnL0VuZ2xpc2gvUGFnZXMvTG9naW4uYXNweD9pc2RsZz0xJyk7ZGQCAw9kFgYCAw8PFgIfDAVIamF2YXNjcmlwdDpPcGVuUG9wdXBXaW5kb3coJy9FbmdsaXNoL1BhZ2VzL0NoYW5nZVBhc3N3b3JkLmFzcHg"%"2FaXNkbGc9MScpZBYCAgEPFgIfAwUPQ2hhbmdlIFBhc3N3b3JkZAIFDw8WAh8MBUpqYXZhc2NyaXB0Ok9wZW5Qb3B1cFdpbmRvdygnL0VuZ2xpc2gvUGFnZXMvUGFzc3dvcmRSZWNvdmVyeS5hc3B4P2lzZGxnPTEnKWQWAgIBDxYCHwMFEVBhc3N3b3JkIFJlY292ZXJ5ZAIHD2QWAgIBDxYCHwMFBkxvZ291dGQCJw9kFgJmDw8WAh8IaGQWBAI
BDw8WBB8DBQVMb2dpbh8MBUxqYXZhc2NyaXB0Ok9wZW5Qb3B1cFdpbmRvdygnL0VuZ2xpc2gvRFJFSS9QYWdlcy9TdHVkZW50TG9naW4uYXNweD9pc2RsZz0xJyk7ZGQCAw9kFgQCAw8WAh8DBQxVc2VyIFByb2ZpbGVkAgUPZBYCAgEPFgIfAwUGTG9nb3V0ZAIrD2QWDGYPDxYCHwMFD0xhdGVzdCBEZWFscyAgPmRkAgEPFgIfAwUfTGF0ZXN0IERlYWxzIG5vdCBhdmFpbGFibGUgbm93LmQCAg8PFgIfDAUpaHR0cDovL3d3dy5kdWJhaS5hZS9lbi9wYWdlcy9kZWZhdWx0LmFzcHhkZAIDDw8WAh8MBQgvRW5nbGlzaGRkAgQPDxYCHwwFvQFodHRwczovL2FwcC5yZWFkc3BlYWtlci5jb20vY2dpLWJpbi9yc2VudD9jdXN0b21lcmlkPTc4MDYmYW1wO2xhbmc9ZW5fdXMmYW1wO3ZvaWNlPWVuX3VzX2pvZXkmYW1wO3JlYWRpZD1jb250ZW50Um93JmFtcDt1cmw9aHR0cDovL3d3dy5kdWJhaWxhbmQuZ292LmFlL0VuZ2xpc2gvcGFnZXMvRGFpbHktVHJhbnNhY3Rpb25zLmFzcHhkFgICAQ8PFgIfAwUGTGlzdGVuZGQCBQ8PFgQfAwUI2LnYsdio2YofDAVAaHR0cDovL3d3dy5kdWJhaWxhbmQuZ292LmFlL0FyYWJpYy9wYWdlcy9EYWlseS1UcmFuc2FjdGlvbnMuYXNweGRkAi0PZBYEZg8PFgIfDAWDAWh0dHA6Ly85NC41Ni40Ni40MS9FcmVzU3VydmV5LkNsaWVudC5TaXRlL1N1cnZleVBhZ2VzL1N1cnZleT81djFtZzZETTY2YkJrajVMRFBiUTdhdU5hWCUyZklaOGMlMmJSNWZuNjB3MyUyYmRsR1B6emZCMU85SkNJRURNbFNqd3UwZBYCZg8PFgIeCEltYWdlVXJsBScvU3R5bGUgTGlicmFyeS9JbWFnZXMvRW5nbGlzaFN1cnZleS5qcGdkZAIBD2QWBgIDDxYCHwMFM0hvdyBkbyB5b3UgcmF0ZSB0aGUgb3ZlcmFsbCBkZXNpZ24gb2YgdGhpcyB3ZWJzaXRlP2QCBQ9kFgJmD2QWBAIBD2QWBgIBDxAPFgIfAWdkEBUFCUV4Y2VsbGVudAlWZXJ5IEdvb2QER29vZApBY2NlcHRhYmxlBFBvb3IVBQlFeGNlbGxlbnQJVmVyeSBHb29kBEdvb2QKQWNjZXB0YWJsZQRQb29yFCsDBWdnZ2dnZGQCAw8PFgIfAwUEVm90ZWRkAgUPDxYCHwMFBlJlc3VsdGRkAgMPZBYCAgEPFCsAAmRkZAIHDw8WAh8DBQRQb2xsZGQCMQ9kFgICAQ9kFgICAQ88KwAFAQAPFgIeFVBhcmVudExldmVsc0Rpc3BsYXllZGZkZAI1D2QWAgIBD2QWAgINDw8WAh8IaGQWAgIBD2QWAmYPZBYCAgMPZBYCAgUPZBYCAgEPPCsACQEADxYEHg1QYXRoU2VwYXJhdG9yBAgeDU5ldmVyRXhwYW5kZWRnZGQCWw9kFhRmDw8WBB8DBQpDb250YWN0IFVTHwwFHi9FbmdsaXNoL1BhZ2VzL0NvbnRhY3QtVXMuYXNweGRkAgEPDxYEHwMFA0ZBUR8MBRcvRW5nbGlzaC9QYWdlcy9GQVEuYXNweGRkAgIPDxYEHwMFB1NpdGVtYXAfDAUbL0VuZ2xpc2gvcGFnZXMvc2l0ZW1hcC5hc3B4ZGQCAw8PFgQfAwUOUHJpdmFjeSBQb2xpY3kfDAUkL0VuZ2xpc2gvcGFnZXMvcHJpdmFjeXN0YXRlbWVudC5hc3B4ZGQCBA8PFgQfAwUSVGVybXMgJiBDb25kaXRpb25zHwwFJi9FbmdsaXNoL1BhZ2VzL1Rlcm1zQW5kQ29uZGl0aW9ucy5hc3B
4ZGQCBQ8PFgIfDAUdaHR0cHM6Ly9lam9iLmR1YmFpLmdvdi5hZS9ETERkZAIGDw8WAh8OBSYvc3R5bGUgbGlicmFyeS9JbWFnZXMvY2FsbENlbnRlcmVuLmpwZ2RkAggPDxYCHwMFjwFGb3IgYmVzdCB2aWV3IG9mIHRoZSB3ZWJzaXRlLCBzY3JlZW4gcmVzb2x1dGlvbiBtdXN0IGJlIDEwMjR4NzY4IA0KU3VwcG9ydHMgTWljcm9zb2Z0IEludGVybmV0IEV4cGxvcmVyIDkuMCssIEZpcmVmb3ggMi4wKywgR29vZ2xlIENocm9tZSAxMi4wK2RkAgkPDxYCHwMFKkxhbmQgRGVwYXJ0bWVudCBBbGwgUmlnaHQgUmVzZXJ2ZWQgwqkgMjAxNGRkAgoPDxYCHwMFHFNpdGUgTGFzdCB1cGRhdGVkIDAyLzA4LzIwMTVkZBgGBTZjdGwwMCRnX2Q3MWZkMjFiXzY3YzlfNDdlMl84NWE1X2Y5ZjQ3ZDk3MTViOCRsdlJlc3VsdHMPZ2QFQ2N0bDAwJGN0bDY5JGdfOTAyMjkxNGVfODQwZl80NzZjXzgzYThfNTNhZDRhNTFiNGMzJGx2TWFpbkNhdGVnb3JpZXMPFCsADmRkZGRkZGQUKwABZAIBZGRkZgL"%"2F"%"2F"%"2F"%"2F"%"2FD2QFPmN0bDAwJGN0bDY5JGdfNmZmYWRhM2FfMmNiY180M2EwXzkwMzRfZjQ4YTg2NGE4ODczJGdyZFZpZXdVbml0DzwrAAwBCGZkBT9jdGwwMCRjdGw2OSRnXzZmZmFkYTNhXzJjYmNfNDNhMF85MDM0X2Y0OGE4NjRhODg3MyRncmRWaWV3QmxkbmcPPCsADAEIZmQFUWN0bDAwJGN0bDY5JGdfOTAyMjkxNGVfODQwZl80NzZjXzgzYThfNTNhZDRhNTFiNGMzJGx2TWFpbkNhdGVnb3JpZXMkY3RybDAkbHZMaW5rcw8UKwAOZGRkZGRkZBQrAANkZGQCA2RkZGYC"%"2F"%"2F"%"2F"%"2F"%"2Fw9kBT5jdGwwMCRjdGw2OSRnXzZmZmFkYTNhXzJjYmNfNDNhMF85MDM0X2Y0OGE4NjRhODg3MyRncmRWaWV3TGFuZA88KwAMAQhmZA3Q4oZ3a"%"2B92Cd50AyWMNB"%"2BzZ"%"2Bx8Od"%"2FIdEKgSKHDgVTC&__VIEWSTATEGENERATOR=BAB98CB3&__SCROLLPOSITIONX=0&__SCROLLPOSITIONY=0&__EVENTVALIDATION="%"2FwEdABCCxEPo2yov8gXTDMAr0HjTMTZk9qLGncDjImqxppn1M"%"2F7gS7wH"%"2BvpamDGfzy44XjyxkHzOu"%"2FxxyB7As77WBg8gEdMiJi30Yb"%"2FSJbq8IdHRHbd"%"2BxSrwqnSzovEFhJ6JjNMPhWvp97LZaEb1vZ0oO794R4wg9ONQreMC"%"2B8UcJgrQpzPbDgjSZtU9BtyCPpwiGzX87ucANZWdjHKJz"%"2FK3OhlwlY0P5F8PFJvhxERQ"%"2Bz36ey293w7ajGDIYPfB9mELN3R6jWtt8"%"2Ft9xwUMGjMceqnsbVLrcbnqRUQF149eGC5OSkZ3vsVDiMzw"%"2BNbVY6xij"%"2FIXiqU9Dl4X6SIpJKVjRYzpK8uLqTW"%"2BY9ph7yJUlgmkbIBXgM9IDgmfbKUbP8pZlSk"%"3D&ctl00"%"24g_d71fd21b_67c9_47e2_85a5_f9f47d9715b8"%"24hdnPollID=1&ctl00"%"24ctl69"%"24g_6ffada3a_2cbc_43a0_9034_f48a864a8873"%"24ddlProcedure=11&ctl00"%"24ctl69"%"24g_6ffada3a_2cbc_43a0_9034_f48a864a8873"%"24txtDate=10"%"2F01"%"2F2015&__ASYNCPOST
=true&ctl00"%"24ctl69"%"24g_6ffada3a_2cbc_43a0_9034_f48a864a8873"%"24ntbSearch=Search"
    --compressed

If I run this curl command, I get the correct response, and if I change this:

txtDate=10"%"2F01"%"2F2015

to:

txtDate=11"%"2F01"%"2F2015

then I get the next day's data.
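To make the date substitution concrete, here is a minimal Python 3 sketch that builds the URL-encoded `txtDate` pair for a run of consecutive days. It assumes the field expects dd/mm/yyyy (as the '30/07/2015' value in the question suggests) and that the `"%"` sequences in the copied command are just shell escaping of a plain `%`. The helper names `encoded_date_field` and `date_range` are made up for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Field name copied from the form data in the question.
DATE_FIELD = 'ctl00$ctl69$g_6ffada3a_2cbc_43a0_9034_f48a864a8873$txtDate'


def encoded_date_field(day):
    """Return the URL-encoded 'name=value' pair for one day (dd/mm/yyyy assumed)."""
    return urlencode({DATE_FIELD: day.strftime('%d/%m/%Y')})


def date_range(start, days):
    """Yield `days` consecutive dates beginning at `start`."""
    for offset in range(days):
        yield start + timedelta(days=offset)


if __name__ == '__main__':
    # Print the encoded pair for 10 and 11 January 2015.
    for day in date_range(date(2015, 1, 10), 2):
        print(encoded_date_field(day))
```

Each printed pair can replace the `txtDate=...` part of the `--data` string; a library that takes form data as a dictionary does this encoding for you.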

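Since they are using ASP.NET and this is not a public API that they intend others to integrate with, you will have to replicate all the odd parameters that the site sends as part of its AJAX request. For starters, ASP.NET uses the idea of "view state": the state of the page is serialized on the client side (in the hidden __VIEWSTATE field) and sent back with every request. That is why the form data is so large and unpleasant.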

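Another thing to watch for: if your requests suddenly stop working, they may be setting an expiry on the session. You may need to make a fresh GET request to /Daily-Transactions.aspx to get a new session.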
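One way to handle an expired session is to check whether the reply still looks like a partial-rendering response and, if not, start over with a fresh GET. The Python 3 sketch below assumes that a failed call falls back to a full HTML page (the symptom described in the question) and that a successful one starts with a `length|type|id|` prefix. `fetch_page` and `post_search` are hypothetical callables standing in for your own GET and POST code.

```python
import re

# A partial-rendering reply is assumed to start with "length|type|id|".
_DELTA_START = re.compile(r'^\d+\|[^|]*\|[^|]*\|')


def looks_like_delta(body_text):
    """Return True if the body starts like a partial-rendering response."""
    return bool(_DELTA_START.match(body_text))


def search_with_refresh(fetch_page, post_search, max_attempts=2):
    """Fetch the page and post the search, retrying with a fresh page if needed.

    fetch_page()           -> HTML of a fresh GET (hypothetical helper)
    post_search(page_html) -> body text of the AJAX POST (hypothetical helper)
    """
    for _ in range(max_attempts):
        page_html = fetch_page()        # fresh GET, so fresh hidden fields and cookies
        body = post_search(page_html)
        if looks_like_delta(body):
            return body
    raise RuntimeError('no partial response after %d attempts' % max_attempts)
```

In a Scrapy spider the same check could live in the callback, which would yield a new request for the page whenever `looks_like_delta` returns False.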

Try adding all of the headers and form parameters to your Scrapy request, and it should work.
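Rather than pasting the huge view state by hand, you can read the hidden fields from the page you fetched first and add the search fields on top. This is a rough Python 3 sketch using only the standard library. The field names and header values are copied from the curl command; `HiddenInputCollector` and `build_search_formdata` are names invented here, and I have not verified which of the fields the server actually requires.

```python
from html.parser import HTMLParser

PAGE_URL = 'http://www.dubailand.gov.ae/English/pages/Daily-Transactions.aspx'
# Control-name prefix taken from the form data in the curl command.
PREFIX = 'ctl00$ctl69$g_6ffada3a_2cbc_43a0_9034_f48a864a8873$'

# Headers copied from the curl command (cookies and User-Agent left to the client).
AJAX_HEADERS = {
    'X-Requested-With': 'XMLHttpRequest',
    'X-MicrosoftAjax': 'Delta=true',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cache-Control': 'no-cache',
    'Origin': 'http://www.dubailand.gov.ae',
    'Referer': PAGE_URL,
}


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if tag == 'input' and is_hidden and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_search_formdata(page_html, date_text, procedure='11'):
    """Merge the page's hidden fields (__VIEWSTATE etc.) with the search fields."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    formdata = dict(collector.fields)
    formdata.update({
        'ctl00$ScriptManager': PREFIX + 'pnlUpdate|' + PREFIX + 'ntbSearch',
        PREFIX + 'ddlProcedure': procedure,
        PREFIX + 'txtDate': date_text,
        PREFIX + 'ntbSearch': 'Search',
        '__ASYNCPOST': 'true',
    })
    return formdata
```

In the spider, the callback for the initial GET could pass the result on as `FormRequest(PAGE_URL, formdata=build_search_formdata(page_html, '30/07/2015'), headers=AJAX_HEADERS, callback=self.page_parse)`, where `page_html` is the decoded body of the first response. Scrapy's `FormRequest.from_response` also pre-fills hidden fields and may be a shorter route.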

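One last thing: it looks like you are using json in your page_parse method, but the data this call returns is not JSON. It is a "partial" HTML document, so you will need to parse that HTML for the data you are looking for.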
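As a starting point for that parsing, here is a small Python 3 sketch that splits the response into its parts. It assumes the usual ASP.NET AJAX partial-rendering layout, a sequence of `length|type|id|content|` records where `length` counts the characters of `content`; I have not checked this against the live site, and `parse_delta` and `update_panel_html` are names invented for this example.

```python
def parse_delta(text):
    """Split a partial-rendering response into (type, id, content) tuples.

    Assumed layout: length|type|id|content| repeated, with `length` giving
    the number of characters in `content` (which may itself contain '|').
    """
    records = []
    pos = 0
    while pos < len(text):
        length_end = text.index('|', pos)
        type_end = text.index('|', length_end + 1)
        id_end = text.index('|', type_end + 1)
        length = int(text[pos:length_end])
        content_start = id_end + 1
        content = text[content_start:content_start + length]
        records.append((text[length_end + 1:type_end],
                        text[type_end + 1:id_end],
                        content))
        pos = content_start + length + 1  # step over the trailing '|'
    return records


def update_panel_html(text):
    """Return {panel id: HTML fragment} for the 'updatePanel' records."""
    return {rid: content
            for rtype, rid, content in parse_delta(text)
            if rtype == 'updatePanel'}
```

Decode the response body to text before calling `parse_delta`. Each fragment returned by `update_panel_html` is ordinary HTML, so in Scrapy it can be wrapped in `Selector(text=fragment)` and queried with XPath or CSS as usual.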