How do I get the value and text fields from an AJAX combobox?

Time: 2014-01-18 22:11:22

Tags: python scrapy ajax-request

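I am trying to write a parser for m-ati.su using Scrapy. As a first step, I have to get the value and text fields from the comboboxes named "From" and "To" for different cities. I looked at the request in Firebug and wrote: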

from scrapy.spider import BaseSpider
from scrapy.http import FormRequest

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        # Send the autocomplete query as a form-encoded POST
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                          callback=self.ati_from,
                          formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'})

    def ati_from(self, response):
        # Dump the raw response body to a file for inspection
        with open('results.txt', 'wb') as f:
            f.write(response.body)

I get a "500 Internal Server Error" for this request. What am I doing wrong? Sorry for my bad English. Thanks.

1 answer:

Answer 0: (score: 0)

I think you may need to add the X-Requested-With: XMLHttpRequest header to the POST request, so you could try this:

    def parse(self, response):
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList', 
                          callback=self.ati_from, 
                          formdata={'prefixText': 'moscow', 'count': '10','contextKey':'All_0$Rus'},
                          headers={"X-Requested-With": "XMLHttpRequest"})

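Edit: I tried running the spider and came up with this:

(When I inspected the request with Firefox, the request body was JSON-encoded, so I use Request and force the "POST" method. The response I got was encoded in "windows-1251".)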

from scrapy.spider import BaseSpider
from scrapy.http import Request
import json

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        yield Request('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                      callback=self.ati_from,
                      method="POST",
                      body=json.dumps({
                            'prefixText': 'moscow',
                            'count': '10',
                            'contextKey':'All_0$Rus'
                      }),
                      headers={
                            "X-Requested-With": "XMLHttpRequest",
                            "Accept": "application/json, text/javascript, */*; q=0.01",
                            "Content-Type": "application/json; charset=utf-8",
                            "Pragma": "no-cache",
                            "Cache-Control": "no-cache",
                      })

    def ati_from(self, response):
        # The response body is windows-1251 encoded, so decode it before parsing the JSON
        jsondata = response.body.decode("windows-1251")
        print(json.loads(jsondata))
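
To see why the switch from FormRequest to a plain Request with a JSON body matters, here is a minimal sketch using only the Python 3 standard library. It compares the URL-encoded body that FormRequest builds from formdata with the JSON body used above, and decodes a windows-1251 response. The helper name decode_json_body and the sample response are hypothetical and only for illustration; the real service's response shape was not shown in the thread.

```python
import json
from urllib.parse import urlencode

params = {"prefixText": "moscow", "count": "10", "contextKey": "All_0$Rus"}

# What FormRequest(formdata=...) sends: a URL-encoded form body.
form_body = urlencode(params)

# What the working request sends: the same parameters serialized as JSON.
json_body = json.dumps(params)


def decode_json_body(body, encoding="windows-1251"):
    """Decode raw response bytes with the given charset, then parse the JSON."""
    return json.loads(body.decode(encoding))


# Hypothetical response bytes (assumption): not captured from the real service.
sample_response = json.dumps({"d": ["Москва"]}, ensure_ascii=False).encode("windows-1251")

print(form_body)
print(json_body)
print(decode_json_body(sample_response))
```

If the endpoint only accepts JSON, as the Firefox inspection suggests, the form-encoded variant would plausibly be what triggered the 500 error, though the thread does not confirm this.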