I am trying to write a parser for m-ati.su using Scrapy. As a first step I need to get the values and the text of the combo boxes named "From" and "To" for different cities. I looked at the requests in Firebug and wrote:
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                          callback=self.ati_from,
                          formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'})

    def ati_from(self, response):
        json = response.body
        open('results.txt', 'wb').write(json)
我有" 500内部服务器错误"对于这个请求。我做错了什么?抱歉英文不好。 感谢
Answer 0 (score: 0):
I think you may need to add an X-Requested-With: XMLHttpRequest header to the POST request, so you could try this:
    def parse(self, response):
        yield FormRequest('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                          callback=self.ati_from,
                          formdata={'prefixText': 'moscow', 'count': '10', 'contextKey': 'All_0$Rus'},
                          headers={"X-Requested-With": "XMLHttpRequest"})
Edit: I tried running the spider and came up with the following (when I checked with Firefox, the request body was JSON-encoded, so I used Request and forced the "POST" method; the response I got back was encoded in "windows-1251"):
from scrapy.spider import BaseSpider
from scrapy.http import Request
import json

class spider(BaseSpider):
    name = 'ati_su'
    start_urls = ['http://m-ati.su/Tables/Default.aspx?EntityType=Load']
    allowed_domains = ["m-ati.su"]

    def parse(self, response):
        # The service expects a JSON-encoded POST body rather than form data,
        # so build the request by hand instead of using FormRequest.
        yield Request('http://m-ati.su/Services/ATIGeoService.asmx/GetGeoCompletionList',
                      callback=self.ati_from,
                      method="POST",
                      body=json.dumps({
                          'prefixText': 'moscow',
                          'count': '10',
                          'contextKey': 'All_0$Rus'
                      }),
                      headers={
                          "X-Requested-With": "XMLHttpRequest",
                          "Accept": "application/json, text/javascript, */*; q=0.01",
                          "Content-Type": "application/json; charset=utf-8",
                          "Pragma": "no-cache",
                          "Cache-Control": "no-cache",
                      })

    def ati_from(self, response):
        # The body comes back as windows-1251 encoded JSON.
        jsondata = response.body
        print json.loads(jsondata, encoding="windows-1251")
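If the service follows the usual ASP.NET .asmx convention of wrapping its result in a top-level "d" key, the callback could be extended roughly as sketched below. This is only a sketch: the "d" key and the assumption that the completion list is a flat list of strings are guesses that should be checked against the actual response.

    def ati_from(self, response):
        # Decode the windows-1251 JSON body.
        data = json.loads(response.body, encoding="windows-1251")
        # "d" is the conventional .asmx wrapper key -- an assumption here,
        # as is the idea that it holds a plain list of suggestion strings.
        for city in data.get("d", []):
            print city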