Question

I am trying to use The Guardian's advanced search form to get the results for certain keywords.

import scrapy
from scrapy.http import FormRequest


class IndependentSpider(scrapy.Spider):
    name = "IndependentSpider"
    start_urls = ["http://www.independent.co.uk/advancedsearch"]

    def parse(self, response):
        # Fill in the search form found on the page and submit it.
        # Yield the request itself, not a list wrapping it.
        yield FormRequest.from_response(
            response,
            formdata={"all": "Science"},
            callback=self.parse_results,
        )

    def parse_results(self, response):
        # Print the HTML of every <h3> element on the results page.
        print(response.xpath("//h3").extract())

The form redirects me to

DEBUG: Redirecting (301) to <GET http://www.independent.co.uk/ind/advancedsearch/> from <GET http://www.independent.co.uk/advancedsearch>

which is a page that does not seem to exist.

Do you know what I am doing wrong?

Thanks!

Answer 1

It looks like you need a trailing /.

Try start_urls = ["http://www.independent.co.uk/advancedsearch/"]
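If the start URLs come from somewhere else and you cannot be sure they end in a slash, one option is to normalize them before handing them to the spider. The sketch below uses only the standard library; `ensure_trailing_slash` is a hypothetical helper name, not part of Scrapy, and it assumes the site expects the slash at the end of the path (before any query string).

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url):
    """Return url with a trailing slash on its path (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        path += "/"
    # Rebuild the URL, keeping any query string and fragment untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


start_urls = [ensure_trailing_slash("http://www.independent.co.uk/advancedsearch")]
print(start_urls)  # ['http://www.independent.co.uk/advancedsearch/']
```

The resulting list can be assigned to the spider's start_urls attribute in place of the hard-coded value.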

Scrapy search form follows non-existing page