scrapy form-filling when form posts to a second web page

Time: 2015-05-04 19:39:13

Tags: python scrapy

I'm new to Scrapy and wondering if anyone can point me to a sample project that uses Scrapy to submit HTML forms with hidden fields, in cases where the form's action URL is not the same address as the page where the form itself is presented.

What is the easiest way to do this in Scrapy? I can see that you could write two spiders - one first to get the html with the form and pick out all the hidden fields and then a second one to use the info with the hidden fields to submit the form.

I am wondering if there is a one-step process for this instead (the Scrapy request documentation seems to assume it's all on the same page when it says that using FormRequest.from_response will take care of hidden fields). If so, can someone tell me where I can find the steps for it?

1 answer:

Answer 0 (score: 1)

FormRequest extends the Request object. So you can use FormRequest.from_response to get the formdata, hidden values included, and change the url afterwards if you need to.

Demo pseudocode:

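import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/FormPage.php']

    def parse(self, response):
        # from_response pre-fills the form fields, hidden ones included
        request = scrapy.FormRequest.from_response(
            response,
            callback=self.parse_response_from_Form
        )
        # replace() returns a new request instead of modifying this one,
        # so the result has to be kept
        request = request.replace(url='http://www.other-site.com/')
        return request

    def parse_response_from_Form(self, response):
        # go on here...
        pass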
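
To make it clearer what "taking care of hidden fields" and "posting to a second page" involve, here is a rough standard-library sketch of the idea: collect the form's named inputs and resolve its action attribute against the URL of the page that shows the form. This is not Scrapy's implementation. FormCollector, build_submission, the sample HTML and the /submit.php action are all made up for illustration. As far as I know, FormRequest.from_response does the equivalent resolution itself, so overriding the url should only be needed when the real target differs from the form's action.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormCollector(HTMLParser):
    """Hypothetical helper: records action, method and named inputs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden inputs are ordinary named inputs, so they are kept too.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_submission(page_url, html, overrides=None):
    """Return (target_url, method, data) for the first form found in html."""
    parser = FormCollector()
    parser.feed(html)
    # A relative or absolute action is resolved against the page URL;
    # a missing action falls back to the page itself.
    target = urljoin(page_url, parser.action or "")
    data = dict(parser.fields)
    data.update(overrides or {})
    return target, parser.method, data


# Made-up page: the form is shown on FormPage.php but posts to /submit.php.
PAGE_URL = "http://www.example.com/FormPage.php"
PAGE_HTML = """
<form action="/submit.php" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="user" value="">
  <input type="submit" value="Send">
</form>
"""

target, method, data = build_submission(PAGE_URL, PAGE_HTML, {"user": "alice"})
print(target, method, data)
```

The point of the sketch is that the target URL comes from the form's action, not from the page that displays the form, which is why one request built from the form page's response can be enough.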