Scrapy FormRequest redirects to an unwanted link

Date: 2016-10-22 05:28:50

Tags: scrapy

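I followed the basic Scrapy login approach. It has always worked, but in this case I ran into a problem. Instead of posting to https://www.crowdfunder.com/user/validateLogin, FormRequest.from_response always sends the payload to https://www.crowdfunder.com/user/signup. I also tried requesting validateLogin directly with the payload, but it responds with a 404 error. Any ideas on how to solve this? Thanks in advance!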

Here are the payload and request URL sent when clicking login in the browser:

[payload and request details not preserved]

Here is the spider code:

import scrapy
from scrapy.spiders.init import InitSpider


class CrowdfunderSpider(InitSpider):
    name = "crowdfunder"
    allowed_domains = ["crowdfunder.com"]
    start_urls = [
        'http://www.crowdfunder.com/',
    ]

    login_page = 'https://www.crowdfunder.com/user/login/'
    payload = {}

    def init_request(self):
        """This function is called before crawling starts."""
        return scrapy.Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        self.payload = {'email': 'my_email',
                        'password': 'my_password'}

        # scrapy login
        return scrapy.FormRequest.from_response(response, formdata=self.payload, callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are
        successfully logged in.
        """
        if 'https://www.crowdfunder.com/user/settings' == response.url:
            self.log("Successfully logged in. :) :) :)")
            # start the crawling
            return self.initialized()
        else:
            # login fail
            self.log("login failed :( :( :(")

1 answer:

Answer 0 (score: 1)

By default, FormRequest.from_response(response) uses the first form it finds in the response. If you check which forms the page contains, you will see:

In : response.xpath("//form")
Out: 
[<Selector xpath='//form' data='<form action="/user/signup" method="post'>,
 <Selector xpath='//form' data='<form action="/user/login" method="POST"'>,
 <Selector xpath='//form' data='<form action="/user/login" method="post"'>]

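So the form you are looking for is not the first one; the first form posts to /user/signup, which is why the payload ends up there. The fix is to use one of the from_response parameters (such as formname, formid, formnumber, or formxpath) to specify which form to use.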

Using formxpath is the most flexible option, and my personal favorite:

In : FormRequest.from_response(response, formxpath='//*[contains(@action,"login")]')
Out: <POST https://www.crowdfunder.com/user/login>