I followed the basic Scrapy login approach. It has always worked for me before, but in this case I am running into problems. Instead of requesting https://www.crowdfunder.com/user/validateLogin, FormRequest.from_response keeps sending the payload to https://www.crowdfunder.com/user/signup. I also tried requesting validateLogin directly with the payload, but it responds with a 404 error. Any ideas how I can fix this? Thanks in advance!
Here are the payload and the request URL that the browser sends when I click login:
Here is the log output:
And here is my spider:
import scrapy
from scrapy.spiders.init import InitSpider


class CrowdfunderSpider(InitSpider):
    name = "crowdfunder"
    allowed_domains = ["crowdfunder.com"]
    start_urls = [
        'http://www.crowdfunder.com/',
    ]

    login_page = 'https://www.crowdfunder.com/user/login/'
    payload = {}

    def init_request(self):
        """This function is called before crawling starts."""
        return scrapy.Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        self.payload = {'email': 'my_email',
                        'password': 'my_password'}
        # scrapy login
        return scrapy.FormRequest.from_response(
            response, formdata=self.payload, callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are
        successfully logged in.
        """
        if 'https://www.crowdfunder.com/user/settings' == response.url:
            self.log("Successfully logged in. :) :) :)")
            # start the crawling
            return self.initialized()
        else:
            # login fail
            self.log("login failed :( :( :(")
Answer 0 (score: 1)
FormRequest.from_response(response) uses the first form it finds. If you inspect which forms the page actually contains, you will see:
In : response.xpath("//form")
Out:
[<Selector xpath='//form' data='<form action="/user/signup" method="post'>,
<Selector xpath='//form' data='<form action="/user/login" method="POST"'>,
<Selector xpath='//form' data='<form action="/user/login" method="post"'>]
So the form you are looking for is not the first one. The way to fix this is to use one of the several from_response arguments that specify which form to use. Using formxpath is the most flexible and is my personal favorite:
In : FormRequest.from_response(response, formxpath='//*[contains(@action,"login")]')
Out: <POST https://www.crowdfunder.com/user/login>
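Applied to the spider in the question, the login callback could then look like the sketch below. This is only a sketch: the XPath is the one from the example above, the surrounding spider is the one posted in the question, and the email/password values are placeholders.

def login(self, response):
    """Generate a login request that targets the login form, not the signup form."""
    self.payload = {'email': 'my_email',
                    'password': 'my_password'}
    # formxpath picks the form whose action contains "login",
    # instead of the first form on the page (the signup form)
    return scrapy.FormRequest.from_response(
        response,
        formxpath='//*[contains(@action, "login")]',
        formdata=self.payload,
        callback=self.check_login_response)

If an XPath feels like overkill, from_response also accepts other selectors such as formnumber (the zero-based index of the form); in the listing above, formnumber=1 would pick the second form, which posts to /user/login.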