Scrapy login not working

Date: 2017-08-22 09:23:37

Tags: python scrapy

I'm trying to log in, but it doesn't even seem to fill in the form data. Here is the login part of my code:

from scrapy.http import FormRequest

start_urls = ['https://stmforum.com/amember/login.php']

def parse(self, response):
    return FormRequest(url="https://stmforum.com/amember/protect/new-rewrite?f=2&url=/forum/forum.php&host=stmforum.com&ssl=on",
                       formdata={'amember_login': 'username', 'amember_pass': 'password'},
                       callback=self.after_login)

def after_login(self, response):
    # response.text is a str; response.body is bytes on Python 3
    if "incorrect" in response.text:
        self.logger.error("Login failed")
        return
    elif "Login to your Account" in response.text:
        # still on the login page, so the POST did not log us in
        self.logger.error("Try again")
        return
    else:
        pass

Here is part of the site's HTML:

 <form name="login" method="post" action="/amember/login">
<fieldset>
<legend>Login to your Account</legend>
<div id="recaptcha-row" class="row" style="display: none;" data-recaptcha-theme="light" data-recaptcha-size="normal">
<div class="row">
<div class="element-title">
<div class="element">
<input id="amember-login" name="amember_login" size="15" value="" autofocus="autofocus" placeholder="Username/Email" type="text"/>
</div>
</div>
<div class="row">
<div class="element-title">
<div class="element">
<input id="amember-pass" class="am-pass-reveal" name="amember_pass" size="15" placeholder="Password" type="password"/>
<span class="am-switch-reveal am-switch-reveal-off" title="Toggle Password Visibility"/>
<label id="am-form-login-remember" class="element-title" for="remember_login">
</div>
</div>
<div class="row">
</fieldset>
<input name="login_attempt_id" value="1503392293" type="hidden"/>
<input name="amember_redirect_url" value="https://stmforum.com/forum/forum.php" type="hidden"/>

Output from the spider:

[seeker] ERROR: Try again
[scrapy.core.engine] INFO: Closing spider (finished)

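It ends up in the elif branch of after_login, which means the page didn't change. In other words, it isn't filling in the form data and isn't clicking Login. I tried "username" and "password" as the formdata keys, and I also tried the ids "amember-login" and "amember-pass". I also tried clickdata={'submit': 'commit'}, and FormRequest.from_response.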

With Selenium it works fine. I wanted to use Selenium together with Scrapy, but that doesn't work on the server.

Can anyone help me?

Update:

from scrapy.http import FormRequest

start_urls = ['https://stmforum.com/amember/login.php']

def parse(self, response):
    return FormRequest.from_response(response,
                                     formdata={'amember_login': 'user', 'amember_pass': 'pass'},
                                     callback=self.after_login)

def after_login(self, response):
    # response.text is a str; response.body is bytes on Python 3
    if "incorrect" in response.text:
        self.logger.error("Login failed")
        return
    elif "Login to your Account" in response.text:
        self.logger.error("Try again")
        return
    else:
        # logged in: continue to the forum
        return FormRequest(url="https://stmforum.com/forum/",
                           formdata={'query': 'AdCombo'},
                           callback=self.parse_page)

The response I get:

[scrapy.core.engine] DEBUG: Crawled (200) <GET https://stmforum.com/amember/login.php> (referer: None)
[scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://stmforum.com/amember/member> from <POST https://stmforum.com/amember/login>
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://stmforum.com/amember/member> (referer: https://stmforum.com/amember/login.php)
[seeker] ERROR: Try again
[scrapy.core.engine] INFO: Closing spider (finished)

The 'login_attempt_id' changes with every request... How do I include that hidden value in formdata? Or what else can I do?

2 answers:

Answer 0 (score: 1)

I found that I had disabled cookies in my Scrapy settings. Now it works fine. Thanks a lot.
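The answer doesn't say exactly which line was changed; in Scrapy, cookie handling is controlled by the COOKIES_ENABLED setting, which is True by default. A sketch of the relevant lines in a project's settings.py, assuming the standard project layout:

```python
# settings.py (project-wide Scrapy settings)

# Cookies must be enabled so the session cookie set during login is sent
# with the redirected request and every later request. True is Scrapy's
# default; the problem in this answer was that it had been turned off.
COOKIES_ENABLED = True

# Optional while debugging: log the Cookie header of each request and the
# Set-Cookie header of each response.
COOKIES_DEBUG = True
```

To set it for a single spider instead, the same key can go in that spider's custom_settings dict, e.g. custom_settings = {"COOKIES_ENABLED": True}.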

Answer 1 (score: 0)

I checked the site: you are posting to the wrong URL.

def parse(self, response):
    return FormRequest.from_response(response,
                                     formdata={'amember_login': 'username', 'amember_pass': 'password'},
                                     callback=self.after_login)

The reason is that the other hidden variables have to be sent as well. That is why you need to use from_response.

[Screenshot: login hidden variables]