The login page of the site I want to scrape looks like this (form excerpt):
<table style="margin-top: 1em;">
<tbody>
<tr>
<td><label for="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:username" class="stylizedLabel">
Email</label>
</td>
<td><input id="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:username"
type="text"
name="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:username"
value=""
class="stylizedInput1"
aria-describedby="ui-tooltip-0">
</td>
</tr>
<tr>
<td><label for="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:password" class="stylizedLabel">
Password</label>
</td>
<td><input id="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:password"
type="password"
name="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:password"
value="">
</td>
</tr>
<tr>
<td><input id="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:loginButton"
type="submit"
name="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:loginButton"
value="Login"
accesskey=""
onclick="javascript:captureResponse(this);"
style="float:left;"
class="button ui-button ui-widget ui-state-default ui-corner-all"
role="button"
aria-disabled="false">
</td>
<td></td>
</tr>
</tbody>
</table>
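When a form is submitted, each field is sent under its `<input>` element's `name` attribute, not under its label text or its `id`. In this form the `id` and `name` values happen to be identical. The sketch below uses only Python's standard-library `html.parser` to list the field names in an abbreviated copy of the excerpt above. `FORM_HTML` and `InputNameCollector` are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser

# Abbreviated copy of the login-form excerpt: only the <input> tags matter here.
FORM_HTML = """
<input id="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:username"
       type="text"
       name="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:username"
       value="">
<input id="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:password"
       type="password"
       name="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:password"
       value="">
<input id="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:loginButton"
       type="submit"
       name="loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:loginButton"
       value="Login">
"""


class InputNameCollector(HTMLParser):
    """Collects (name, type) for every <input> tag that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            # The name attribute is the key the browser submits for this field.
            self.fields.append((name, attr_map.get("type", "text")))


collector = InputNameCollector()
collector.feed(FORM_HTML)
for name, input_type in collector.fields:
    print(f"{input_type:>8}  {name}")
```

Run against the excerpt, this prints the three submitted keys: the text field, the password field, and the submit button.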
This is my spider, which logs in and then scrapes the site:
```python
import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://www.example.com']

    user_key = 'username'
    pwd_key = 'password'
    user_value = 'myusername'
    pwd_value = 'mypassword'

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={self.user_key: self.user_value, self.pwd_key: self.pwd_value},
            callback=self.after_login,
        )

    def after_login(self, response):
        ...
```
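The login attempt fails. I suspect this is because the username and password keys in `formdata` are wrong. I then tried the following, but still had no luck: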
```python
user_key = 'loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:username'
pwd_key = 'loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:password'
```
Which keys should I use in `formdata`?
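The sketch below shows what each attempt sends. It is a simplified, hypothetical model of how `FormRequest.from_response` combines the page's form fields with `formdata`: form fields whose names appear in `formdata` are replaced, and any other `formdata` keys are appended as extra fields. It is an approximation written for illustration, not Scrapy's actual code. For example, Scrapy also adds the clicked submit button by default, and this model omits that step. `PREFIX`, `FORM_FIELDS`, and `merge_form_fields` are assumed names.

```python
from urllib.parse import urlencode

PREFIX = "loginPage:SiteTemplate:siteLogin:loginComponent:loginForm:"

# Fields as declared by the page's form (empty until filled in).
FORM_FIELDS = [
    (PREFIX + "username", ""),
    (PREFIX + "password", ""),
]


def merge_form_fields(form_fields, formdata):
    """Hypothetical simplified model: keep form fields that formdata does not
    override, then append every formdata key/value pair."""
    kept = [(k, v) for k, v in form_fields if k not in formdata]
    return kept + list(formdata.items())


# First attempt: short keys that match no input's name attribute.
short = merge_form_fields(
    FORM_FIELDS, {"username": "myusername", "password": "mypassword"}
)

# Second attempt: keys equal to the inputs' name attributes.
full = merge_form_fields(
    FORM_FIELDS,
    {PREFIX + "username": "myusername", PREFIX + "password": "mypassword"},
)

# URL-encoded request bodies (colons are percent-encoded as %3A).
print(urlencode(short))
print(urlencode(full))
```

Under this model, the first attempt leaves the form's real username and password fields empty and sends separate `username` and `password` fields that the form does not declare. The second attempt fills the fields the form actually declares.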