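Taking cues and ideas from a previous post, I tried to come up with my own code.
However, my code does not actually scrape anything, and it probably never gets past the authentication step. I say this because I see no error logs even when I enter a wrong password.
My best guess is that the HTML for the authentication fields is not contained in a "form" tag, so formdata may be ignoring it. I could be wrong.
My code so far: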
# Assumed import paths for the legacy Scrapy 0.x API (BaseSpider, scrapy.log);
# adjust them for the Scrapy version you run.
from scrapy import log
from scrapy.http import FormRequest, Request
from scrapy.item import Item
from scrapy.selector import Selector
from scrapy.spider import BaseSpider


class LoginSpider(BaseSpider):
    name = 'auth1'
    start_urls = ['http://www.example.com/administration']

    def parse(self, response):
        # Fill in the login form found in the response and submit it.
        return [FormRequest.from_response(response,
                    formdata={'employee[email]': 'xyz@abc.com', 'employee[password]': 'XYZ'},
                    formxpath='//div[@class="form-row"]',
                    callback=self.after_login)]

    def after_login(self, response):
        if "authentication failed" in response.body:
            self.log("Login failed", level=log.ERROR)
            return
        else:
            # We've successfully authenticated, let's have some fun!
            return Request(url="http://www.liveyoursport.com/administration/customers",
                           callback=self.parse_tastypage)

    def parse_tastypage(self, response):
        sel = Selector(response)
        item = Item()
        item["Test"] = sel.xpath("//h1/text()").extract()
        yield item
Here is the HTML part:
<div class="content-row">
<div class="special-header-title span_full">
<h3><span class="blue-text">Sign </span>In</h3>
</div>
</div>
<div class="content-row">
<div class="form-section checkout-address-edit span_80" id="sign-in-form" >
<form accept-charset="UTF-8" action="/employees/sign_in" class="new_employee" id="new_employee" method="post"><div style="margin:0;padding:0;display:inline"><input name="utf8" type="hidden" value="✓" /><input name="authenticity_token" type="hidden" value="HQYZa0hNZ2Y+UvtbIk9OxI48Hlsnt+MiYOeV9ql2yWo=" /></div>
<div>
<div class="form-row">
<div class="form-col-1"><label for="employee_email">Email</label></div>
<div class="form-col-2">
<input id="employee_email" name="employee[email]" size="30" type="email" value="" />
</div>
</div>
<div class="form-row">
<div class="form-col-1"><label for="employee_password">Password</label></div>
<div class="form-col-2">
<input id="employee_password" name="employee[password]" size="30" type="password" />
</div>
</div>
</div>
<div class="form-row form-row-controls">
<div class="form-col-1"></div>
<div class="form-col-2">
<input class="sign-in-button f-right" name="commit" type="submit" value="Sign in" />
</div>
</div>
</form> <br>
<a href="/employees/password/new">Forgot your password?</a><br />
<a href="/employees/unlock/new">Didn't receive unlock instructions?</a><br />
</div>
Answer 0 (score: 1)
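From the docs:
formxpath (string) - if given, the first form that matches the xpath will be used.
But it seems your XPath does not match the form itself. It matches a div nested inside the form (the first "form-row" div).
Try this: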
return [FormRequest.from_response(response,
            formdata={'employee[email]': 'xyz@abc.com', 'employee[password]': 'XYZ'},
            formxpath='//form[@id="new_employee"]',
            callback=self.after_login)]
Also, if the page contains only one form element, you do not need to define formxpath.
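To see what each XPath actually selects, here is a minimal sketch that runs without Scrapy. It rests on a few assumptions: the markup is a trimmed, well-formed copy of the login form from the question, the token value is a placeholder, and the standard library's xml.etree.ElementTree stands in for the lxml-based matching Scrapy performs. The helper input_names is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the login markup from the question (assumption:
# labels and layout divs are dropped, and the token is a placeholder).
LOGIN_HTML = """
<div id="sign-in-form">
  <form action="/employees/sign_in" id="new_employee" method="post">
    <input name="utf8" type="hidden" value="✓" />
    <input name="authenticity_token" type="hidden" value="PLACEHOLDER" />
    <div class="form-row">
      <input name="employee[email]" type="email" value="" />
    </div>
    <div class="form-row">
      <input name="employee[password]" type="password" />
    </div>
    <div class="form-row form-row-controls">
      <input name="commit" type="submit" value="Sign in" />
    </div>
  </form>
</div>
"""

root = ET.fromstring(LOGIN_HTML)


def input_names(element):
    """Hypothetical helper: names of all <input> elements under `element`."""
    return [inp.get("name") for inp in element.iter("input")]


# The XPath from the question: the first match is a single row, not the form.
div_match = root.find(".//div[@class='form-row']")
# The XPath from the answer: the match is the <form> element itself.
form_match = root.find(".//form[@id='new_employee']")

div_fields = input_names(div_match)
form_fields = input_names(form_match)

print(div_match.tag, div_fields)    # div ['employee[email]']
print(form_match.tag, form_fields)
```

The first match for the question's XPath is a div that holds only the email input, so the password field, the authenticity_token and the form's action URL all lie outside it. The answer's XPath selects the form element, which contains every field. As far as I know, the way Scrapy handles a formxpath that lands on a non-form node has differed between releases, so pointing the XPath at the form directly is the safer choice.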