I am trying to get Scrapy to fill in the following HTML form using FormRequest.from_response:
<form class="form-horizontal" method="POST" role="form">
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Username </label>
<div class="col-sm-9">
<input class="form-control" value="" maxlength="32" name="pun" />
</div>
</div>
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Passphrase </label>
<div class="col-sm-9">
<input class="form-control" type="password" value="" maxlength="10000" name="ak" />
</div>
</div>
</form>
</div>
<div align="right">
<input id="send" type="submit" value="Login" name="login" />
</div>
I followed the tutorial here, but the code from it does not work for the fields "ak" and "pun". Any ideas or suggestions? Thanks.
Edit: this is what I have so far:
import logging

from scrapy.http import FormRequest
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class TestSpider(CrawlSpider):
    name = "test1"
    allowed_domains = ['...']
    start_urls = [
        '...'
    ]
    rules = (Rule(LinkExtractor(), callback='parse_items', follow=True),)

    def parse_items(self, response):
        # Fill in the login form found on the page and submit it
        return [FormRequest.from_response(response,
                                          formdata={"pun": '...', "ak": '...'},
                                          callback=self.after_login)]

    def after_login(self, response):
        # Check that the login succeeded before going on
        if "authentication failed" in response.text:
            self.log("Login failed", level=logging.ERROR)
            return
        # Crawl contents ...
Answer 0 (score: 1)
I solved the problem. All it took was writing:
formdata={"pun": '...', "ak": '...', "login": 'Login'}
However, I am still unsure about the reason behind it. Can someone explain?
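One possible explanation, sketched with the standard library only (this is not Scrapy's actual code): as far as I understand, from_response only picks up the fields and the submit button it finds inside the <form> element. In the markup from the question the Login button comes after </form>, so its name/value pair is never sent unless it is added to formdata by hand. That the server insists on receiving that pair is my guess. FormFieldCollector is a hypothetical helper, and "user" / "secret" are placeholder credentials.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Trimmed copy of the markup from the question: the submit button sits after </form>.
PAGE = """
<form class="form-horizontal" method="POST" role="form">
<input class="form-control" value="" maxlength="32" name="pun" />
<input class="form-control" type="password" value="" maxlength="10000" name="ak" />
</form>
<div align="right">
<input id="send" type="submit" value="Login" name="login" />
</div>
"""


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: split named <input> elements by whether they are inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.inside = {}   # named inputs found between <form> and </form>
        self.outside = {}  # named inputs found anywhere else

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif tag == "input" and attrs.get("name"):
            target = self.inside if self.in_form else self.outside
            target[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


collector = FormFieldCollector()
collector.feed(PAGE)
collector.close()

print(collector.inside)   # fields a form-based lookup can see
print(collector.outside)  # the Login button, which such a lookup misses

# Placeholder credentials (assumption), plus the button's pair added manually.
formdata = {**collector.inside, "pun": "user", "ak": "secret", **collector.outside}
body = urlencode(formdata)
print(body)
```

If that reading is right, passing the button's pair in formdata just restores what a browser would have sent when the button is clicked.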
Answer 1 (score: 0)
The submit button must be inside the <form> tag. Try this:
<form class="form-horizontal" method="POST" role="form">
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Username </label>
<div class="col-sm-9">
<input class="form-control" value="" maxlength="32" name="pun" />
</div>
</div>
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Passphrase </label>
<div class="col-sm-9">
<input class="form-control" type="password" value="" maxlength="10000" name="ak" />
</div>
</div>
<div align="right">
<input id="send" type="submit" value="Login" name="login" />
</div>
</form>