I am trying to get the results of a submitted search form using the Scrapy framework, following the tutorial and the documentation:
<form id="id500" method="post" action="./dbcv?11-1.IFormSubmitListener-resultListView-searchPanel-theForm"><input name="_actiontokenf0fc" id="_actiontokenf0fc" value="f0fc4e66-dcf1-42a6-9c59-039471e75b20" type="hidden"><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input name="id500_hf_0" id="id500_hf_0" type="hidden"></div>
<div class="filter-panel hidden-print">
<fieldset class="no-hoffset-bot">
<div>
<div class="input-addition input-addition-holder">
<label class="label-above" id="id510-w-lbl" for="id510">
<strong>Keyword</strong>
</label>
<br>
<label class="hide" id="id4fd-w-lbl" for="id4fd">e.g. profession, knowledge, skill, CV ID</label>
<input autocomplete="off" class="largew input-flat midw-xxl-2col" value="python" name="primaryFilters:0:filter:textField" id="id4fd" placeholder="e.g. profession, knowledge, skill, CV ID" type="text">
</div>
<div class="input-addition input-addition-holder">
<label class="label-above" id="id511-w-lbl" for="id511">
<strong>Job location</strong>
</label>
<br>
<div class="input-addition-holder">
<input autocomplete="off" class="input-flat midw-xxl" value="" name="primaryFilters:1:filter:textfield" id="id4fe" placeholder="e.g. Brno" type="text">
<span id="id512">
</span>
</div>
</div>
<div class="input-addition input-addition-holder filter-more" id="id513">
<a href="javascript:void(0)" id="id505" class="show">
Other criteria
</a>
</div>
<span class="clear clear-mic"></span>
<div id="id506" class="hide">
<div class="input-addition input-addition-holder choice-panel">
<label class="label-above" id="id514"><strong>Industry</strong></label><br>
<div class="select2-container select2-container-multi midw-xxl-2col" id="s2id_id4ff"><ul class="select2-choices"> <li class="select2-search-field"> <label for="s2id_autogen7" class="select2-offscreen"></label> <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input select2-default" id="s2id_autogen7" style="width: 540px;" placeholder="" type="text"> </li></ul><div class="select2-drop select2-drop-multi select2-display-none select2-flat"> <ul class="select2-results"> </ul></div></div><input class="midw-xxl-2col" autocomplete="off" value="" name="additionalFilters:filters:0:filter:input" id="id4ff" tabindex="-1" style="display: none;" type="hidden">
</div>
<div class="input-addition input-addition-holder choice-panel">
<label class="label-above" id="id515"><strong>Type of employment</strong></label><br>
<div class="select2-container midw-xxl" id="s2id_id501"><a href="javascript:void(0)" class="select2-choice" tabindex="-1"> <span class="select2-chosen select2-default" id="select2-chosen-8">e.g. Part-time</span><abbr class="select2-search-choice-close"></abbr> <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen8" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-8" id="s2id_autogen8" type="text"><div class="select2-drop select2-display-none select2-flat"> <div class="select2-search select2-search-hidden select2-offscreen"> <label for="s2id_autogen8_search" class="select2-offscreen"></label> <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-8" id="s2id_autogen8_search" placeholder="" type="text"> </div> <ul class="select2-results" role="listbox" id="select2-results-8"> </ul></div></div><input class="midw-xxl" autocomplete="off" value="-1" name="additionalFilters:filters:1:filter:input" id="id501" tabindex="-1" title="" style="display: none;" type="hidden">
</div>
<span class="clear clear-mic"></span>
<div class="input-addition input-addition-holder choice-panel">
<label class="label-above" id="id516"><strong>Years of experience</strong></label><br>
<div class="select2-container midw-xxl" id="s2id_id502"><a href="javascript:void(0)" class="select2-choice" tabindex="-1"> <span class="select2-chosen select2-default" id="select2-chosen-9">e.g. 1-5 years experience</span><abbr class="select2-search-choice-close"></abbr> <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen9" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-9" id="s2id_autogen9" type="text"><div class="select2-drop select2-display-none select2-flat"> <div class="select2-search select2-search-hidden select2-offscreen"> <label for="s2id_autogen9_search" class="select2-offscreen"></label> <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-9" id="s2id_autogen9_search" placeholder="" type="text"> </div> <ul class="select2-results" role="listbox" id="select2-results-9"> </ul></div></div><input class="midw-xxl" autocomplete="off" value="-1" name="additionalFilters:filters:2:filter:input" id="id502" tabindex="-1" title="" style="display: none;" type="hidden">
</div>
<div class="input-addition input-addition-holder g-2-slider-with-label">
<label class="label-above"><strong>Education</strong></label>
<span class="ui-slider-filter-value" id="id4f3">not set</span>
<br>
<div class="input-slider-holder">
<span class="ui-slider-filter-wrapper midw-xxl">
<span class="ui-slider-filter ui-slider ui-slider-horizontal ui-widget ui-widget-content ui-corner-all ui-slider-pips g-2-slider" aria-disabled="false" id="id4f6">
<input value="0" name="additionalFilters:filters:3:filter:slider:model:input" id="id4f5" type="hidden">
<div id="id4f4" class="ui-slider ui-slider-horizontal ui-widget ui-widget-content ui-corner-all ui-slider-pips" aria-disabled="false"><a class="ui-slider-handle ui-state-default ui-corner-all" href="#" style="left: 0%;"></a><span class="ui-slider-pip ui-slider-pip-first ui-slider-pip-0 ui-slider-pip-selected-initial" style="left: 0%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="0">0</span></span><span class="ui-slider-pip ui-slider-pip-1" style="left: 25.0000%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="1">1</span></span><span class="ui-slider-pip ui-slider-pip-2" style="left: 50.0000%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="2">2</span></span><span class="ui-slider-pip ui-slider-pip-3" style="left: 75.0000%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="3">3</span></span><span class="ui-slider-pip ui-slider-pip-last ui-slider-pip-4" style="left: 100%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="4">4</span></span></div>
</span>
</span>
</div>
</div>
<div class="input-addition input-addition-holder choice-panel">
<label class="label-above" id="id517"><strong>Required salary</strong></label><br>
<div class="select2-container midw-xxl" id="s2id_id503"><a href="javascript:void(0)" class="select2-choice" tabindex="-1"> <span class="select2-chosen select2-default" id="select2-chosen-10">e.g. max. 40 000 CZK</span><abbr class="select2-search-choice-close"></abbr> <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen10" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-10" id="s2id_autogen10" type="text"><div class="select2-drop select2-display-none select2-flat"> <div class="select2-search select2-search-hidden select2-offscreen"> <label for="s2id_autogen10_search" class="select2-offscreen"></label> <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-10" id="s2id_autogen10_search" placeholder="" type="text"> </div> <ul class="select2-results" role="listbox" id="select2-results-10"> </ul></div></div><input class="midw-xxl" autocomplete="off" value="-1" name="additionalFilters:filters:4:filter:input" id="id503" tabindex="-1" title="" style="display: none;" type="hidden">
</div>
<span class="clear clear-mic"></span>
<div class="input-addition input-addition-holder langs-panel" id="id518">
<div class="input-addition input-addition-holder" id="id596">
<label class="label-form label-above"><strong>Language skills</strong></label><br>
<div class="select2-container input-flat midw-xxl" id="s2id_id594"><a href="javascript:void(0)" class="select2-choice select2-default" tabindex="-1"> <span class="select2-chosen" id="select2-chosen-11">e.g. English</span><abbr class="select2-search-choice-close"></abbr> <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen11" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-11" id="s2id_autogen11" type="text"><div class="select2-drop select2-display-none select2-flat select2-with-searchbox"> <div class="select2-search"> <label for="s2id_autogen11_search" class="select2-offscreen"></label> <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-11" id="s2id_autogen11_search" placeholder="" type="text"> </div> <ul class="select2-results" role="listbox" id="select2-results-11"> </ul></div></div><input autocomplete="off" class="input-flat midw-xxl" value="" name="additionalFilters:filters:5:filter:langs:0:lang:language" id="id594" placeholder="e.g. English" tabindex="-1" title="" style="display: none;" type="hidden">
</div>
<div id="id597" style="display:none"></div>
<div class="input-addition input-addition-holder filter-more">
</div>
<div id="id51c" style="display:none"></div>
</div>
</div>
<span class="clear clear-large"></span>
</div>
<input class="btn btn-primary btn-large scroll-anchor" name="searchSubmit" id="id507" value="Find CVs" type="submit">
</fieldset>
</div>
</form>
My spider looks like this:
import scrapy


class CandidatesSpider(scrapy.Spider):
    name = '[domain]'
    db_url = 'https://[domain]/recruit/dbcv'
    allowed_domains = [name]
    start_urls = [db_url]

    def parse(self, response):
        # Create request from automatically pre-populated form and only override login and password
        return scrapy.FormRequest.from_response(
            response,
            formdata={
                'signInPanel:login': '[login]',
                'signInPanel:password': '[password]'
            },
            callback=self.submit_profile
        )

    def submit_profile(self, response):
        # Extract the hidden token value
        xpath = '//*[starts-with(@id, "_action")]'
        token_id = response.xpath(xpath + '/@id').extract_first()
        token_value = response.xpath(xpath + '/@value').extract_first()
        data = {
            'additionalFilters:filters:0:filter:input': '',
            'additionalFilters:filters:1:filter:input': '-1',
            'additionalFilters:filters:2:filter:input': '-1',
            'additionalFilters:filters:3:filter:slider:model:input': '0',
            'additionalFilters:filters:4:filter:input': '-1',
            'additionalFilters:filters:5:filter:langs:0:lang:language': '',
            'id145e_hf_0': '',
            'primaryFilters:1:filter:textfield': '',
            'searchSubmit': '1',
            token_id: token_value,
            'primaryFilters:0:filter:textField': '[keyword]'
        }
        # Submit ideal candidate profile to search after logged in
        return scrapy.FormRequest.from_response(
            response,
            formdata=data,
            callback=self.parse_candidates
        )

    def parse_candidates(self, response):
        # Parse the CVs list of candidates
        for row in response.xpath('//*[@class="cv-list-overview"]//tbody/tr'):
            res = {
                'id': row.xpath('td[1]//text()').extract_first()
            }
            yield res
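After I run the spider with
scrapy runspider spiders/candidates.py
it opens the shell at the line inspect_response(response, self), so I can inspect the response. It does not contain any results table, as I can confirm with view(response): there is only the form with its fields.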
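Since the response only contains the form again, one check that may help is comparing the field names actually present in that form with the keys sent in `data`. The sketch below uses only the standard library. `FormFieldCollector` and `compare_fields` are hypothetical helper names, `FORM_HTML` is a shortened excerpt of the form above, and `SENT_KEYS` is a subset of the spider's `data` keys, with the token key written out as it appears in the HTML.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name -> value for every named <input> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:  # inputs without a name are never submitted
            self.fields[name] = attr_map.get("value") or ""


def compare_fields(html, sent_keys):
    """Return (keys sent but absent from the form, form fields not sent)."""
    collector = FormFieldCollector()
    collector.feed(html)
    in_form = set(collector.fields)
    sent = set(sent_keys)
    return sorted(sent - in_form), sorted(in_form - sent)


# Shortened excerpt of the form above; only a few inputs are kept.
FORM_HTML = """
<form id="id500" method="post" action="./dbcv?11-1.IFormSubmitListener-resultListView-searchPanel-theForm">
<input name="_actiontokenf0fc" id="_actiontokenf0fc" value="f0fc4e66-dcf1-42a6-9c59-039471e75b20" type="hidden">
<input name="id500_hf_0" id="id500_hf_0" type="hidden">
<input value="python" name="primaryFilters:0:filter:textField" id="id4fd" type="text">
<input name="searchSubmit" id="id507" value="Find CVs" type="submit">
</form>
"""

# Subset of the keys the spider sends in `data`.
SENT_KEYS = [
    "_actiontokenf0fc",
    "id145e_hf_0",
    "primaryFilters:0:filter:textField",
    "searchSubmit",
]

unknown, missing = compare_fields(FORM_HTML, SENT_KEYS)
print("sent but not in form:", unknown)
print("in form but not sent:", missing)
```

With this excerpt the hidden field sent by the spider (`id145e_hf_0`) does not match the one in the form (`id500_hf_0`), which suggests, as an assumption, that these ids change between page loads. In the Scrapy shell the same comparison could be run against `response.text` instead of `FORM_HTML`.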
I also tried a SplashRequest, as well as a plain FormRequest of the form
return scrapy.FormRequest(url=self.db_url, formdata=data, callback=self.parse_candidates)
also without any results.
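One difference between the two attempts may be worth noting: the plain `FormRequest` posts to `self.db_url`, while the form's `action` attribute carries an extra query string. The sketch below, using only the standard library, shows how that relative action resolves against the page URL. `example.com` stands in for the elided `[domain]`, and `resolve_action` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Placeholder host: the real domain is elided as [domain] in the spider.
PAGE_URL = "https://example.com/recruit/dbcv"
# Copied from the form's action attribute shown above.
FORM_ACTION = "./dbcv?11-1.IFormSubmitListener-resultListView-searchPanel-theForm"


def resolve_action(page_url, action):
    """Resolve a form's (possibly relative) action against the page URL."""
    return urljoin(page_url, action)


target = resolve_action(PAGE_URL, FORM_ACTION)
print(target)
# False here means the form posts somewhere other than the bare page URL.
print(target == PAGE_URL)
```

Inside a spider callback, `response.urljoin(action)` performs the same resolution. Whether posting to the resolved URL instead of `db_url` changes the result for this site is untested.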