Scrapy form submission does not return the results list

Time: 2019-07-02 22:29:14

Tags: python web-scraping scrapy

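Following the tutorial and the documentation, I am trying to use the Scrapy framework to fetch the results of submitting this search form: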

<form id="id500" method="post" action="./dbcv?11-1.IFormSubmitListener-resultListView-searchPanel-theForm"><input name="_actiontokenf0fc" id="_actiontokenf0fc" value="f0fc4e66-dcf1-42a6-9c59-039471e75b20" type="hidden"><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input name="id500_hf_0" id="id500_hf_0" type="hidden"></div>
        <div class="filter-panel hidden-print">
            <fieldset class="no-hoffset-bot">
                <div>

                        <div class="input-addition input-addition-holder">
                            <label class="label-above" id="id510-w-lbl" for="id510">
                                <strong>Keyword</strong>
                            </label>

                            <br>

    <label class="hide" id="id4fd-w-lbl" for="id4fd">e.g. profession, knowledge, skill, CV ID</label>
    <input autocomplete="off" class="largew input-flat midw-xxl-2col" value="python" name="primaryFilters:0:filter:textField" id="id4fd" placeholder="e.g. profession, knowledge, skill, CV ID" type="text">

                        </div>

                        <div class="input-addition input-addition-holder">
                            <label class="label-above" id="id511-w-lbl" for="id511">
                                <strong>Job location</strong>
                            </label>

                            <br>

    <div class="input-addition-holder">
    <input autocomplete="off" class="input-flat midw-xxl" value="" name="primaryFilters:1:filter:textfield" id="id4fe" placeholder="e.g. Brno" type="text">
        <span id="id512">

        </span>
    </div>

                        </div>

                    <div class="input-addition input-addition-holder filter-more" id="id513">
                        <a href="javascript:void(0)" id="id505" class="show">
                            Other criteria
                        </a>
                    </div>
                    <span class="clear clear-mic"></span>
                    <div id="id506" class="hide">


                            <div class="input-addition input-addition-holder choice-panel">
    <label class="label-above" id="id514"><strong>Industry</strong></label><br>
    <div class="select2-container select2-container-multi midw-xxl-2col" id="s2id_id4ff"><ul class="select2-choices">  <li class="select2-search-field">    <label for="s2id_autogen7" class="select2-offscreen"></label>    <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input select2-default" id="s2id_autogen7" style="width: 540px;" placeholder="" type="text">  </li></ul><div class="select2-drop select2-drop-multi select2-display-none select2-flat">   <ul class="select2-results">   </ul></div></div><input class="midw-xxl-2col" autocomplete="off" value="" name="additionalFilters:filters:0:filter:input" id="id4ff" tabindex="-1" style="display: none;" type="hidden">
</div>


                            <div class="input-addition input-addition-holder choice-panel">
    <label class="label-above" id="id515"><strong>Type of employment</strong></label><br>
    <div class="select2-container midw-xxl" id="s2id_id501"><a href="javascript:void(0)" class="select2-choice" tabindex="-1">   <span class="select2-chosen select2-default" id="select2-chosen-8">e.g. Part-time</span><abbr class="select2-search-choice-close"></abbr>   <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen8" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-8" id="s2id_autogen8" type="text"><div class="select2-drop select2-display-none select2-flat">   <div class="select2-search select2-search-hidden select2-offscreen">       <label for="s2id_autogen8_search" class="select2-offscreen"></label>       <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-8" id="s2id_autogen8_search" placeholder="" type="text">   </div>   <ul class="select2-results" role="listbox" id="select2-results-8">   </ul></div></div><input class="midw-xxl" autocomplete="off" value="-1" name="additionalFilters:filters:1:filter:input" id="id501" tabindex="-1" title="" style="display: none;" type="hidden">
</div>

                            <span class="clear clear-mic"></span>
                            <div class="input-addition input-addition-holder choice-panel">
    <label class="label-above" id="id516"><strong>Years of experience</strong></label><br>
    <div class="select2-container midw-xxl" id="s2id_id502"><a href="javascript:void(0)" class="select2-choice" tabindex="-1">   <span class="select2-chosen select2-default" id="select2-chosen-9">e.g. 1-5 years experience</span><abbr class="select2-search-choice-close"></abbr>   <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen9" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-9" id="s2id_autogen9" type="text"><div class="select2-drop select2-display-none select2-flat">   <div class="select2-search select2-search-hidden select2-offscreen">       <label for="s2id_autogen9_search" class="select2-offscreen"></label>       <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-9" id="s2id_autogen9_search" placeholder="" type="text">   </div>   <ul class="select2-results" role="listbox" id="select2-results-9">   </ul></div></div><input class="midw-xxl" autocomplete="off" value="-1" name="additionalFilters:filters:2:filter:input" id="id502" tabindex="-1" title="" style="display: none;" type="hidden">
</div>


                            <div class="input-addition input-addition-holder g-2-slider-with-label">
    <label class="label-above"><strong>Education</strong></label>
    <span class="ui-slider-filter-value" id="id4f3">not set</span>
    <br>
    <div class="input-slider-holder">
        <span class="ui-slider-filter-wrapper midw-xxl">
            <span class="ui-slider-filter ui-slider ui-slider-horizontal ui-widget ui-widget-content ui-corner-all ui-slider-pips g-2-slider" aria-disabled="false" id="id4f6">

            <input value="0" name="additionalFilters:filters:3:filter:slider:model:input" id="id4f5" type="hidden">

            <div id="id4f4" class="ui-slider ui-slider-horizontal ui-widget ui-widget-content ui-corner-all ui-slider-pips" aria-disabled="false"><a class="ui-slider-handle ui-state-default ui-corner-all" href="#" style="left: 0%;"></a><span class="ui-slider-pip ui-slider-pip-first ui-slider-pip-0 ui-slider-pip-selected-initial" style="left: 0%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="0">0</span></span><span class="ui-slider-pip ui-slider-pip-1" style="left: 25.0000%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="1">1</span></span><span class="ui-slider-pip ui-slider-pip-2" style="left: 50.0000%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="2">2</span></span><span class="ui-slider-pip ui-slider-pip-3" style="left: 75.0000%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="3">3</span></span><span class="ui-slider-pip ui-slider-pip-last ui-slider-pip-4" style="left: 100%"><span class="ui-slider-line"></span><span class="ui-slider-label" data-value="4">4</span></span></div>
        </span>
        </span>
    </div>
</div>


                            <div class="input-addition input-addition-holder choice-panel">
    <label class="label-above" id="id517"><strong>Required salary</strong></label><br>
    <div class="select2-container midw-xxl" id="s2id_id503"><a href="javascript:void(0)" class="select2-choice" tabindex="-1">   <span class="select2-chosen select2-default" id="select2-chosen-10">e.g. max. 40 000 CZK</span><abbr class="select2-search-choice-close"></abbr>   <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen10" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-10" id="s2id_autogen10" type="text"><div class="select2-drop select2-display-none select2-flat">   <div class="select2-search select2-search-hidden select2-offscreen">       <label for="s2id_autogen10_search" class="select2-offscreen"></label>       <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-10" id="s2id_autogen10_search" placeholder="" type="text">   </div>   <ul class="select2-results" role="listbox" id="select2-results-10">   </ul></div></div><input class="midw-xxl" autocomplete="off" value="-1" name="additionalFilters:filters:4:filter:input" id="id503" tabindex="-1" title="" style="display: none;" type="hidden">
</div>

                            <span class="clear clear-mic"></span>
                            <div class="input-addition input-addition-holder langs-panel" id="id518">


    <div class="input-addition input-addition-holder" id="id596">
        <label class="label-form label-above"><strong>Language skills</strong></label><br>
        <div class="select2-container input-flat midw-xxl" id="s2id_id594"><a href="javascript:void(0)" class="select2-choice select2-default" tabindex="-1">   <span class="select2-chosen" id="select2-chosen-11">e.g. English</span><abbr class="select2-search-choice-close"></abbr>   <span class="select2-arrow" role="presentation"><b role="presentation"></b></span></a><label for="s2id_autogen11" class="select2-offscreen"></label><input class="select2-focusser select2-offscreen" aria-haspopup="true" role="button" aria-labelledby="select2-chosen-11" id="s2id_autogen11" type="text"><div class="select2-drop select2-display-none select2-flat select2-with-searchbox">   <div class="select2-search">       <label for="s2id_autogen11_search" class="select2-offscreen"></label>       <input autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false" class="select2-input" role="combobox" aria-expanded="true" aria-autocomplete="list" aria-owns="select2-results-11" id="s2id_autogen11_search" placeholder="" type="text">   </div>   <ul class="select2-results" role="listbox" id="select2-results-11">   </ul></div></div><input autocomplete="off" class="input-flat midw-xxl" value="" name="additionalFilters:filters:5:filter:langs:0:lang:language" id="id594" placeholder="e.g. English" tabindex="-1" title="" style="display: none;" type="hidden">
    </div>

    <div id="id597" style="display:none"></div>


    <div class="input-addition input-addition-holder filter-more">

    </div>
    <div id="id51c" style="display:none"></div>
</div>

                    </div>
                    <span class="clear clear-large"></span>
                </div>
                <input class="btn btn-primary btn-large scroll-anchor" name="searchSubmit" id="id507" value="Find CVs" type="submit">

            </fieldset>
        </div>


    </form>

My spider looks like this:

import scrapy


class CandidatesSpider(scrapy.Spider):
    name = '[domain]'
    db_url = 'https://[domain]/recruit/dbcv'
    allowed_domains = [name]
    start_urls = [db_url]

    def parse(self, response):
        # Create request from automatically pre-populated form and only override login and password
        return scrapy.FormRequest.from_response(
            response,
            formdata={ 
                'signInPanel:login': '[login]', 
                'signInPanel:password': '[password]' 
                },
            callback=self.submit_profile
        )

    def submit_profile(self, response):
        # Extract the hidden token value
        xpath = '//*[starts-with(@id, "_action")]'
        token_id = response.xpath(xpath + '/@id').extract_first()
        token_value = response.xpath(xpath + '/@value').extract_first()
        data = { 
                'additionalFilters:filters:0:filter:input':'',  
                'additionalFilters:filters:1:filter:input': '-1',
                'additionalFilters:filters:2:filter:input': '-1',
                'additionalFilters:filters:3:filter:slider:model:input':    '0',
                'additionalFilters:filters:4:filter:input': '-1',
                'additionalFilters:filters:5:filter:langs:0:lang:language': '',
                'id145e_hf_0':  '',
                'primaryFilters:1:filter:textfield':    '',
                'searchSubmit': '1',
                token_id: token_value,
                'primaryFilters:0:filter:textField': '[keyword]' 
                }
        # Submit ideal candidate profile to search after logged in
        return scrapy.FormRequest.from_response(
            response,
            formdata=data,
            callback=self.parse_candidates
        )

    def parse_candidates(self, response):
        # Parse the CVs list of candidates
        for row in response.xpath('//*[@class="cv-list-overview"]//tbody/tr'):
            res = {
                'id': row.xpath('td[1]//text()').extract_first()
            }
            yield res
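One thing that can be checked offline is whether the hand-written keys in `submit_profile` match the input names in the form. The sketch below uses only the standard library and an abbreviated copy of the form shown above; `InputNameCollector`, `input_names`, `FORM_HTML` and `SPIDER_KEYS` are hypothetical names introduced for illustration. It assumes the form and the spider were captured in different page renders, which would explain why the hidden `*_hf_0` field has a different generated prefix in each.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> that has one."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name is not None:
                self.names.append(name)


def input_names(html):
    """Return the names of all named <input> elements in an HTML fragment."""
    parser = InputNameCollector()
    parser.feed(html)
    return parser.names


# Abbreviated copy of the form from the question (only some inputs kept).
FORM_HTML = """
<form id="id500" method="post" action="./dbcv?11-1.IFormSubmitListener-resultListView-searchPanel-theForm">
  <input name="_actiontokenf0fc" id="_actiontokenf0fc" value="f0fc4e66-dcf1-42a6-9c59-039471e75b20" type="hidden">
  <input name="id500_hf_0" id="id500_hf_0" type="hidden">
  <input value="python" name="primaryFilters:0:filter:textField" id="id4fd" type="text">
  <input value="" name="primaryFilters:1:filter:textfield" id="id4fe" type="text">
  <input name="searchSubmit" id="id507" value="Find CVs" type="submit">
</form>
"""

# A few of the keys hard-coded in submit_profile().
SPIDER_KEYS = [
    "id145e_hf_0",
    "primaryFilters:0:filter:textField",
    "primaryFilters:1:filter:textfield",
    "searchSubmit",
]

form_names = set(input_names(FORM_HTML))
# Keys the spider sends that do not exist as inputs in this copy of the form.
unknown_keys = [key for key in SPIDER_KEYS if key not in form_names]
print(unknown_keys)  # ['id145e_hf_0']
```

Because `FormRequest.from_response` already pre-populates the fields found in the form, a key that does not exist in the current page is simply sent as an extra field. Whether the server rejects the request because of it is not something this check can tell.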

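After I run the spider with

scrapy runspider spiders/candidates.py

it drops into the shell at the inspect_response(response, self) line, so I can inspect the response. It does not contain any results table, as I can confirm with view(response): there is only the form with its fields.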

I also tried SplashRequest, and a FormRequest of the form

return scrapy.FormRequest(url=self.db_url, formdata=data, callback=self.parse_candidates)

also without any results.
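A plain `FormRequest(url=self.db_url, ...)` posts to `db_url` itself, while the form's `action` attribute carries an extra query string. The sketch below, using only the standard library, resolves that `action` against the page URL the way a browser would. The host `example.com` is a placeholder for the elided `[domain]`, and it is an assumption that the server needs the query string to route the submission.

```python
from urllib.parse import urljoin

# Placeholder host standing in for the elided [domain] in the spider.
db_url = "https://example.com/recruit/dbcv"

# The action attribute exactly as it appears in the form above.
form_action = "./dbcv?11-1.IFormSubmitListener-resultListView-searchPanel-theForm"

# Resolve the relative action against the URL of the page containing the form.
post_target = urljoin(db_url, form_action)

print(post_target)
print(post_target == db_url)  # False: the query string is missing from db_url
```

If the query string does matter, building the request with `FormRequest.from_response` keeps it, since that method takes the target URL from the form's `action`. The numeric part of the query string looks like it is generated per page, so it would have to be read from the current response rather than hard-coded.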

The domain is https://my.teamio.com

0 answers:

No answers