Scrapy issue when clicking a button

Date: 2017-02-18 17:39:37

Tags: python html web-scraping scrapy scrapy-spider

I am trying to scrape a website with Scrapy. The page has an anchor tag inside a form. This is what it looks like.

<form id="id8" method="post" action="?x=jmm9hwO6whVFYOGs283r0oqegcq8yXZcwcLUxU*NCVmxpqCK*OWQzUdI-IbQ6HlEzZltb5qJVzerKYQCL6HzihRR9N8V514r"><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input type="hidden" name="id8_hf_0" id="id8_hf_0"></div><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input type="text" autocomplete="false"><input type="submit" name="linkFrag:beginButton" onclick=" var b=document.getElementById('id9'); if (b!=null&amp;&amp;b.onclick!=null&amp;&amp;typeof(b.onclick) != 'undefined') {  var r = b.onclick.bind(b)(); if (r != false) b.click(); } else { b.click(); };  return false;"></div>

<div id="welcomePageHeader" class="adminMainHeader"><h2><span>Welcome Page</span></h2></div>
<!-- div class="adminContentHead subsection2 bgClr2"><span wicket:id="portalName"></span></div -->
<div class="formSec bgClr1 welcomePageSec">
    <div id="acknowledgement">
        <div style="margin-bottom:15px;">
        <a href="#" class="anchorButton" name="linkFrag:beginButton" id="id9" onclick="var wcall=wicketSubmitFormById('id8', '?x=jmm9hwO6whVFYOGs283r0oqegcq8yXZcwcLUxU*NCVmxpqCK*OWQzT6gI89aPkynDZsfSFy0vfCZ8uZoIAv57mMqr2tk6xTsXvag2x0Lls69vFKIQ4*fYtWt7EDYFB1mGj7vxgn8Frj5gSWKFqJKjjfDNioG2zA9SBQwdbcR*80', 'linkFrag:beginButton' ,function() { }.bind(this),function() { }.bind(this), function() {return Wicket.$$(this)&amp;&amp;Wicket.$$('id8')}.bind(this));;; return false;"><span>Search</span></a>
        </div>
    </div>

    <div id="notice" class="subsection1 bgClr2">
        <span type="text" id="alertTextInput" class="containerSpacing"></span>
    </div>
</div>
</form>

I cannot click the anchor link and proceed. So far I have tried the following options.

  1. Following the plain link inside the anchor tag

    # The anchor's href is "#", so urljoin() resolves back to the current page.
    next_page_url = response.xpath('//div[@id="acknowledgement"]//a/@href').extract_first()
    if next_page_url is not None:
        next_page_url = response.urljoin(next_page_url)
        yield scrapy.Request(next_page_url, callback=self.search_court_case)
    
  2. Sending a form request

    yield FormRequest.from_response(response,
                                    formxpath='//*[@id="id8"]',
                                    callback=self.parse1)
    
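Can anyone advise how to solve this? I have also read in many articles that Scrapy cannot handle JavaScript. As you can see in the HTML above, the anchor tag calls a JavaScript function that does nothing more than POST the form with a marker. Is there a way to simulate this with Scrapy?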
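To make the question concrete, here is a minimal sketch of how that POST might be rebuilt without running JavaScript. It only pulls the first three string arguments out of the `wicketSubmitFormById(...)` call in the `onclick` attribute and assembles the form fields. The helper names `parse_wicket_submit` and `build_formdata` are hypothetical, and sending the button name as an extra field with the value `"1"` is an assumption about what the Wicket function does, not something confirmed by the page.

```python
import re
from urllib.parse import urljoin

# Captures the first three quoted arguments of wicketSubmitFormById(...):
# the form id, the relative POST URL, and the submitting button's name.
_WICKET_CALL = re.compile(
    r"wicketSubmitFormById\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'"
)


def parse_wicket_submit(onclick, page_url):
    """Return (form_id, absolute_post_url, button_name), or None if no call is found."""
    match = _WICKET_CALL.search(onclick)
    if match is None:
        return None
    form_id, relative_url, button_name = match.groups()
    # The URL in the onclick is query-only ("?x=..."), so resolve it against the page URL.
    return form_id, urljoin(page_url, relative_url), button_name


def build_formdata(hidden_fields, button_name):
    """Copy the form's existing fields and add the button name as the marker field."""
    data = dict(hidden_fields)
    # Assumption: the marker is sent as "<button name>=1"; adjust if the server expects otherwise.
    data[button_name] = "1"
    return data
```

Inside a spider callback, the `onclick` string could be read with `response.xpath('//a[@id="id9"]/@onclick').extract_first()`, and the resulting URL and fields passed to `scrapy.FormRequest(url=..., formdata=..., callback=...)`. Whether the server accepts that request as-is is untested here.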

0 answers:

No answers