I am trying to scrape a website using Scrapy. The page has an anchor tag inside a form. Here is the markup:
<form id="id8" method="post" action="?x=jmm9hwO6whVFYOGs283r0oqegcq8yXZcwcLUxU*NCVmxpqCK*OWQzUdI-IbQ6HlEzZltb5qJVzerKYQCL6HzihRR9N8V514r"><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input type="hidden" name="id8_hf_0" id="id8_hf_0"></div><div style="width:0px;height:0px;position:absolute;left:-100px;top:-100px;overflow:hidden"><input type="text" autocomplete="false"><input type="submit" name="linkFrag:beginButton" onclick=" var b=document.getElementById('id9'); if (b!=null&&b.onclick!=null&&typeof(b.onclick) != 'undefined') { var r = b.onclick.bind(b)(); if (r != false) b.click(); } else { b.click(); }; return false;"></div>
<div id="welcomePageHeader" class="adminMainHeader"><h2><span>Welcome Page</span></h2></div>
<!-- div class="adminContentHead subsection2 bgClr2"><span wicket:id="portalName"></span></div -->
<div class="formSec bgClr1 welcomePageSec">
<div id="acknowledgement">
<div style="margin-bottom:15px;">
<a href="#" class="anchorButton" name="linkFrag:beginButton" id="id9" onclick="var wcall=wicketSubmitFormById('id8', '?x=jmm9hwO6whVFYOGs283r0oqegcq8yXZcwcLUxU*NCVmxpqCK*OWQzT6gI89aPkynDZsfSFy0vfCZ8uZoIAv57mMqr2tk6xTsXvag2x0Lls69vFKIQ4*fYtWt7EDYFB1mGj7vxgn8Frj5gSWKFqJKjjfDNioG2zA9SBQwdbcR*80', 'linkFrag:beginButton' ,function() { }.bind(this),function() { }.bind(this), function() {return Wicket.$$(this)&&Wicket.$$('id8')}.bind(this));;; return false;"><span>Search</span></a>
</div>
</div>
<div id="notice" class="subsection1 bgClr2">
<span type="text" id="alertTextInput" class="containerSpacing"></span>
</div>
</div>
</form>
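I cannot "click" the anchor link and move on to the next page. So far I have tried the following options.
Following the plain link inside the anchor tag: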
# Note: the anchor's href is "#", so urljoin() resolves back to the current page.
next_page_url = response.xpath('//div[@id="acknowledgement"]//a/@href').extract_first()
if next_page_url is not None:
    next_page_url = response.urljoin(next_page_url)
    yield scrapy.Request(next_page_url, callback=self.search_court_case)
Sending a form request:
yield FormRequest.from_response(
    response,
    formxpath='//*[@id="id8"]',
    callback=self.parse1,
)
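Can someone guide me on how to solve this? I have also read in many articles that Scrapy cannot handle JavaScript. If you look at the HTML above, the anchor tag calls a JavaScript function (wicketSubmitFormById) that does nothing more than POST the form. Is there a way to simulate this with Scrapy?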
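To make the idea concrete, here is a rough sketch of the first step such a simulation might need: pulling the form id, target URL, and button name out of the onclick attribute. The helper name parse_wicket_submit and the example page URL are hypothetical, and the sketch assumes the onclick always has the same shape as in the HTML above (three single-quoted arguments to wicketSubmitFormById). It uses only the standard library.

```python
import re
from urllib.parse import urljoin

# Matches the first three single-quoted arguments of wicketSubmitFormById(...):
# the form id, the (relative) submit URL, and the submitting button's name.
_WICKET_CALL = re.compile(
    r"wicketSubmitFormById\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'"
)


def parse_wicket_submit(onclick, page_url):
    """Return the form id, absolute URL, and button name, or None if absent.

    Hypothetical helper: assumes the onclick text has the same shape as the
    anchor in the question.
    """
    match = _WICKET_CALL.search(onclick)
    if match is None:
        return None
    form_id, relative_url, button_name = match.groups()
    return {
        "form_id": form_id,
        # The URL in the onclick is query-only ("?x=..."), so resolve it
        # against the URL of the page that was scraped.
        "url": urljoin(page_url, relative_url),
        "button_name": button_name,
    }
```

In a spider callback, the onclick text could be read with response.xpath('//a[@id="id9"]/@onclick').extract_first(), and the returned "url" passed to scrapy.FormRequest(url=..., formdata=..., callback=...). Which form fields the server expects (for example the hidden id8_hf_0 field and the button name) is an assumption that would need to be checked against the real POST in the browser's network tab.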