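I'm trying to scrape some address data from the Howdens website. To do that, however, I need to submit some form data so that I can find the local depots near a postcode of my choosing.
The start URL is "https://www.howdens.com/about-us/contact-your-local-depot/"
and the page source for the form is: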
<form id="addressForm" _lpchecked="1">
    <label for="address">
        Enter your Postcode / Town:
        <i id="searchNearest" class="icon-search depot-search"></i>
        <input name="address" id="address" class="address-input" value="" ""="">
    </label>
    <div id="add_matches" class="noDisplay">
        <strong>Multiple results found for your input address. Please select one and search again:</strong>
        <select id="add_sel" onchange="document.getElementById('address').value=this.options[this.selectedIndex].value;"></select>
    </div>
    <p>
    </p>
</form>
The Python code I'm trying to use is:
import scrapy
from scrapy.http import FormRequest, Request
from Howdens.items import HowdensItem


class howdensSpider(scrapy.Spider):
    name = "howdens"
    allowed_domains = ["www.howdens.com"]
    start_urls = [
        "https://www.howdens.com/about-us/contact-your-local-depot/",
    ]

    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formxpath='//*[@id="addressForm"]',
            formdata={'address': 'W3'},
            callback=self.parse_dir_contents,
        )

    def parse_dir_contents(self, response):
        for sel in response.xpath('//*[@id="sidebar"]'):
            item = HowdensItem()
            item['name'] = sel.xpath('./h2/text()').extract()
            item['address'] = sel.xpath('./p/text()').extract()
            yield item
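So far, the problem seems to lie in the FormRequest line. It just requests the URL "https://www.howdens.com/about-us/contact-your-local-depot/?address=W3" instead of submitting a search to the site that returns the more detailed depot URL.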
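As far as I can tell, that URL is simply what a default form submission looks like: the form above has no method or action attribute, so the submission falls back to a GET against the page's own URL with the fields appended as a query string. Here is a minimal standard-library sketch of that behaviour. Note that build_get_form_url is a hypothetical helper I wrote for illustration, not part of Scrapy, and it only mimics the URL construction rather than what the site's own search does.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_get_form_url(page_url, fields):
    """Hypothetical helper: mimic submitting a form that has no
    method/action attributes, i.e. a GET to the page's own URL
    with the form fields encoded in the query string."""
    scheme, netloc, path, _query, _fragment = urlsplit(page_url)
    return urlunsplit((scheme, netloc, path, urlencode(fields), ""))


url = build_get_form_url(
    "https://www.howdens.com/about-us/contact-your-local-depot/",
    {"address": "W3"},
)
print(url)
```

This reproduces the same URL my spider ends up requesting, which suggests the form itself never carries the depot search.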
Any guidance on where I'm going wrong would be much appreciated.