这是我的种子网址:
http://www.amazon.com/s/ref=sr_nr_n_0?rh=n%3A133140011%2Cn%3A%21133141011%2Cn%3A154606011%2Cn%3A668010011%2Cn%3A158591011%2Cn%3A158592011&bbn=158591011&ie=UTF8&qid=1403264414&rnid=158591011
如何从scrapy中提取所有kindle book链接?
这是我的代码,但我没有得到预期的结果:
class MySpider(CrawlSpider):
name = "scraper"
allowed_domains = ["amazon.com"]
start_urls = ["http://www.amazon.com/s/ref=sr_nr_n_0?rh=n%3A133140011%2Cn%3A%21133141011%2Cn%3A154606011%2Cn%3A668010011%2Cn%3A158591011%2Cn%3A158592011&bbn=158591011&ie=UTF8&qid=1403264414&rnid=158591011"]
def parse(self, response):
hxs = HtmlXPathSelector(response)
items = hxs.select('//*[@id="resultsCol"]').re('\/dp\/B00.*digital-text')
for item in items:
link = item.extract()
print link