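I'm trying to scrape data from a website that has multiple pages. However, after running this code I get nothing. There is probably something I've missed that I can't spot. Could you point it out so that I can get output from the print statements? Here is the spider file: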
# -*- coding: utf-8 -*-
from scrapy import Spider
from scrapy.http import Request


class SaabSpider(Spider):
    name = 'saab'
    allowed_domains = ['thesaabsite.com/parts_om.php']
    start_urls = ['http://www.thesaabsite.com/parts_om.php']

    def parse(self, response):
        # Collect the category names and links from the navigation list.
        catagories = response.xpath('//ul[@class="nav col-md-offset-4 col-md-4"]/li/a/text()').extract()
        year_page_url = response.xpath('//ul[@class="nav col-md-offset-4 col-md-4"]/li/a/@href').extract()
        for j in year_page_url:
            # Follow each category link and hand the response to model_page.
            absolute_url = response.urljoin(j)
            yield Request(absolute_url, callback=self.model_page)

    def model_page(self, response):
        # Collect the year labels and links on each category page.
        year = response.xpath('//li[@class="tab-pane first text-center"]/a/text()').extract()
        year_url = response.xpath('//li[@class="tab-pane first text-center"]/a/@href').extract()
        for y in year:
            print('\n')
            print(y)

    def main_part(self, response):
        pass
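
One thing that may be worth checking (a guess, not a confirmed diagnosis): `allowed_domains` is meant to hold bare domain names, and the entry above also contains a path. The sketch below is a simplified stand-in for the kind of hostname check an offsite filter performs. It is not Scrapy's actual code, and `host_allowed` is a hypothetical helper written only for illustration. Under that assumption, an entry with a path never matches the hostname of the requests yielded from `parse`, so `model_page` would never run.

```python
import re
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Return True if the URL's hostname equals, or is a subdomain of, an allowed entry.

    Hypothetical helper: a simplified imitation of an offsite check.
    """
    # Build one pattern that accepts any listed entry, optionally preceded by subdomains.
    pattern = r'^(.*\.)?(%s)$' % '|'.join(re.escape(d) for d in allowed_domains)
    # Only the hostname is compared; the path of the URL plays no part.
    host = urlparse(url).hostname or ''
    return re.match(pattern, host) is not None


url = 'http://www.thesaabsite.com/parts_om.php'
print(host_allowed(url, ['thesaabsite.com/parts_om.php']))  # False: entry includes a path
print(host_allowed(url, ['thesaabsite.com']))               # True: bare domain matches
```

If this assumption holds for the Scrapy version in use, changing the entry to `'thesaabsite.com'` would be the first thing to try.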
After running the spider from the command prompt, this is the output I got: