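I'm working through the Scrapy tutorial and have reached the "Craigslist Scrapy Spider #3 – Multiple Pages" section. I followed the instructions given there, but I can't get more than one page. The only difference between my code and the tutorial's is that I search "all jobs" instead of only engineering jobs, because there is only one page of engineering jobs. Here is my code: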
import scrapy
from scrapy import Request


class JobsSpider(scrapy.Spider):
    name = 'jobs-new'
    allowed_domains = ['craigslist.org']
    start_urls = ['https://newyork.craigslist.org/search/jjj']

    def parse(self, response):
        jobs = response.xpath('//p[@class="result-info"]')
        for job in jobs:
            title = job.xpath('a/text()').extract_first()
            address = job.xpath('span[@class="result-meta"]/span[@class="result-hood"]/text()').extract_first("")[2:-1]
            relative_url = job.xpath('a/@href').extract_first()
            absolute_url = response.urljoin(relative_url)
            yield {'URL': absolute_url, 'Title': title, 'Address': address}

        relative_next_url = response.xpat('//a[@class="button next"]/@href').extract_first()
        absolute_next_url = response.urljoin(relative_next_url)
        yield request(absolute_next_url, callback=self.parse)
I run it from the terminal with
scrapy crawl jobs-new -o jobs-new.csv
but the .csv file only contains the results from the first page.
What do I need to do to get more than one page? Is the tutorial wrong, or have I misunderstood it?
Answer 0 (score: 0)
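I made two small edits to your code, and with them it works. First, response.xpat is a typo for response.xpath. Second, request (lowercase) is not defined, so the next-page request has to be built with scrapy.Request (or the Request you already imported). Both errors are raised only after the loop has already yielded the first page's items, which is why your CSV ends up containing just page one.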
import scrapy
from scrapy import Request


class JobsSpider(scrapy.Spider):
    name = 'jobs-new'
    allowed_domains = ['craigslist.org']
    start_urls = ['https://newyork.craigslist.org/search/jjj']

    def parse(self, response):
        jobs = response.xpath('//p[@class="result-info"]')
        for job in jobs:
            title = job.xpath('a/text()').extract_first()
            address = job.xpath('span[@class="result-meta"]/span[@class="result-hood"]/text()').extract_first("")[2:-1]
            relative_url = job.xpath('a/@href').extract_first()
            absolute_url = response.urljoin(relative_url)
            yield {'URL': absolute_url, 'Title': title, 'Address': address}

        # Fixed: was response.xpat, which raises AttributeError
        relative_next_url = response.xpath('//a[@class="button next"]/@href').extract_first()
        absolute_next_url = response.urljoin(relative_next_url)
        # Fixed: was request(...), which is undefined (NameError)
        yield scrapy.Request(absolute_next_url, callback=self.parse)
Here is some of the output:
{'URL': 'https://newyork.craigslist.org/brk/trp/d/brooklyn-overnight-parking-attendant/7166876233.html', 'Title': 'Overnight Parking Attendant', 'Address': 'Brooklyn, NY'}
{'URL': 'https://newyork.craigslist.org/wch/fbh/d/yonkers-experience-grill-man/7166875818.html', 'Title': 'Experience grill man', 'Address': 'Yonkers'}