I'm running into this problem with my Scrapy/Python code: it currently scrapes only the first (or last) page. In other words, it returns just the first or last paginated set of the roughly 3,000 products and writes that to Excel. I need the spider to repeat the extraction for every subsequent page. The URL advances in increments of 120 (120 products are shown per page) until pagination reaches the end of the ~3,000 items for sale:

https://boston.craigslist.org/search/sss
https://boston.craigslist.org/search/sss?s=120
https://boston.craigslist.org/search/sss?s=240
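The pagination pattern above can be generated directly with `range`. A quick standalone sketch (the 3000 total is taken from the question; in practice it would come from the page's `totalcount` element):

```python
BASE = 'https://boston.craigslist.org/d/for-sale/search/sss'

def page_urls(total=3000, per_page=120):
    """Build one URL per page of results.

    s=0 is the first page (equivalent to omitting the parameter),
    then s=120, s=240, ... up to but not including `total`.
    """
    return [BASE + '?s=%d' % offset for offset in range(0, total, per_page)]

urls = page_urls()
print(urls[0])    # first page:  ...sss?s=0
print(urls[1])    # second page: ...sss?s=120
print(len(urls))  # 25 pages for 3000 items at 120 per page
```

Starting the range at 0 matters: starting at 120 (as the spider below does) silently skips the first 120 listings.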
import scrapy
import xlsxwriter

class SalesSpider(scrapy.Spider):
    name = 'sales'
    allowed_domains = ["craigslist.org"]
    start_urls = []
    # should start at 0 so the first page (no ?s= parameter) is included
    for x in range(120, 3000, 120):
        url = 'https://boston.craigslist.org/d/for-sale/search/sss?s=%s' % x
        start_urls.append(url)

    def parse(self, response):
        titles = response.xpath('//a[@class="result-title hdrlnk"]/text()').extract()
        prices = response.xpath('//span[@class="result-price"]/text()').extract()
        totalcount = response.xpath('//span[@class="totalcount"]/text()').extract()
        total = totalcount[0]  # extract() returns a list; the count is its first element
        row = 0
        row_two = -1
        column = 0
        # NOTE: this recreates Items.xlsx on every response, so each page
        # overwrites the previous one -- only one page's data survives
        workbook = xlsxwriter.Workbook('Items.xlsx')
        worksheet = workbook.add_worksheet()
        for i in titles:
            worksheet.write(row, column, i)
            row += 2
        for j in range(0, len(prices)):
            if j % 2:
                worksheet.write(row_two, column + 1, prices[j])
                row_two += 1
        workbook.close()
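One common way to fix the "only one page survives" symptom is to accumulate rows across `parse()` calls and write the workbook exactly once when the spider finishes (Scrapy calls a spider's `closed()` method at shutdown). A minimal pure-Python sketch of the accumulation logic, with the xlsxwriter part left out; pairing each title with the price at the same index via `zip` is an assumption about the intended layout:

```python
# Accumulator shared across simulated parse() calls; in the real spider
# this would be an instance attribute (self.rows), written out in closed().
rows = []

def parse_page(titles, prices):
    # Pair each title with its price and append -- never overwrite.
    rows.extend(zip(titles, prices))

# Simulate two paginated responses arriving one after the other.
parse_page(['bike', 'sofa'], ['$50', '$200'])
parse_page(['lamp'], ['$10'])

print(rows)  # all three items survive, nothing is overwritten
```

In the spider itself, `rows` would become `self.rows`, and the `xlsxwriter.Workbook('Items.xlsx')` creation plus the write loop would move into `def closed(self, reason):` so the file is created once with every page's data.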