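Scraping Website With Infinite Scroll Using Scrapy
I'm not sure how to structure my code so that the offset parameter is updated each time the function recursively calls itself. Below are more details about my script and the challenge I'm trying to solve. I feel like there is a simple fix that I'm missing here.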
import scrapy
import json
import requests

class LetgoSpider(scrapy.Spider):
    name = 'letgo'
    allowed_domains = ['letgo.com/en']
    start_urls = ['https://search-products-pwa.letgo.com/api/products?country_code=US&offset=0&quadkey=0320030123201&num_results=50&distance_type=mi']

    def parse(self, response):
        data = json.loads(response.text)
        for used_item in data:
            if len(data) == 0:
                break
            try:
                title = used_item['name']
                price = used_item['price']
                description = used_item['description']
                date = used_item['updated_at']
                images = [img['url'] for img in used_item['images']]
                latitude = used_item['geo']['lat']
                longitude = used_item['geo']['lng']
            except Exception:
                pass
            yield {'Title': title,
                   'Price': price,
                   'Description': description,
                   'Date': date,
                   'Images': images,
                   'Latitude': latitude,
                   'Longitude': longitude
                   }
        i = 0
        for new_items_load in response:
            i += 50
            offset = i
            new_request = 'https://search-products-pwa.letgo.com/api/products?country_code=US&offset=' + str(i) + \
                '&quadkey=0320030123201&num_results=50&distance_type=mi'
            yield scrapy.Request(new_request, callback=self.parse)
Answer 0 (score: 2)
Define the offset as a class attribute:
class LetgoSpider(scrapy.Spider):
    name = 'letgo'
    allowed_domains = ['letgo.com/en']
    start_urls = ['https://search-products-pwa.letgo.com/api/products?country_code=US&offset=0&quadkey=0320030123201&num_results=50&distance_type=mi']
    offset = 0  # <- here
You can then refer to it as self.offset, and the value will be shared across all calls to parse. So it would look something like this:
self.offset += 50
new_request = 'https://search-products-pwa.letgo.com/api/products?country_code=US&offset=' + str(self.offset) + \
    '&quadkey=0320030123201&num_results=50&distance_type=mi'
yield scrapy.Request(new_request, callback=self.parse)
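As an alternative to keeping a shared counter on the spider, the next offset can be derived from the URL of the response that was just parsed, so each callback needs no state of its own. This is only a minimal sketch using the standard library. The helper name next_page_url and the PAGE_SIZE constant are made up for illustration, and it assumes the API signals the end of the results with an empty JSON list, which is what the len(data) == 0 check in the question suggests.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

PAGE_SIZE = 50  # assumption: matches num_results=50 in the question's URL


def next_page_url(current_url, page_size=PAGE_SIZE):
    """Return current_url with its offset query parameter advanced by page_size."""
    parts = urlsplit(current_url)
    query = parse_qs(parts.query)
    # parse_qs maps each key to a list of values; a missing offset counts as 0.
    offset = int(query.get('offset', ['0'])[0])
    query['offset'] = [str(offset + page_size)]
    # doseq=True writes each list element back as its own key=value pair.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical use inside the spider's parse method, after yielding the items:
#     if data:  # stop paginating once the API returns an empty page
#         yield scrapy.Request(next_page_url(response.url), callback=self.parse)
```

Either approach should work as long as only one page is requested at a time. One more thing that may be worth checking: allowed_domains is meant to hold bare domain names, so an entry such as 'letgo.com/en' will probably cause the follow-up requests to be filtered as offsite, and 'letgo.com' is likely what is wanted there.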