I have a URL in my start_urls array, as shown below:
start_urls = [
    'https://www.ebay.com/sch/tp_peacesports/m.html?_nkw=&_armrs=1&_ipg=&_from='
]

def parse(self, response):
    shop_title = self.getShopTitle(response)
    sell_count = self.getSellCount(response)
    self.shopParser(response, shop_title, sell_count)

def shopParser(self, response, shop_title, sell_count):
    items = EbayItem()
    items['shop_title'] = shop_title
    items['sell_count'] = sell_count
    if sell_count > 0:
        item_links = response.xpath('//ul[@id="ListViewInner"]/li/h3/a/@href').extract()
        for link in item_links:
            items['item_price'] = response.xpath('//span[@itemprop="price"]/text()').extract_first()
            yield items
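Inside the for loop in shopParser() I now have a different link on each iteration, and for each one I need the response for that link rather than the original response from start_urls. How can I achieve this?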
Answer 0: (score: 1)
You need to issue requests for the new pages; otherwise you will not get any new HTML. Try something like:
def parse(self, response):
    # On follow-up requests these values arrive via meta; on the first page they are extracted.
    shop_title = response.meta.get('shop_title', self.getShopTitle(response))
    sell_count = response.meta.get('sell_count', self.getSellCount(response))
    # your item-parsing logic goes here
    if sell_count > 0:
        item_links = response.xpath('//ul[@id="ListViewInner"]/li/h3/a/@href').extract()
        # yield requests to the next pages
        for link in item_links:
            yield scrapy.Request(response.urljoin(link), meta={'shop_title': shop_title, 'sell_count': sell_count})

These new requests will also be parsed by the parse function. Alternatively, you can set a different callback if needed.