Can anyone tell me why the index variable in parse() always has the value 10013?
class GetsourcesSpider(scrapy.Spider):
    name = 'getSources'
    allowed_domains = ['bizhi.feihuo.com']
    base_url = 'http://bizhi.feihuo.com/wallpaper/share?rsid={index}/'

    def start_requests(self):
        for index in range(10010, 10014):  # 11886
            yield scrapy.Request(url=self.base_url.format(index=index), callback=lambda response: self.parse(response, index))

    def parse(self, response, index):
        video_label = response.xpath('//video')[0]
        item = DynamicdesktopItem()
        item['index'] = index  # response.url[-6:-1]
        item['video'] = video_label.attrib['src']
        item['image'] = video_label.attrib['poster']
        yield item
Answer 0 (score: 2)
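That is because the lambda captures the index variable itself rather than its value at the moment the request is created. Every callback therefore sees whatever index holds when it runs, which is the last value once the loop has finished. Pass the value through the request's meta dict instead. See the updated code below.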
import scrapy

# DynamicdesktopItem is assumed to be imported from the project's items module, as in the question.


class GetsourcesSpider(scrapy.Spider):
    name = 'getSources'
    allowed_domains = ['bizhi.feihuo.com']
    base_url = 'http://bizhi.feihuo.com/wallpaper/share?rsid={index}/'

    def start_requests(self):
        for index in range(10010, 10014):  # 11886
            # meta stores the current value of index on this request
            yield scrapy.Request(url=self.base_url.format(index=index), callback=self.parse, meta={'index': index})

    def parse(self, response):
        index = response.meta['index']
        video_label = response.xpath('//video')[0]
        item = DynamicdesktopItem()
        item['index'] = index  # response.url[-6:-1]
        item['video'] = video_label.attrib['src']
        item['image'] = video_label.attrib['poster']
        yield item
Answer 1 (score: 0)
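Because the index variable referenced by all of the lambdas is not copied into their local scope. It is rebound on every loop iteration, and each lambda looks it up only when it is called.
Consider the following snippet: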
lambdas = []
for i in range(3):
    lambdas.append(lambda: print(i))

for fn in lambdas:
    fn()
This prints 2 three times, because 2 is the last value of i.
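Outside Scrapy, the usual ways to freeze the current loop value are a default argument or functools.partial. The following is a minimal sketch of both next to the late-binding version; the report function is a hypothetical stand-in for a callback and is not part of the original question.

```python
from functools import partial


def report(index):
    """Hypothetical stand-in for a callback that needs the loop value."""
    return index


# Late binding: each lambda looks up `i` only when it is called,
# by which point the loop has finished and `i` is 2.
late = [lambda: report(i) for i in range(3)]

# Default argument: `i=i` is evaluated when each lambda is created,
# so every lambda keeps its own value.
bound_default = [lambda i=i: report(i) for i in range(3)]

# functools.partial: the current value is stored in the partial object.
bound_partial = [partial(report, i) for i in range(3)]

print([fn() for fn in late])           # [2, 2, 2]
print([fn() for fn in bound_default])  # [0, 1, 2]
print([fn() for fn in bound_partial])  # [0, 1, 2]
```

Applied to the question's code, the default-argument form would read lambda response, index=index: self.parse(response, index), though passing the value on the request itself is likely the cleaner choice in a spider.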
You should use the meta= keyword of the Request class rather than a lambda callback:
https://doc.scrapy.org/en/latest/topics/request-response.html#request-meta-special-keys