CODE
spider.py
...
def parse(self, response):
    for one_item in response.xpath('path1'):
        item = ProjectItem()
        request = scrapy.Request(one_item.xpath('path2'), callback=self.parse2)
        request.meta['item'] = item
        yield request

# class attribute: a single list shared by every parse2 call
property = []

def parse2(self, response):
    item = response.meta['item']
    for x in response.xpath('path3'):
        self.property.append('path4')
    next_page = response.xpath('path5')
    if next_page is not None:
        # more pages for this item: follow the link and keep passing the item along
        request2 = scrapy.Request(next_page, callback=self.parse2)
        request2.meta['item'] = item
        yield request2
    else:
        # last page: attach whatever has accumulated and reset the list
        item['field'] = self.property
        self.property = []
        yield item
The problem is that when the spider follows next_page, some of the values in self.property end up assigned to the wrong item. I don't know how to fix it.
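Answer 0 (score: 1)
self.property is a class attribute, so it is shared across all calls to parse2, and you cannot control the order in which those calls happen. Responses for different items arrive interleaved, so values scraped for one item get appended to the list while another item is still being collected.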
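To make the failure concrete, here is a minimal sketch that reproduces it without Scrapy. The `Spider` class, the simplified `parse2` signature, and the sample values are all hypothetical stand-ins; only the shared `property` list and the reset on the last page mirror the question's code.

```python
class Spider:
    # Class attribute: one list object shared by every call, as in the question.
    property = []

    def parse2(self, item_name, page_values, is_last_page):
        """Hypothetical stand-in for the callback: handle one page of one item."""
        for value in page_values:
            self.property.append(value)
        if is_last_page:
            result = {"name": item_name, "field": self.property}
            self.property = []
            return result
        return None


spider = Spider()

# Simulate responses arriving interleaved:
# item A page 1, item B page 1, item A page 2 (last), item B page 2 (last).
results = [
    spider.parse2("A", ["a1"], False),
    spider.parse2("B", ["b1"], False),
    spider.parse2("A", ["a2"], True),
    spider.parse2("B", ["b2"], True),
]
finished = [r for r in results if r is not None]

for entry in finished:
    print(entry["name"], entry["field"])
```

With this ordering, item A ends up holding a value that was scraped for item B, and item B loses it, which matches the symptom described in the question.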
To fix this, pass the property list along with each request, either in the request's meta or as a field on the item:
self.property
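The original code sample for this answer is not available here, so the following is only a sketch of the idea, written without Scrapy so it can be run directly. The plain `meta` dictionaries stand in for `response.meta`, and the `parse2` signature and sample values are hypothetical. In a real spider the equivalent step would be to set something like `request2.meta['property']` on the follow-up request, the same way the question already passes `item`.

```python
def parse2(meta, page_values, is_last_page):
    """Hypothetical stand-in for the callback.

    `meta` plays the role of response.meta. Returns a pair
    (finished_item, meta_for_next_request); exactly one of them is None.
    """
    # Build a new list so no state is shared between different items.
    collected = meta["property"] + list(page_values)
    if is_last_page:
        item = dict(meta["item"])
        item["field"] = collected
        return item, None
    # Hand the accumulated values to the follow-up request for this item only.
    return None, {"item": meta["item"], "property": collected}


# Each item starts its own chain of requests with its own empty list.
meta_a = {"item": {"name": "A"}, "property": []}
meta_b = {"item": {"name": "B"}, "property": []}

# Same interleaved arrival order as before: A1, B1, A2 (last), B2 (last).
_, meta_a = parse2(meta_a, ["a1"], False)
_, meta_b = parse2(meta_b, ["b1"], False)
item_a, _ = parse2(meta_a, ["a2"], True)
item_b, _ = parse2(meta_b, ["b2"], True)

print(item_a)
print(item_b)
```

Because the list travels with the request chain instead of living on the spider, the order in which responses arrive no longer matters.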