Hello, I'm having trouble with my spider script. I want the script to be readable and to save as much code as possible. Can I use the same parse method for different URLs?
I only want to scrape 10 items per page, and save each page's results to a different item class in items.py.
Here is my code:
def start_requests(self):  # I have 3 URLs here
    yield scrapy.Request('https://teslamotorsclub.com/tmc/post-ratings/6/posts', self.parse)  # URL 1
    yield scrapy.Request('https://teslamotorsclub.com/tmc/post-ratings/7/posts', self.parse)  # URL 2
    yield scrapy.Request('https://teslamotorsclub.com/tmc/post-ratings/1/posts', self.parse)  # URL 3

def parse(self, response):  # My logic is something like this
    if Url == Url1:
        item = TmcnfSpiderItem()
    elif Url == Url2:
        item = TmcnfSpiderItem2()
    elif Url == Url3:
        item = TmcnfSpiderItem3()
    if count <= 9:
        count += 1
        info = response.css("[id^='fc-post-" + postno_only + "']")
        author = info.xpath("@data-author").extract_first()
        item['author'] = author
        yield item
    else:
        # Move to the next URL and perform the same parse
Any ideas?
Answer 0 (score: 2)
I think you can try passing all the data from start_requests, like this:
def start_requests(self):
    urls = (
        ('https://teslamotorsclub.com/tmc/post-ratings/6/posts', TmcnfSpiderItem),
        ('https://teslamotorsclub.com/tmc/post-ratings/7/posts', TmcnfSpiderItem2),
        ('https://teslamotorsclub.com/tmc/post-ratings/1/posts', TmcnfSpiderItem3),
    )
    for url, itemclass in urls:
        # Attach the item class to the request; the default callback is self.parse
        yield scrapy.Request(url, meta={'itemclass': itemclass})

def parse(self, response):
    # Instantiate the item class that was passed from start_requests
    item = response.meta['itemclass']()
So you pass the item class along with each URL, and then create a new instance of that class in the parse function.
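Putting this together with the 10-item limit from the question, here is a minimal sketch of the whole spider. The import path myproject.items is a placeholder for your project's items module, and the fc-post- selector and data-author attribute are carried over from the question's code as-is:

import scrapy
# Placeholder import path; adjust to your own project's items module
from myproject.items import TmcnfSpiderItem, TmcnfSpiderItem2, TmcnfSpiderItem3

class TmcnfSpider(scrapy.Spider):
    name = 'tmcnf'

    def start_requests(self):
        urls = (
            ('https://teslamotorsclub.com/tmc/post-ratings/6/posts', TmcnfSpiderItem),
            ('https://teslamotorsclub.com/tmc/post-ratings/7/posts', TmcnfSpiderItem2),
            ('https://teslamotorsclub.com/tmc/post-ratings/1/posts', TmcnfSpiderItem3),
        )
        for url, itemclass in urls:
            # meta carries the item class so the same parse() works for every URL
            yield scrapy.Request(url, callback=self.parse, meta={'itemclass': itemclass})

    def parse(self, response):
        itemclass = response.meta['itemclass']
        # Take only the first 10 posts on the page
        for post in response.css("[id^='fc-post-']")[:10]:
            item = itemclass()
            item['author'] = post.xpath("@data-author").extract_first()
            yield item

This keeps a single parse method for all three URLs and avoids the per-URL if/elif chain entirely.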