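I am scraping reviews for a company. The scrape itself works, but I only get the reviews from the first page.
I was previously using Scrapy.crawl and switched to Spider, but the output is still the same.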
    def parse_review(self, response):
        item = response.meta['item']
        result = []
        reviews = response.css('.summary::text').extract()
        # Give the extracted content row wise
        for review in reviews:
            review = review[1:-1]
            result.append(review)
        item['review'] = result
        # next page
        next_page = ''.join(response.xpath('//li[@class = "next"]/a/@href').extract())
        if next_page != '':
            next_page_link = "some Url" + next_page
            print(next_page_link)
            yield Request(next_page_link, callback=self.parse_review, meta={'item': item})
        yield item
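My output looks like this:

    {"country": "India", "company": "xyz", "review": ["Senior Consultant", "Work-life balance", "Project Accounting Analyst", "Excellent"]},

I only get the reviews from the first page. How can I get the reviews from all pages?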
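
For reference, here is a minimal sketch of what I think the callback should do: carry the reviews collected so far along in `meta`, append each new page's reviews instead of overwriting them, and yield the item only once there is no next link. It is untested against the real site. The class name `ReviewPaginationMixin` is made up so the snippet stands on its own, `contains(@class, "next")` is an assumption in case the `li` carries more than one class, and `response.follow` is used in place of the hard-coded `"some Url"` prefix because it resolves relative links.

```python
class ReviewPaginationMixin:
    """Hypothetical mixin; in a real project combine it with scrapy.Spider."""

    def parse_review(self, response):
        item = response.meta['item']

        # Start from the reviews gathered on earlier pages, if any,
        # rather than resetting the list on every page.
        reviews = list(item.get('review', []))
        for text in response.css('.summary::text').getall():
            reviews.append(text[1:-1])  # same trimming as in the question
        item['review'] = reviews

        # Assumption: the "next" link sits in an <li> whose class contains "next".
        next_page = response.xpath('//li[contains(@class, "next")]/a/@href').get()
        if next_page:
            # More pages left: pass the partially filled item along.
            yield response.follow(
                next_page,
                callback=self.parse_review,
                meta={'item': item},
            )
        else:
            # Last page: the item now holds the reviews from every page.
            yield item
```

In the spider this would be used as `class ReviewSpider(ReviewPaginationMixin, scrapy.Spider)`, with the earlier callback still passing `meta={'item': item}` into `parse_review`.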