Question

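I'm new to Scrapy.

I'm trying to scrape a website structured like this:

Start page = product list

Each product in this list links to a product page

From the product page, go to another web page to scrape more data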

import scrapy
from scrapy_splash import SplashRequest


class one_Spider(scrapy.Spider):
    name = "one_Spider"
    start_urls = ["<product list URL>"]  # placeholder in the question

    def parse(self, response):
        # Follow every product link on the list page, rendered through Splash
        for article_url in response.xpath("//ul[@class=\"products-list row w-product-list\"]//a/@href").extract():
            article_url = response.urljoin(article_url)
            yield SplashRequest(article_url, self.parse_article, endpoint='render.html',
                                args={'wait': 0.5}
                                )

    def parse_article(self, response):
        produit = {"name": response.xpath("//h1[@class=\"saz-h1\"]/text()").extract_first(),
                   "Nombre eval": response.xpath(".//span[@class=\"label text-4\"]/text()").extract_first(),
                   "commentaires": []
                   }
        for comment in response.xpath('//li[@data-numbercomments="10"]'):
            # Both XPaths below start with '//', which is absolute:
            # they search the whole page, not just this <li>
            single_comment = {"comment": comment.xpath('//p[@class="review-body"]/text()').extract_first(),
                              "user": ""
                              }

            user_url = comment.xpath('//a[@title="Visitez son espace personnel"]/@href').extract_first()
            # The request is built but never yielded, so Scrapy never schedules it
            req = scrapy.Request(user_url, callback=self.parse_user)
            req.meta['item'] = single_comment
            produit["commentaires"].append(single_comment)
        yield produit

    def parse_user(self, response):
        # Fill in the user name on the comment passed through meta
        single_comment = response.meta['item']
        single_comment["user"] = response.xpath('//span[@class="T5darkgray"]/text()').extract_first()
        yield single_comment

Current output:

name : "Product name"
xxx : " xxx "

comment : "same"
user : "empty"

comment : "same"
user : "empty"

Desired output:

name : "Product name"
xxx : " xxx "

comment : "x"
user : "h"

comment : "y"
user : "g"

Can someone explain where I went wrong, and why? Thanks for your time.

Scrapy method inside a parse method

0 answers