Question

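In the parse function I store a value in "self.Title" and then yield the data from another function, but the title always comes from only one of the URLs. What could be going wrong? Here is the code snippet.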

import scrapy
from scrapy.http import Request

class TestSpider(scrapy.Spider):
    name = 'Test'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com/search?q=com.foo', 'https://example.com/search?q=bar', 'https://example.com/search?q=data']

    def parse(self, response):

        self.Title = response.xpath('//*[@class="search-title"]/a/text()')[0].extract()
        Ini_Url = response.xpath('//*[@class="search-title"]/a/@href')[0].extract()
        Ab_url = "https://example.com" + Ini_Url + "/download?from=details"
        yield Request(Ab_url, callback=self.parse_download)

    def parse_download(self, response):
        Download_URL = response.xpath('//*[@class="fdownload-box"]/p[2]/a/@href')[0].extract()

        yield{"Download_URL": Download_URL, "Title": self.Title}

In the output, Download_URL is different for each of the 3 scraped URLs, but Title is the same for all 3 requests.

Answer 1

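You can't store per-item data on the Spider instance. There is only one spider instance for the whole crawl, and Scrapy handles requests asynchronously, so by the time parse_download runs, self.Title may already have been overwritten by another call to parse.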
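A minimal pure-Python sketch of why this fails, with no Scrapy involved. The classes, the crawl helper and the PAGES data are all hypothetical. crawl deliberately runs every first-level callback before any follow-up callback. Scrapy's asynchronous engine can produce that ordering, though the real order is not guaranteed.

```python
from collections import deque


class SharedStateSpider:
    """Toy spider that keeps per-item data on self (the bug in the question)."""

    def parse(self, title, url):
        self.title = title  # every call to parse overwrites the same attribute
        return (self.parse_download, {"url": url})

    def parse_download(self, url):
        return {"Download_URL": url, "Title": self.title}


class PerRequestSpider:
    """Toy spider that carries the title along with each follow-up request."""

    def parse(self, title, url):
        return (self.parse_download, {"url": url, "title": title})

    def parse_download(self, url, title):
        return {"Download_URL": url, "Title": title}


def crawl(spider, pages):
    # Phase 1: deque() consumes the generator eagerly, so every parse call
    # runs before any follow-up callback (worst case for shared state).
    pending = deque(spider.parse(title, url) for title, url in pages)
    # Phase 2: run the queued follow-up callbacks in order.
    items = []
    while pending:
        callback, kwargs = pending.popleft()
        items.append(callback(**kwargs))
    return items


# Hypothetical (title, download URL) pairs standing in for the 3 search pages.
PAGES = [("foo", "/dl/foo"), ("bar", "/dl/bar"), ("data", "/dl/data")]

print([item["Title"] for item in crawl(SharedStateSpider(), PAGES)])
# ['data', 'data', 'data']
print([item["Title"] for item in crawl(PerRequestSpider(), PAGES)])
# ['foo', 'bar', 'data']
```

The shared-state version gives distinct download URLs but a single repeated title, which matches the symptom in the question.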

When parse yields a Request, pass Title along as metadata, as described in the docs. It is then available in parse_download through response.meta.

Answer 2

As a solution, I wrote this snippet in the first function:

request = Request(Ab_url, callback=self.parse_download)
request.meta['Title'] = Title
yield request

and read it back in the other function like this:

Title = response.meta['Title']

Python-Scrapy: yielding a variable defined in one function through another function
