Question

I am trying to append strings to the indexes list inside the parse function, but when I try to save it to a .json file, the list is empty.

import scrapy
import json

class NewsBrief(scrapy.Spider):
    name = "briefs"
    indexes = []
    def start_requests(self):
        ids = []
        url = "url"

        with open('test_id.json') as json_data:
            ids = json.load(json_data)

        for i in ids:
            yield scrapy.http.FormRequest(url=url+str(i), callback=self.parse)

        # self.indexes is empty here
        print(self.indexes)

        with open('data_briefs.json', 'w') as outfile:
            json.dump(self.indexes, outfile)

    def parse(self, response):
        sentence = ""
        for span in enumerate(response.xpath('//div[@class="newsread olnr"]/p/text()').getall()):
            sentence += str(span[1]).replace('\n', ' ').replace('\r', ' ')
        self.indexes.append(sentence)

Answer 1

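The variable self.indexes will not be populated right after the loop that yields the requests. At that point the requests have not even completed, so parse has not yet been called for them.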
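To illustrate the ordering problem, here is a minimal sketch in plain Python. It is a simplified model, not Scrapy's actual scheduler: it assumes a consumer that first pulls every "request" out of the generator and only then runs the callbacks. The names results, snapshot and pending are made up for the illustration.

```python
results = []   # plays the role of self.indexes
snapshot = []  # records what results looked like inside the generator


def start_requests():
    for i in range(3):
        yield i  # hand a "request" over to the consumer
    # This line runs as soon as the last request has been handed over,
    # which is before any callback has had a chance to run.
    snapshot.append(list(results))


def parse(item):
    # Stand-in for the spider callback that fills the list.
    results.append(item * 10)


pending = list(start_requests())  # the consumer drains the generator first
for item in pending:              # callbacks only run afterwards
    parse(item)

print(snapshot[0])  # [] -> the list was still empty inside start_requests
print(results)      # [0, 10, 20] -> filled only after the callbacks ran
```

The code after the loop in start_requests sees an empty list for the same reason: yielding a request only schedules it, and the responses arrive later.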

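If you do not want to use the usual feed export to a file, you can move the file writing into a function that runs when the spider closes. See the details here: scrapy: Call a function when a spider quits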

You need to connect the spider_closed signal to a function and put the file-writing code in that function.
