Question

I am trying to scrape a website using Scrapy for Python.

I can scrape the data, but I want to add an extra field to the output, such as "Serial ID": "3001". For each product scraped, the serial ID should increase by 1: 3002, 3003, 3004, and so on.

def parse_dir_contents(self,response):
    cat = response.meta['cat']
    serial_id = I
    item = []
    content = {}

    content['serial_id'] = serial_id
    content['url'] = response.url
    content['category'] = cat
    brand = response.xpath('//div[@class="pageinfo__brdcrmb"]/text()').extract()[0].split('/')
    content['brand'] = brand[1].strip()
    I = I + 1
    item.append(content)
    output = json.dumps(item, sort_keys=True, indent=4, separators=(',', ': '))
    self.json_file.write(output)

For the code above, I get an error like this:

content['url'] = response.url
NameError: name 'response' is not defined

Answer 1

The name I on the third line of the function is not defined. Change that line to

serial_id = 1

and then increment it with:

serial_id += 1
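
Keep in mind that a local variable is created afresh on every call to the callback, so the counter has to live somewhere that outlasts a single call, for example on the spider instance. Here is a minimal sketch of that idea. ProductSpiderSketch is a hypothetical stand-in for your scrapy.Spider subclass, the start value 3001 comes from the question, and the fake response object exists only so the snippet runs without Scrapy installed.

```python
import itertools
from types import SimpleNamespace


class ProductSpiderSketch:
    """Hypothetical stand-in for a scrapy.Spider subclass; only the counter logic is shown."""

    def __init__(self, start_id=3001):
        # The counter is stored on the instance, so it survives between callback calls.
        self.serial_ids = itertools.count(start_id)

    def parse_dir_contents(self, response):
        # Each call takes the next value from the shared counter.
        return {
            'serial_id': next(self.serial_ids),
            'url': response.url,
            'category': response.meta['cat'],
        }


# Fake response with only the attributes the callback reads (an assumption for the demo).
fake_response = SimpleNamespace(url='http://example.com/p1', meta={'cat': 'shoes'})

spider = ProductSpiderSketch()
first = spider.parse_dir_contents(fake_response)
second = spider.parse_dir_contents(fake_response)
print(first['serial_id'], second['serial_id'])
```

Because Scrapy handles responses asynchronously, the IDs follow the order in which responses reach the callback, which is not necessarily the order in which the requests were issued.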

You can also take advantage of Scrapy's features (such as pipelines and item definitions) to keep your code clean.
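
As a rough illustration of the pipeline approach, the sketch below numbers items as they pass through an item pipeline. The class name SerialIdPipeline and the start value are assumptions made for this example; process_item(self, item, spider) is the method Scrapy calls on a pipeline for each item. The demo at the bottom calls it by hand with plain dicts so the snippet runs without Scrapy.

```python
class SerialIdPipeline:
    """Hypothetical item pipeline that stamps each item with an increasing serial ID."""

    def __init__(self, start_id=3001):
        # One pipeline instance is used for the whole crawl, so the counter persists.
        self.next_id = start_id

    def process_item(self, item, spider):
        # Attach the current ID, then advance the counter for the next item.
        item['serial_id'] = self.next_id
        self.next_id += 1
        return item


# In a real project the pipeline would be enabled in settings.py, e.g.
# ITEM_PIPELINES = {'myproject.pipelines.SerialIdPipeline': 300}
# where 'myproject' is a placeholder for your own package name.

# Manual demo with plain dicts standing in for scraped items.
pipeline = SerialIdPipeline()
items = [pipeline.process_item({'url': u}, spider=None) for u in ('a', 'b', 'c')]
print([i['serial_id'] for i in items])
```

With this layout the spider callback only yields the scraped fields, and the numbering stays in one place.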

The official documentation is a useful read:

https://doc.scrapy.org/en/latest/
