Question

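I'm writing a spider in Python 3 with Scrapy, which I only started using recently. I'm scraping data from a website, and after a few minutes the site may start responding with a 302 status and redirect me to another URL for verification. When that happens, I want to save the URL I originally requested to a file.

For example, https://www.test.com/article?id=123 is the URL I request, and the site 302-redirects me to https://www.test.com/vrcode

I want to save https://www.test.com/article?id=123 to a file. How can I do that?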

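import scrapy

# LocationItem is the project's own Item class, defined elsewhere (not shown).


class CatchData(scrapy.Spider):
    name = 'test'

    allowed_domains = ['test.com']

    # Scrapy requires absolute URLs that include a scheme.
    start_urls = ['https://www.test.com/article?id=1',
                  'https://www.test.com/article?id=2',
                  # ...
                 ]

    def parse(self, response):
        item = LocationItem()
        item['article'] = response.xpath('...')
        yield item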

I found an answer in "How to get the scrapy failure URLs?"

But that answer is six years old, so I'd like to know whether there is now a simpler way to do this.

Answer 1

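# file_name and item are assumed to be defined by the surrounding spider code.
# Mode 'a' appends, so earlier entries are not overwritten on each write.
with open(file_name, 'a', encoding="utf-8") as f:
    f.write(str(item) + '\n')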
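
The snippet above only writes an item to a file. To capture the originally requested URL when the site answers with a 302, one possible approach is sketched below. It assumes the spider sets Scrapy's `handle_httpstatus_list = [302]`, so the 302 response reaches `parse()` instead of being followed, and `response.url` is still the URL that was requested. The helper name `save_if_redirected`, the set `REDIRECT_STATUSES`, and the file name `redirected_urls.txt` are made up for illustration and are not part of Scrapy.

```python
# Statuses treated as redirects in this sketch (an assumption; adjust as needed).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def save_if_redirected(response, path="redirected_urls.txt"):
    """Append response.url to `path` if the response is a redirect.

    `response` only needs `.status` and `.url` attributes, which a
    scrapy.http.Response provides. Returns True when the URL was saved.
    """
    if response.status not in REDIRECT_STATUSES:
        return False
    # Append mode keeps URLs saved by earlier calls.
    with open(path, "a", encoding="utf-8") as f:
        f.write(response.url + "\n")
    return True


# Hypothetical wiring inside the spider from the question:
#
# class CatchData(scrapy.Spider):
#     name = 'test'
#     handle_httpstatus_list = [302]  # pass 302 responses to parse()
#
#     def parse(self, response):
#         if save_if_redirected(response):
#             return  # redirected to the verification page, nothing to extract
#         ...  # normal item extraction
```

If redirects are followed instead, Scrapy's redirect middleware records the earlier URLs in `response.meta['redirect_urls']`, so the first entry of that list would be the originally requested URL.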
