Question

Can Scrapy crawl a URL containing 'hello' only once, and still continue crawling and following the rest of the URLs?

Any suggestions or help would be appreciated.

Answer 1

You can define a class-level boolean variable that defaults to False, then set it to True once a URL containing 'hello' has been crawled. For example:

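from scrapy import Spider


class MySpider(Spider):
    # Flag that becomes True once a URL containing 'hello' has been parsed.
    hello_crawled = False

    ...

    def parse(self, response):
        if 'hello' in response.url:
            if self.hello_crawled:
                # A 'hello' URL was already handled, so skip this one.
                return
            # First 'hello' URL: record it, then continue with normal parsing.
            self.hello_crawled = True
        ...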
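
Note that with the flag checked inside parse, later 'hello' pages are still downloaded; they are only skipped at parsing time. One possible variation is to apply the same check to links before following them. The sketch below shows that logic in plain Python so it can be run without Scrapy. The names HelloOnceFilter, allow, and filter_urls are hypothetical helpers introduced for illustration, not part of Scrapy, and the example.com URLs are placeholders.

```python
class HelloOnceFilter:
    """Lets the first URL containing `keyword` through and rejects later ones.

    Hypothetical helper for illustration; not a Scrapy API.
    """

    def __init__(self, keyword="hello"):
        self.keyword = keyword
        self.seen = False  # plays the same role as hello_crawled in the spider

    def allow(self, url):
        if self.keyword not in url:
            return True   # unrelated URLs are always allowed
        if self.seen:
            return False  # a matching URL was already let through
        self.seen = True  # first matching URL: remember it and allow it
        return True


def filter_urls(urls, keyword="hello"):
    """Return `urls` in order, keeping only the first one containing `keyword`."""
    url_filter = HelloOnceFilter(keyword)
    return [url for url in urls if url_filter.allow(url)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/hello/1",
        "http://example.com/about",
        "http://example.com/hello/2",
        "http://example.com/contact",
    ]
    for url in filter_urls(candidates):
        print(url)
```

In a spider, a filter like this could be kept as an attribute and its allow method called on each extracted link before yielding response.follow(link, callback=self.parse), so that later 'hello' links are never requested at all.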

Scrapy: crawl a link only once
