Question

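I'm new to Python and Scrapy. I suspect the answer is simple, but I'm having trouble working it out myself. The code below collects all the links, follows them, and records each article's title. How can I pass along the URL that led me to each item? I'd like to save the short link together with the article title. Thanks.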

def parse(self, response):
    for url in response.xpath("//li[@id]/@data-shortlink").extract():
        yield scrapy.Request(url, callback=self.get_details)

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    yield article

Answer 1

Because the URL is stored on the Response object, you can read it from response.url:

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    # response.url is the URL of the page this callback received
    article['url'] = response.url
    yield article

Scrapy - Saving links while following them
