Question

I'm not sure my question is phrased correctly, but I'm still quite new to Scrapy, so any comments on this topic would be very helpful.

My problem is that I have a website with a structure like this (similar to a JSON file):

{
    2: {
        1: 'http://example.com/1.jpg',
        2: 'http://example.com/2.jpg'
    },
    ...  // bunch of other information
}
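If the page really is JSON, no XPath is needed to reach these URLs; a JSON parser is enough. The sketch below assumes the server returns real JSON (so the numeric keys arrive as strings such as "2"); `raw` and `image_links` are hypothetical names. Inside a Scrapy callback the text would come from `response.text` (recent Scrapy versions also provide `response.json()`).

```python
import json

# Assumption: the page is valid JSON, so the numeric keys from the question
# arrive as strings ("2", "1", ...). This sample text is hypothetical.
raw = """
{
    "2": {
        "1": "http://example.com/1.jpg",
        "2": "http://example.com/2.jpg"
    },
    "other": "bunch of other information"
}
"""


def image_links(text, group="2"):
    """Return the URLs stored under `group`, ordered by their numeric key."""
    data = json.loads(text)
    links = data[group]
    return [links[key] for key in sorted(links, key=int)]


print(image_links(raw))
# -> ['http://example.com/1.jpg', 'http://example.com/2.jpg']
```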

The links point to related pages, but I need the direct link to each image. I tried to solve the task like this:

urlData = scrapy.Request(url='http://myserver/1.jpg', callback=self.parse_link)

The callback function is:

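def parse_link(self, response):
    # LacBacLink is the asker's own scrapy.Item subclass, defined elsewhere
    data = LacBacLink()
    # src attribute of the first <img> on the page; response.xpath()
    # replaces the old, deprecated HtmlXPathSelector
    data["Link"] = response.xpath("(//img)[1]/@src").extract()
    return data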
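The XPath `(//img)[1]/@src` returns the `src` attribute exactly as written in the HTML, which may be relative (for example `/images/1.jpg`). To get a direct link it has to be resolved against the URL of the page it came from. Inside Scrapy that would typically be `response.urljoin(...)` applied to the extracted value; the stdlib-only sketch below shows the same idea without Scrapy, using `html.parser` in place of XPath. `FirstImgSrc`, `direct_image_link` and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstImgSrc(HTMLParser):
    """Record the src attribute of the first <img> tag, like (//img)[1]/@src."""

    def __init__(self):
        super().__init__()
        self.seen_img = False
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and not self.seen_img:
            self.seen_img = True
            # attrs is a list of (name, value) pairs
            self.src = dict(attrs).get("src")


def direct_image_link(page_url, html):
    """Return the absolute URL of the first image on the page, or None."""
    parser = FirstImgSrc()
    parser.feed(html)
    parser.close()
    if parser.src is None:
        return None
    # resolve a relative src against the URL of the page it came from
    return urljoin(page_url, parser.src)


page = '<html><body><img src="/images/1.jpg"><img src="2.jpg"></body></html>'
print(direct_image_link("http://myserver/page/1", page))
# -> http://myserver/images/1.jpg
```

Only the first `<img>` counts, even if it has no `src`, which mirrors what the XPath `(//img)[1]/@src` selects.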

I think it would work if I could get the result of the request's callback function back from Scrapy immediately.

I tried writing the code from the documentation:

def parse_page1(self, response):
    return scrapy.Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)

def parse_page2(self, response):
    # this would log http://www.example.com/some_page.html
    self.logger.info("Visited %s", response.url)

It didn't work for me. If you know how to solve this task with this code, please show me any implementation so I can understand it better.

Answer 1

This is not working code as-is; read the comments I added.

def parse_page1(self, response):
    # here you would collect all the information that you need from the
    # first page and put it in an item (a plain dict is used as a placeholder)
    item = {}
    r = scrapy.Request("http://www.example.com/some_page.html",
                       callback=self.parse_page2)
    r.meta['item'] = item
    yield r

def parse_page2(self, response):
    item = response.meta['item']
    # add what you need from this response to the item
    yield item  # if you need to parse more pages, yield a request instead

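Why the callback's result is not available "immediately": creating a `scrapy.Request` only describes a download. The engine schedules it, downloads it later, and then calls the callback; whatever the callback yields (more requests or items) goes back to the engine, not to the code that created the request. That is why the answer carries the partially filled item along in `meta` and yields it from the last callback. The toy model below is a stdlib-only illustration of that flow, not Scrapy's real engine: `PAGES`, `Request`, `Response`, `crawl`, `parse_index` and `parse_link` are hypothetical stand-ins, and real Scrapy may process requests in a different order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical "downloaded" pages that replace real HTTP requests.
PAGES = {
    "http://example.com/index": ["http://example.com/p1", "http://example.com/p2"],
    "http://example.com/p1": "http://example.com/1.jpg",
    "http://example.com/p2": "http://example.com/2.jpg",
}


@dataclass
class Request:
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


@dataclass
class Response:
    url: str
    body: object
    meta: dict  # like Scrapy, the response exposes its request's meta


def crawl(start):
    """Toy engine: download queued requests, call their callbacks later,
    and collect whatever items the callbacks yield."""
    queue, items = deque([start]), []
    while queue:
        req = queue.popleft()
        response = Response(req.url, PAGES[req.url], req.meta)
        for result in req.callback(response):
            if isinstance(result, Request):
                queue.append(result)  # scheduled, not executed right now
            else:
                items.append(result)
    return items


def parse_index(response):
    for n, link in enumerate(response.body, start=1):
        item = {"id": n}  # data collected on the first page
        yield Request(link, parse_link, meta={"item": item})


def parse_link(response):
    item = response.meta["item"]  # item carried over via meta
    item["Link"] = response.body  # data collected on the second page
    yield item


print(crawl(Request("http://example.com/index", parse_index)))
# -> [{'id': 1, 'Link': 'http://example.com/1.jpg'}, {'id': 2, 'Link': 'http://example.com/2.jpg'}]
```

In Scrapy 1.7 and later, `cb_kwargs` is an alternative to `meta` for passing data such as the item to a callback.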
Get the result from a Scrapy request callback immediately
