Question

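I'm trying to understand how Scrapy executes, but I'm confused by the generators used in between. I have only a vague idea of how generators work, and I can't visualize or relate them to what's happening here.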

Below is the code from the Scrapy documentation:

Questions

1) How does `yield` work here?

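2) I see two for loops in the `parse` function. The first loop yields a request with `parse_author` as its callback, but `parse_author` only gets called after loop 1 (which runs twice) and loop 2 (which runs once) have finished. Can someone explain how the execution flow works?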

```python
import scrapy
from datetime import datetime


# The snippet omitted the class line; this class name is assumed.
class AuthorSpider(scrapy.Spider):
    name = 'prox-reveal'
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # follow links to author pages
        for href in response.css('.author + a::attr(href)'):
            print('1---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))
            yield response.follow(href, self.parse_author)

        # follow pagination links
        for href in response.css('li.next a::attr(href)'):
            print('2---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))
            yield response.follow(href, self.parse)

    def parse_author(self, response):
        print('3---------->{}'.format(datetime.now().strftime('%Y%m%d_%H%M%S-%f')))

        def extract_with_css(query):
            # default='' avoids calling .strip() on None when nothing matches
            return response.css(query).extract_first(default='').strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }
```

Thanks

Answer 1

A brief overview of the relationship between requests and their callbacks:

1. A `Request` object is created and passed to Scrapy's engine for further processing:
```
yield response.follow(href, self.parse_author)
```
2. The requested web page is downloaded and a `Response` object is created.
3. The request's callback (`parse_author()`) is called with that response.
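To make the generator part of steps 1 to 3 concrete, here is a minimal pure-Python sketch that runs without Scrapy. Everything in it is a hypothetical stand-in, not Scrapy's API: a "request" is a `(url, callback)` tuple, a "response" is just the URL string, and `toy_engine` is a simplified sequential FIFO loop rather than Scrapy's real (concurrent) scheduler. It shows two things: calling a generator function like `parse()` runs none of its body, and each time the engine pulls the next value, the function resumes only as far as its next `yield`. A yielded request is merely queued; its callback runs later, when the engine gets to it.

```python
from collections import deque

# Hypothetical stand-ins (not Scrapy's API): a "request" is a (url, callback)
# tuple, a "response" is just the URL string, and an "item" is a dict.
# `log` records the order in which code actually runs.
log = []


def parse(response):
    # Mirrors the spider's parse(): loop 1 yields author requests...
    for href in ["/author/a", "/author/b"]:
        log.append("1 " + href)
        yield (href, parse_author)
    # ...loop 2 yields the pagination request (only page "/" has one here).
    if response == "/":
        log.append("2 /page/2/")
        yield ("/page/2/", parse)


def parse_author(response):
    log.append("3 " + response)
    yield {"name": response}


def toy_engine(start_url):
    """Sequential FIFO loop; real Scrapy downloads many pages concurrently."""
    queue = deque([(start_url, parse)])
    items = []
    while queue:
        url, callback = queue.popleft()
        response = url  # pretend the download finished and returned a response
        # Iterating the generator resumes the callback up to each yield.
        for result in callback(response):
            if isinstance(result, tuple):
                queue.append(result)  # new request: scheduled, NOT run yet
            else:
                items.append(result)  # scraped item
    return items
```

For example, `gen = parse("/")` leaves `log` empty, and a single `next(gen)` appends only `"1 /author/a"`. Running `toy_engine("/")` logs both `1` entries and the `2` entry before any `3` entry, which matches the order you observed. Unlike this toy, Scrapy also filters duplicate requests by default, so the authors would not be fetched a second time from page 2.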

Now, I think this is the part that's giving you trouble.

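Scrapy is an asynchronous framework: it can carry on with other tasks while it waits for I/O operations (such as downloading a web page) to complete.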

So your loops keep running, other requests are created and processed, and each callback is called as soon as its data becomes available.
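The asynchronous behaviour can be illustrated with a small `asyncio` sketch. This is only an analogy: Scrapy itself is built on Twisted, not on this code, and `fake_download`, `fetch_then_callback`, `crawl`, and the delays are hypothetical, with `asyncio.sleep` standing in for network latency. Every request is scheduled first without waiting for earlier downloads, and each "callback" runs as soon as its own download finishes, so callbacks fire in completion order rather than in the order the requests were yielded.

```python
import asyncio


async def fake_download(url: str, delay: float) -> str:
    # Hypothetical network fetch: sleeping stands in for waiting on I/O.
    await asyncio.sleep(delay)
    return url  # pretend the URL is the response


async def fetch_then_callback(url: str, delay: float, log: list) -> None:
    response = await fake_download(url, delay)  # other tasks run while this waits
    log.append(f"callback {response}")  # runs once this response is available


async def crawl(requests):
    log = []
    tasks = []
    for url, delay in requests:
        log.append(f"scheduled {url}")
        # create_task only schedules the coroutine; this loop continues at once
        tasks.append(asyncio.create_task(fetch_then_callback(url, delay, log)))
    await asyncio.gather(*tasks)  # tasks actually start running here
    return log


if __name__ == "__main__":
    demo = [("/author/a", 0.15), ("/author/b", 0.05), ("/page/2/", 0.10)]
    for line in asyncio.run(crawl(demo)):
        print(line)
```

With these delays, all three `scheduled` lines print first, followed by the callbacks for `/author/b`, `/page/2/`, and `/author/a`, in that order, because the shortest "download" finishes first.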

[Figure: Scrapy execution flow]
