How do I use a loop when scraping a page?

Time: 2019-04-20 05:37:57

Tags: python scrapy

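I'm scraping a page, but I have a problem. I don't want to write items['Paragraphs'] = response.css('p::text').extract() in the function over and over again. Instead, I'd like to use a loop. I've tried several times but failed. Here is the code.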

def parse_about(self, response):
    # do your stuff on a page
    items = response.meta['items']
    names = {'name1':'Headings','name2':'Paragraphs'}
    finder = {'find1':'h2::text , #mainContent h1::text','find2':'p::text'}
    for name in names.values():
        for find in finder.values():
            items[name] = response.css(find).extract()
            yield items

1 answer:

Answer 0 (score: 0)

Could you describe what output you want to get?

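As far as I understand, you can apply zip to the values of the two dictionaries. It pairs each name with its selector (assuming both dicts list their entries in the same order) and makes the iteration cleaner. It is also better to yield the item once, after the loop has finished.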

def parse_about(self, response):
    # do your stuff on a page
    items = response.meta['items']
    names = {'name1':'Headings','name2': 'Paragraphs'}
    finder = {'find1':'h2::text , #mainContent h1::text', 'find2': 'p::text'}
    for name, find in zip(names.values(), finder.values()):
        items[name] = response.css(find).extract()
    yield items
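
To see why the nested loops in the question misbehave, here is a minimal sketch in plain Python. The fake_page dict and the extract() helper are hypothetical stand-ins for response.css(find).extract(), and the selectors are shortened for the example. It assumes Python 3.7+, where dicts keep insertion order.

```python
names = {'name1': 'Headings', 'name2': 'Paragraphs'}
finder = {'find1': 'h2::text', 'find2': 'p::text'}

# Hypothetical stand-in for the scraped page: selector -> extracted strings.
fake_page = {
    'h2::text': ['Title'],
    'p::text': ['First paragraph', 'Second paragraph'],
}

def extract(selector):
    # Hypothetical helper playing the role of response.css(selector).extract().
    return fake_page[selector]

# Nested loops: every name is assigned every selector in turn,
# so only the last selector ('p::text') survives for each name.
nested = {}
for name in names.values():
    for find in finder.values():
        nested[name] = extract(find)

# zip: each name is paired with exactly one selector.
paired = {}
for name, find in zip(names.values(), finder.values()):
    paired[name] = extract(find)

print(nested)
print(paired)
```

With the nested loops, both 'Headings' and 'Paragraphs' end up holding the paragraph text; with zip, each key gets the result of its own selector.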

Or why not write the right dict from the start?

def parse_about(self, response):
    # do your stuff on a page
    items = response.meta['items']
    dct = {
        'Headings': 'h2::text , #mainContent h1::text',
        'Paragraphs': 'p::text',
    }
    for name, find in dct.items():
        items[name] = response.css(find).extract()
    yield items
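
The single-dict approach can be tried without running a spider. The sketch below is only an illustration: FakeResponse and FakeSelectorList are hypothetical classes that imitate the small part of a Scrapy response used here, and the URL and page data are made up. It is written for Python 3.

```python
class FakeSelectorList:
    """Hypothetical stand-in for the object returned by response.css()."""

    def __init__(self, values):
        self._values = values

    def extract(self):
        return list(self._values)


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response; not part of Scrapy."""

    def __init__(self, data, meta):
        self._data = data
        self.meta = meta

    def css(self, query):
        # Look up canned results instead of parsing real HTML.
        return FakeSelectorList(self._data.get(query, []))


# One dict that maps each item field directly to its CSS selector.
SELECTORS = {
    'Headings': 'h2::text , #mainContent h1::text',
    'Paragraphs': 'p::text',
}


def parse_about(response):
    items = response.meta['items']
    for name, find in SELECTORS.items():
        items[name] = response.css(find).extract()
    # Yield once, after every field has been filled in.
    yield items


response = FakeResponse(
    data={
        'h2::text , #mainContent h1::text': ['About us'],
        'p::text': ['Hello', 'World'],
    },
    meta={'items': {'url': 'https://example.com/about'}},  # made-up URL
)

results = list(parse_about(response))
print(len(results))
print(results[0])
```

Because the yield sits outside the loop, the callback produces one complete item rather than one partially filled item per selector.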