Using "return" instead of "print" in a scraper

Time: 2017-09-06 21:08:51

Tags: python python-3.x web-scraping return

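In my script below, if I take out the "return" statement and put "print" in its place, I get all the results. If I run it as is, however, I get only the first item. My question is how to get all the results using "return" in this case. What should the process be?

Here is the script: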

import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        return title, title_link

print(abacus_scraper(main_link))

Result:

('2017 - Volume 53 Abacus', '/journal/10.1111/(ISSN)1467-6281/issues?activeYear=2017')

1 answer:

Answer 0 (score: 4)

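As soon as you return from the function, you exit the for loop. A return statement ends the whole function call, so the loop never reaches the second element.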
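To see this behaviour without making a network request, here is a minimal sketch. The names `sample_rows` and `first_only` and the data are hypothetical stand-ins for the parsed page, not part of the original script.

```python
# Hypothetical stand-in data for the parsed page (not taken from the real site).
sample_rows = [
    ("2017 - Volume 53", "/issues?activeYear=2017"),
    ("2016 - Volume 52", "/issues?activeYear=2016"),
]

def first_only(rows):
    for title, link in rows:
        # return ends the function call immediately, so the loop
        # never gets to the second row.
        return title, link

print(first_only(sample_rows))
```

Only the first pair is printed, which mirrors what the script in the question does.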

You should keep a list inside abacus_scraper and append to it on each iteration. After the loop has finished, return the list.

For example:

import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    results = []  # collects one [title, link] entry per matching element
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        results.append([title, title_link])
    # return only after the loop has visited every matching element
    return results

print(abacus_scraper(main_link))
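An alternative that this answer does not cover is to turn the function into a generator by using `yield` in place of `return`. The sketch below shows the idea on hypothetical data so that it runs without network access. `iter_pairs` and `sample_rows` are made-up names for illustration.

```python
# Hypothetical stand-in data for the parsed page (not taken from the real site).
sample_rows = [
    ("2017 - Volume 53", "/issues?activeYear=2017"),
    ("2016 - Volume 52", "/issues?activeYear=2016"),
]

def iter_pairs(rows):
    for title, link in rows:
        # yield hands one pair to the caller and then resumes the loop
        # the next time the caller asks for a value.
        yield title, link

# Consume the generator lazily, one pair at a time...
for title, link in iter_pairs(sample_rows):
    print(title, link)

# ...or materialise everything into a list in one go.
all_pairs = list(iter_pairs(sample_rows))
print(all_pairs)
```

Applied to abacus_scraper, this would mean replacing `return title, title_link` with `yield title, title_link` and iterating over the call, since printing the generator object itself shows no results. The list version above is simpler when all results are needed at once. A generator avoids building the full list up front.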