Question

I've searched quite a bit for this, but I may be using the wrong terms: the answers I found were either not very relevant or over my head.

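So, I have a very simple program. It has a function that reads a web page, scans it for href links with BeautifulSoup, takes one of the links it found, and follows it. The function gets the first link from user input.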

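Now I want the function to re-run itself automatically with the link it found, but all I manage to do is create an infinite loop on the first value it got. This is all done in a controlled environment, with a maximum depth of 10 links.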

Here is my code:

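import urllib
from BeautifulSoup import *
site=list()

def follinks(x):
    # download page x and parse it
    html = urllib.urlopen(x).read()
    bs = BeautifulSoup(html)
    # collect the href of every <a> tag into the global site list
    tags = bs('a')
    for tag in tags:
        site.append(tag.get('href', None))
    # take the third link in site (index 2)
    x=site[2]
    print x
    return
url1 = raw_input('Enter url:')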
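
For readers on Python 3, the "scan the page for href links" step can be sketched with the standard library's html.parser module instead of BeautifulSoup 3. This is a swapped-in technique, not the asker's code; LinkCollector and extract_links are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors that have no href
                self.links.append(href)


def extract_links(html):
    """Return the list of href values found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p><a href="/a">A</a> <a name="x">no href</a> <a href="/b">B</a></p>'
print(extract_links(sample))  # ['/a', '/b']
```

Skipping anchors without an href avoids the None entries that tag.get('href', None) adds, which would otherwise count toward the index used to pick "the third link".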

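How can I make it take the x variable, go back to the start, and re-run the function until there are no more links to follow? I tried a few variations of while True, but again ended up in an infinite loop on the URL the user entered.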

Thanks.
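
The behaviour asked about can also be sketched without recursion: a loop that rebinds the URL variable on every pass and reads links from the current page only, rather than from one global list. The sketch assumes a hypothetical in-memory FAKE_SITE and a get_links stand-in in place of downloading and parsing real pages, so it runs offline; the real fetch-and-parse step would replace get_links.

```python
# Hypothetical in-memory "site": each URL maps to the links found on that page.
FAKE_SITE = {
    "page0": ["page1", "page2", "page3"],
    "page3": ["page0", "page4", "page6"],
    "page6": ["page0"],
    "loopA": ["x", "y", "loopB"],
    "loopB": ["x", "y", "loopA"],
}


def get_links(url):
    """Stand-in for fetching url and extracting its hrefs (hypothetical)."""
    return FAKE_SITE.get(url, [])


def follow_links_loop(start_url, position=2, max_depth=10):
    """Repeatedly follow the link at index `position`, starting at start_url.

    Stops when a page has too few links or max_depth pages have been visited.
    Returns the URLs visited, in order.
    """
    visited = []
    url = start_url
    while url is not None and len(visited) < max_depth:
        visited.append(url)
        links = get_links(url)  # links on the *current* page only
        # Rebind url to the next page, or None when there is no such link
        url = links[position] if len(links) > position else None
    return visited


print(follow_links_loop("page0"))  # ['page0', 'page3', 'page6']
```

The key difference from the code in the question is that index 2 is taken from the list for the page just fetched. In the question, site is global and only ever grows, so site[2] is always the third link of the first page, which is why every pass returns the same URL.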

Answer 1

What you're looking for is recursion: a function that calls itself from inside its own body.

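import urllib
from BeautifulSoup import BeautifulSoup

site = []  # master list of every link seen so far

def follow_links(x, depth=0, max_depth=10):
    # Stop once the maximum depth has been reached
    if depth >= max_depth:
        return

    html = urllib.urlopen(x).read()
    bs = BeautifulSoup(html)

    # Put all the links on page x into the pagelinks list
    # (anchors without an href are skipped, so there are no None entries)
    pagelinks = []
    tags = bs('a')
    for tag in tags:
        href = tag.get('href', None)
        if href is not None:
            pagelinks.append(href)

    # Track all links from this page in the master site list.
    # extend() mutates the global list; "site += pagelinks" would make
    # site a local name and raise UnboundLocalError.
    site.extend(pagelinks)

    # Follow the third link, if there is one
    if len(pagelinks) > 2:
        follow_links(pagelinks[2], depth + 1, max_depth)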
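
A self-contained sketch of the same recursion that runs without network access, assuming a hypothetical FAKE_SITE mapping and get_links helper in place of urllib and BeautifulSoup. It has two base cases: a maximum depth of 10 (as in the question) and a visited list that stops cycles.

```python
# Hypothetical in-memory "site": each URL maps to the links found on that page.
FAKE_SITE = {
    "page0": ["page1", "page2", "page3"],
    "page3": ["page0", "page4", "page6"],
    "page6": ["page0"],
    "loopA": ["x", "y", "loopB"],
    "loopB": ["x", "y", "loopA"],
}


def get_links(url):
    """Stand-in for fetching url and extracting its hrefs (hypothetical)."""
    return FAKE_SITE.get(url, [])


def follow_links(url, depth=0, max_depth=10, visited=None):
    """Recursively follow the third link on each page.

    Returns the URLs visited, in order.
    """
    if visited is None:
        visited = []  # fresh list for each top-level call
    # Base cases: too deep, or this page was already seen (a cycle)
    if depth >= max_depth or url in visited:
        return visited
    visited.append(url)
    links = get_links(url)
    if len(links) > 2:
        # Recursive case: same function, next page, one level deeper
        follow_links(links[2], depth + 1, max_depth, visited)
    return visited


print(follow_links("page0"))  # ['page0', 'page3', 'page6']
```

Because each call passes depth + 1, the depth check is guaranteed to end the recursion; Python's default recursion limit (1000, reported by sys.getrecursionlimit()) is far above a depth of 10.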

Python: updating a value inside a function and reusing it
