'tuple' object has no attribute 'append'

Time: 2014-11-27 04:12:52

Tags: python-3.x

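I am trying to extract all links from a given text using recursion. The problem is that I want to store the links in a list, and for some reason calling append crashes my code.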

def findLink(text, start, *links):
    linkStart = text.find('http', start);
    if linkStart == -1:
        return

    linkEnd = text.find('">', linkStart);
    url = text[linkStart:linkEnd];
    links.append(url);
    findLink(text, linkEnd + 2, links);


source = '''<html xmlns="http://www.w3.org/1999/xhtml">
          <head>
          <title>Udacity</title>
          </head>
          <body>
          <h1>Udacity</h1>
          <p><b>Udacity</b> is a private institution of
          <a href="http://www.wikipedia.org/wiki/Higher_education">higher education founded by</a> <a href="http://www.wikipedia.org/wiki/Sebastian_Thrun">Sebastian Thrun</a>, David Stavens, and Mike Sokolsky with the goal to provide university-level education that is "both high quality and low cost".</p>   
          <p> It is the outgrowth of a free computer science class offered in 2011 through Stanford University. Currently, Udacity is working on its second course on building a search engine. Udacity was announced at the 2012 <a href="http://www.wikipedia.org/wiki/Digital_Life_Design">Digital Life Design</a> conference.</p>      
          </body>
          </html>'''

links = list();
findLink(source, 0, links);

for link in links:
    print(link);

1 answer:

Answer 0 (score: 0)

First, two general remarks:

  1. You do not need semicolons at the end of lines.

  2. Don't parse HTML with regular expressions. Python has a convenient XML parser in the standard library.

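  3. Now, about your problem. When you write a function with varargs at the end, like f(a, b, *c), Python makes c a tuple. Tuples are immutable, so they have no append() method. You can therefore either convert it to a list and then use append(), or take a (semi-)pure approach and write links = links + (url,), which builds a new tuple instead of modifying the existing one.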

    In addition, the way you call the recursive function later on is also incorrect. You need to write

    findLink(text, linkEnd + 2, *links)
    

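    so that links is passed as varargs (this works for both lists and tuples). That said, there is no reason to pass it this way: on a large HTML document it would result in a very large number of arguments being passed to the function, and I am not sure how Python would handle that. Just pass it normally, as a list or a tuple.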
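
To make the last point concrete, here is a minimal sketch of the question's function with the list passed as an ordinary parameter. The names show_varargs, find_links and sample are hypothetical and not from the original post, and the early return when no closing '">' is found is an added assumption about the desired behaviour.

```python
def show_varargs(*links):
    # Hypothetical helper: extra positional arguments are collected into a tuple.
    return type(links).__name__


def find_links(text, start=0, links=None):
    """Collect every substring running from 'http' up to the next '">'."""
    if links is None:
        links = []  # create a fresh list on the outermost call
    link_start = text.find('http', start)
    if link_start == -1:
        return links
    link_end = text.find('">', link_start)
    if link_end == -1:
        # Assumption: stop when there is no closing '">' rather than slice with -1.
        return links
    links.append(text[link_start:link_end])  # links is a real list here
    return find_links(text, link_end + 2, links)


sample = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>'
print(show_varargs([], 1))
for link in find_links(sample):
    print(link)
```

Because each link adds one level of recursion, a text with very many links could hit Python's recursion limit; a plain while loop over text.find would avoid that.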
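
As for the second remark, the answer mentions an XML parser; the sketch below swaps in html.parser from the standard library instead, since it tolerates HTML that is not well-formed XML. LinkCollector and extract_links are hypothetical names chosen for illustration, and this is one possible approach rather than the answerer's own code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


page = '<p><a href="http://example.com/a">A</a> <a name="x">no href</a></p>'
for link in extract_links(page):
    print(link)
```

Unlike the string search in the question, this only reports href attributes of anchor tags, so a namespace URL such as the xmlns value on the html element would not be included.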