Python ValueError: too many values to unpack

Time: 2014-08-31 09:26:36

Tags: python

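I'm building a scraper to extract article titles and URLs. When I run the code below, I get the error in the title. Do I need to define a dictionary? What am I doing wrong?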

def get_page(page):
    from urllib.request import urlopen
    html = urlopen(page).read()
    p = str(html, encoding='utf-8')
    return p

def get_next_target(page):
    start_link = page.find('title may-blank" href=')
    start_quote = page.find('"', start_link)
    end_quote = page.find ('"', start_quote + 1)
    url = page[start_quote+1:end_quote] # Gets Article URL
    start_title = page.find (">", end_quote)
    end_title = page.find ("<", start_title)
    title = page[start_title+1:end_title] # Gets Article Title
    return title, url, end_quote

def print_all_links(page):
    while True:
        url, endpos = get_next_target(page)
        if url:
            print("%s, %s" % (title, url))
            page = page[endpos:]
        else:
            break

reddit_url = 'http://www.reddit.com/r/worldnews'

print(print_all_links(reddit_url))

2 answers:

Answer 0 (score: 2)

The get_next_target function returns a tuple of 3 elements, but you unpack it into only 2 variables. Unpack all three instead:

title, url, endpos = get_next_target(page)
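The mismatch can be reproduced in isolation. The following is a minimal sketch; `get_triple` is a hypothetical stand-in for get_next_target and simply returns three fixed values:

```python
def get_triple():
    # Hypothetical stand-in for get_next_target: returns three values.
    return "Some title", "http://example.com/a", 42

try:
    # Two names on the left, three values on the right: raises ValueError.
    url, endpos = get_triple()
except ValueError as exc:
    message = str(exc)
    print(message)

# One name per returned value: unpacks cleanly.
title, url, endpos = get_triple()
print(title, url, endpos)
```

The number of names on the left of the assignment has to match the number of values in the returned tuple.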

Answer 1 (score: 0)

Your problem is here (as the other answer already pointed out):

def print_all_links(page):
    while True:
        url, endpos = get_next_target(page)
        if url:
            print("%s, %s" % (title, url))
            page = page[endpos:]
        else:
            break

get_next_target(page) returns 3 elements.

You need this:

title, url, endpos = get_next_target(page)

instead of:

url, endpos = get_next_target(page)
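For context, here is a hedged sketch of the loop with the three-way unpacking applied. It makes a few assumptions that go beyond the original code: it runs on a small inline HTML string (`sample`, made up for illustration) rather than fetching a page, it returns `(None, None, 0)` when the marker is not found so the loop can stop, and it starts the quote search after the marker, because the marker text itself contains a double quote. `collect_links` is a hypothetical name for a variant of print_all_links that also returns the pairs it prints.

```python
MARKER = 'title may-blank" href='

def get_next_target(page):
    """Return (title, url, end_pos) for the next link, or (None, None, 0)."""
    start_link = page.find(MARKER)
    if start_link == -1:
        # Assumption: signal "no more links" instead of slicing with -1.
        return None, None, 0
    # Skip past the marker so the quote inside it is not matched.
    start_quote = page.find('"', start_link + len(MARKER))
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]      # article URL
    start_title = page.find('>', end_quote)
    end_title = page.find('<', start_title)
    title = page[start_title + 1:end_title]    # article title
    return title, url, end_quote

def collect_links(page):
    """Print and return every (title, url) pair found in page."""
    links = []
    while True:
        # Three names for the three returned values.
        title, url, endpos = get_next_target(page)
        if not url:
            break
        print("%s, %s" % (title, url))
        links.append((title, url))
        page = page[endpos:]
    return links

# Made-up sample markup, used only to exercise the loop offline.
sample = (
    '<a class="title may-blank" href="http://example.com/a">First</a>'
    '<a class="title may-blank" href="http://example.com/b">Second</a>'
)
links = collect_links(sample)
```

To use it on a real page, the string returned by get_page(reddit_url) would be passed in place of `sample`; the marker may need adjusting to whatever markup the site currently serves.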