Python: parsing from a list prints only the last item instead of all of them?

Asked: 2015-04-19 18:18:51

Tags: python parsing for-loop printing beautifulsoup

My code:

from urllib2 import urlopen
from bs4 import BeautifulSoup

url = "https://realpython.com/practice/profiles.html"

html_page = urlopen(url)
html_text = html_page.read()

soup = BeautifulSoup(html_text)

links = soup.find_all('a', href = True)

files = []
base = "https://realpython.com/practice/"


def page_names():
    for a in links:
        files.append(base + a['href'])

page_names()

for i in files:
    all_page = urlopen(i)

all_text = all_page.read()
all_soup = BeautifulSoup(all_text)
print all_soup

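The first half of the script collects three links, and the second half should print the HTML of all of them.

Sadly, it only prints the HTML of the last link.

Possibly because of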

for i in files:
    all_page = urlopen(i)

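I previously had 8 lines of code serving the "for i in files:" purpose, but I wanted to clean it up and boil it down to these two. Well, obviously that was not the way to do it, because it doesn't work.

No errors, though!

3 Answers:

Answer 0 (score: 3)

You are only keeping the last value assigned in the loop. You need to move all of the assignments and the print inside the loop: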

for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup

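If you are going to use a function, I would pass the values in as parameters and create the list inside the function; otherwise you may get unexpected output: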

def page_names(b,lnks):
    files = []
    for a in lnks:
        files.append(b + a['href'])
    return files


for i in page_names(base,links):
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup

Your function can then simply return a list comprehension:

def page_names(b,lnks):
    return [b + a['href'] for a in lnks]
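
As a rough illustration of the "unexpected output" mentioned above, here is a minimal, network-free sketch. It assumes Python 3 syntax (the thread itself uses Python 2), uses plain dicts as stand-ins for BeautifulSoup tags, and the file names and the helper name page_names_global are made up for the example:

```python
files = []  # module-level list, as in the question

def page_names_global(b, lnks):
    # Appends to the shared list; every call adds to what is already there.
    for a in lnks:
        files.append(b + a['href'])

def page_names(b, lnks):
    # Builds and returns a fresh list on each call.
    return [b + a['href'] for a in lnks]

base = "https://realpython.com/practice/"
# Plain dicts stand in for BeautifulSoup tags; the file names are made up.
links = [{'href': 'one.html'}, {'href': 'two.html'}]

page_names_global(base, links)
page_names_global(base, links)       # a second call duplicates the entries
print(len(files))                    # 4

print(len(page_names(base, links)))  # 2, however often it is called
```

With the shared list, calling the function more than once silently grows the list, so the later loop would fetch each page repeatedly. Returning a new list avoids that.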

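Answer 1 (score: 1)

Inside the for loop you are assigning to all_page, which overwrites it on every pass, so after the loop it holds only the value from the last iteration.

If you want all_soup printed for each page, you can simply indent those 3 lines into the for loop so that they run on every iteration.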
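
The overwriting described above can be seen without any network access. The following is only a sketch in Python 3 syntax; fake_fetch is a hypothetical stand-in for urlopen(i).read(), and the file names are invented:

```python
def fake_fetch(url):
    # Hypothetical stand-in for urlopen(url).read(); no network access.
    return "<html>" + url + "</html>"

files = ["a.html", "b.html", "c.html"]  # made-up names for illustration

# Work done after the loop: only the last assignment to page survives.
for i in files:
    page = fake_fetch(i)
outside = [page]

# Work done inside the loop: every page is handled.
inside = []
for i in files:
    page = fake_fetch(i)
    inside.append(page)

print(len(outside), len(inside))
```

The first loop leaves a single page behind (the one for c.html), while the second collects all three, which is the same difference as between the question's code and the indented version.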

Answer 2 (score: 1)

This looks like just an indentation issue. You probably meant to print inside the loop, right?

for i in files:
    all_page = urlopen(i)
    all_text = all_page.read()
    all_soup = BeautifulSoup(all_text)
    print all_soup