BeautifulSoup for loop only shows the last result

Date: 2018-02-07 22:22:18

Tags: python for-loop web-scraping beautifulsoup

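I'm currently trying to learn web scraping with Python. I did everything exactly as in the tutorial, but the loop doesn't work. If I test it with print, it only shows the last entry.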

import requests
from bs4 import BeautifulSoup as soup
import lxml

url = "https://www.moviepilot.de/dvd/dvds-neu"

agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}

page = requests.get(url, headers=agent)

page_soup = soup(page.content, "html.parser")

ergebnisse = page_soup.findAll("li", {"class":"movie"})

for container in ergebnisse:
    filmname = container.a["title"]



print(filmname)

3 answers:

Answer 0 (score: 1)

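Each pass through the loop overwrites the value of filmname, and print(filmname) only runs once, after the loop has finished, so only the last title is printed. Create an empty list before the loop and append each movie name to it inside the loop.

Try this: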

import requests
from bs4 import BeautifulSoup as soup
import lxml

url = "https://www.moviepilot.de/dvd/dvds-neu"

agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}

page = requests.get(url, headers=agent)

page_soup = soup(page.content, "html.parser")

ergebnisse = page_soup.findAll("li", {"class":"movie"})
films = []  # collects every title instead of overwriting a single variable
for container in ergebnisse:
    filmname = container.a["title"]
    films.append(filmname)

print(films)
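
The overwrite described in this answer can be reproduced without any network access. The following is only a minimal sketch: the `titles` list and its values are made-up stand-ins for what the page would return, not real data from the site.

```python
# Hypothetical stand-in for the titles scraped from the page (not real data).
titles = ["Film A", "Film B", "Film C"]

# Pattern from the question: the name is rebound on every pass,
# so after the loop it only refers to the final item.
for title in titles:
    filmname = title
last_only = filmname

# Pattern from this answer: append inside the loop to keep every item.
films = []
for title in titles:
    films.append(title)

print(last_only)
print(films)
```

If you only need to display the titles rather than keep them, moving the print call inside the loop body would also work.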

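Answer 1 (score: 1)

Currently, your for loop reassigns filmname on every iteration, so only the value from the last iteration is kept. However, you can use a list comprehension to store all the required values: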

ergebnisse = page_soup.findAll("li", {"class":"movie"})
names = [container.a["title"] for container in ergebnisse]
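
One caveat with this comprehension: `container.a` is `None` when an `li` has no link, and indexing with `["title"]` raises `KeyError` when the attribute is missing. Below is a hedged sketch of a guarded version. The inline HTML is an invented sample, not the real markup of the page, so whether such entries actually occur there is an assumption.

```python
from bs4 import BeautifulSoup

# Invented sample markup (an assumption, not copied from the real page).
html = """
<ul>
  <li class="movie"><a title="Film A" href="/a">A</a></li>
  <li class="movie"><span>no link here</span></li>
  <li class="movie"><a href="/c">C</a></li>
  <li class="movie"><a title="Film D" href="/d">D</a></li>
</ul>
"""

page_soup = BeautifulSoup(html, "html.parser")
ergebnisse = page_soup.find_all("li", {"class": "movie"})

# Keep only containers whose first <a> exists and carries a title attribute.
names = [
    container.a["title"]
    for container in ergebnisse
    if container.a is not None and container.a.has_attr("title")
]

print(names)
```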

Answer 2 (score: 1)

You can also do the same thing with a CSS selector:

import requests
from bs4 import BeautifulSoup as soup

URL = "https://www.moviepilot.de/dvd/dvds-neu"

page = requests.get(URL, headers={"User-Agent":"Mozilla/5.0"})

page_soup = soup(page.content, "html.parser")

container = '\n'.join([item["title"] for item in page_soup.select("li.movie a")])
print(container)
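
Note that `li.movie a` matches every link inside each `li`, not just the first one, so a link without a `title` would raise `KeyError`. A possible variation, sketched here against invented sample markup (the real page structure is an assumption), is to narrow the selector to `a[title]`:

```python
from bs4 import BeautifulSoup

# Invented sample markup (an assumption, not copied from the real page).
html = """
<ul>
  <li class="movie">
    <a title="Film A" href="/a">A</a>
    <a href="/a/trailer">Trailer</a>
  </li>
  <li class="movie"><a title="Film B" href="/b">B</a></li>
</ul>
"""

page_soup = BeautifulSoup(html, "html.parser")

# a[title] only matches links that actually have a title attribute.
titles = [item["title"] for item in page_soup.select("li.movie a[title]")]
output = "\n".join(titles)

print(output)
```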