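I'm currently trying to learn web scraping with Python. I did everything exactly as shown in the tutorial, but the loop doesn't work: when I test it with print, it only shows the last entry.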
import requests
from bs4 import BeautifulSoup as soup
import lxml

url = "https://www.moviepilot.de/dvd/dvds-neu"
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
page = requests.get(url, headers=agent)
page_soup = soup(page.content, "html.parser")
ergebnisse = page_soup.findAll("li", {"class":"movie"})
for container in ergebnisse:
    filmname = container.a["title"]
print(filmname)
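Answer 0 (score: 1)
Each pass through the loop overwrites the value of filmname, which is probably why only the last one gets printed. Create an empty list before the loop and append each film name to it inside the loop.
Try this: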
import requests
from bs4 import BeautifulSoup as soup

url = "https://www.moviepilot.de/dvd/dvds-neu"
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
page = requests.get(url, headers=agent)
page_soup = soup(page.content, "html.parser")
ergebnisse = page_soup.findAll("li", {"class":"movie"})

films = []  # created once, before the loop
for container in ergebnisse:
    filmname = container.a["title"]
    films.append(filmname)  # keep every title, not just the last one
print(films)
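The difference between the two loops can be checked without touching the network. This is only a minimal sketch: the titles list is made up and stands in for the values that container.a["title"] would yield on the real page.

```python
# Hypothetical stand-ins for the scraped title attributes (not real page data).
titles = ["Film A", "Film B", "Film C"]

# Pattern from the question: filmname is rebound on every pass.
for title in titles:
    filmname = title
last_only = filmname  # only the value from the final pass is left

# Pattern from this answer: collect each value as the loop runs.
films = []
for title in titles:
    films.append(title)

print(last_only)  # Film C
print(films)      # ['Film A', 'Film B', 'Film C']
```

If you only want the names printed one per line, moving print(filmname) into the loop body would also work; the list is useful when you need the names afterwards.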
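Answer 1 (score: 1)
Right now your for loop rebinds filmname on every iteration, so only the value from the last iteration is kept. You can use a list comprehension to store all the values you need instead: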
ergebnisse = page_soup.findAll("li", {"class":"movie"})
names = [container.a["title"] for container in ergebnisse]
Answer 2 (score: 1)
You can also do the same thing with a CSS selector:
import requests
from bs4 import BeautifulSoup as soup
URL = "https://www.moviepilot.de/dvd/dvds-neu"
page = requests.get(URL, headers={"User-Agent":"Mozilla/5.0"})
page_soup = soup(page.content, "html.parser")
container = '\n'.join([item["title"] for item in page_soup.select("li.movie a")])
print(container)
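If you want to test the extraction logic offline and without BeautifulSoup installed, roughly the same result can be had with the standard library's html.parser. Note that this swaps in a different technique from the selector above, and that SAMPLE_HTML and the MovieTitleParser class are invented for illustration; the real page's markup is assumed to look similar, not confirmed.

```python
from html.parser import HTMLParser

# Hypothetical markup imitating <li class="movie"> entries (not the real page).
SAMPLE_HTML = """
<ul>
  <li class="movie"><a href="/movies/a" title="Film A">Film A</a></li>
  <li class="movie"><a href="/movies/b" title="Film B">Film B</a></li>
  <li class="other"><a href="/x" title="Not a movie">x</a></li>
</ul>
"""


class MovieTitleParser(HTMLParser):
    """Collects the title attribute of links inside <li class="movie">."""

    def __init__(self):
        super().__init__()
        self.in_movie = False  # True while inside a matching <li>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            # class may hold several space-separated names
            self.in_movie = "movie" in (attrs.get("class") or "").split()
        elif tag == "a" and self.in_movie and attrs.get("title"):
            self.titles.append(attrs["title"])

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_movie = False


parser = MovieTitleParser()
parser.feed(SAMPLE_HTML)
titles = parser.titles
print("\n".join(titles))
```

This sketch does not handle nested li elements, so for anything beyond a quick check the select("li.movie a") version is the simpler choice.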