My code uses only the BeautifulSoup parser to fetch the top movies from the IMDB website.
The code is as follows:
from bs4 import BeautifulSoup
import requests
import sys
url = "http://www.imdb.com/chart"
response = requests.get(url)
soup = BeautifulSoup(response.text)
tr = soup.findChildren("tr")
tr = iter(tr)
next(tr)
for movie in tr:
    title = movie.find('td',{'class':'titleColumn'}).find('a').contents[0]
    year = movie.find('td',{'class':'titleColumn'}).find('span',{'class':'secondaryInfo'}).contents[0]
    rating = movie.find('td',{'class':'ratingColumn imdbRating'}).find('strong').contents[0]
    row = title + "-" + year + " " + " " + rating
    print(row)
It fails with the following error:
Traceback (most recent call last):
  File "imdb_scrap.py", line 12, in <module>
    year = movie.find('td',{'class':'titleColumn'}).find('span',{'class':'secondaryInfo'}).contents[0]
AttributeError: 'NoneType' object has no attribute 'contents'
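The first time I ran the code it produced incomplete output; on later runs it fails with this error and does not run at all.
Sample output: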
Stalker-(1979) 8.1
Paper Moon-(1973) 8.1
The Maltese Falcon-(1941) 8.1
The Truman Show-(1998) 8.1
Hachi: A Dog's Tale-(2009) 8.1
Le notti di Cabiria-(1957) 8.1
The Princess Bride-(1987) 8.1
Kaze no tani no Naushika-(1984) 8.1
Munna Bhai M.B.B.S.-(2003) 8.1
Before Sunrise-(1995) 8.1
Harry Potter and the Deathly Hallows: Part 2-(2011) 8.0
The Grapes of Wrath-(1940) 8.0
Prisoners-(2013) 8.0
Rocky-(1976) 8.0
Star Wars: Episode VII - The Force Awakens-(2015) 8.0
Touch of Evil-(1958) 8.0
Sholay-(1975) 8.0
Catch Me If You Can-(2002) 8.0
Gandhi-(1982) 8.0
Any help or suggestions would be greatly appreciated.
Answer 0 (score: 0)
Looking at the code, the error is in
.find('span',{'class':'secondaryInfo'}).contents[0]
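This may not be the only error, but it is the first one, as the error message shows. I also know that the first half of the year expression is correct, because it is identical to the one used for title. With that in mind, the likely cause is that you assume every row contains a span with the class secondaryInfo. That is evidently not the case: for at least one row, find() found no such span and returned None, and the code then tried to read .contents from that None. So if you print the result of the lookup on its own, as in
print(movie.find('td',{'class':'titleColumn'}).find('span',{'class':'secondaryInfo'}))
I guarantee you will get None for the failing row, and that None is the NoneType that has no attribute 'contents'.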
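One way to act on this diagnosis is to check each find() result before using it and skip rows where something is missing. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is made-up markup that imitates the chart table (it is not the real IMDB page), parse_row and parse_chart are hypothetical helper names, and the rating cell is matched on the single class imdbRating rather than the full class string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the chart page. The second data
# row deliberately has no <span class="secondaryInfo">, which reproduces the
# situation where find() returns None.
SAMPLE_HTML = """
<table>
  <tr><th>Rank and Title</th><th>IMDb Rating</th></tr>
  <tr>
    <td class="titleColumn"><a href="#">Stalker</a> <span class="secondaryInfo">(1979)</span></td>
    <td class="ratingColumn imdbRating"><strong>8.1</strong></td>
  </tr>
  <tr>
    <td class="titleColumn"><a href="#">Untitled</a></td>
    <td class="ratingColumn imdbRating"><strong>7.9</strong></td>
  </tr>
</table>
"""


def parse_row(row):
    """Return (title, year, rating) for one <tr>, or None if any part is missing."""
    title_cell = row.find("td", {"class": "titleColumn"})
    rating_cell = row.find("td", {"class": "imdbRating"})
    if title_cell is None or rating_cell is None:
        return None  # header row or unexpected layout
    link = title_cell.find("a")
    year = title_cell.find("span", {"class": "secondaryInfo"})
    rating = rating_cell.find("strong")
    if link is None or year is None or rating is None:
        return None  # find() returned None for at least one element
    return (
        link.get_text(strip=True),
        year.get_text(strip=True),
        rating.get_text(strip=True),
    )


def parse_chart(html):
    """Parse every table row and keep only the rows that were complete."""
    soup = BeautifulSoup(html, "html.parser")
    parsed = (parse_row(tr) for tr in soup.find_all("tr"))
    return [item for item in parsed if item is not None]


movies = parse_chart(SAMPLE_HTML)
for title, year, rating in movies:
    print(f"{title}-{year} {rating}")
```

With the real page, the string passed to parse_chart would be response.text from the question's code. Skipping incomplete rows is only one possible choice; printing the offending row instead would help show what the page actually returned for it.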