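Having trouble solving this
I'm trying to write code that does the following: if a rating exists inside a game container, append that rating to the ratings list; otherwise append "na" to the list.
I'm trying to get each game's name and rating so that the two lists line up: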
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

Headers = ["Game Name:", "Metascore", "Userscore:", "Release Data:", "Publisher:", "Rating:", 'Genre:']
names = []
metascores = []
userscores = []
release_dates = []
release_datesNew = []
publishers = []
ratings = []
ratingsNew = []
genres = []
genresNew = []
for element in i:
    url = "http://www.metacritic.com/browse/games/score/metascore/year/pc/filtered?view=detailed&sort=desc&year_selected=" + format(year_number)
    print(url)
    year_number -= 1
    # Not sure about this, but it works (I was getting blocked by something and this is the way I found around it)
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    web_byte = urlopen(req).read()
    webpage = web_byte.decode('utf-8')
    # This parses the whole page
    html_soup = BeautifulSoup(webpage, 'html5lib')
    # This selects all the games from 1 to 100 (the list of them)
    game_containers = html_soup.find_all("div", class_="wrap product_wrap")
    game_names = html_soup.find_all("div", class_="main_stats")
    game_metas = html_soup.find_all("a", class_="basic_stat product_score")
    game_users = html_soup.find_all("li", class_='stat product_avguserscore')
    game_releases = html_soup.find_all("ul", class_='more_stats')
    game_publishers = html_soup.find_all("li", class_='stat publisher')
    game_ratings = html_soup.find("li", class_='stat maturity_rating')
    game_genres = html_soup.find_all("li", class_='stat genre')
    container_number = 0
    for containers in game_containers:
        if containers.find(containers.game_names) is not None:
            names.append(game_names[container_number].text.strip())
        else:
            names.append("na")
        try:
            if containers.find(game_ratings) is not None:
                ratings.append(game_ratings[container_number].text.strip())
            else:
                ratings.append("na")
        except:
            ratings.append("na")
        container_number += 1
for x in ratings:
    temp = str(x)
    temp2 = temp.replace("\n Rating:\n ", "")
    temp3 = temp2.replace("\n ", "")
    ratingsNew.append(temp3)
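What I've done so far is find each game's "container" (i.e. ("div", class_="wrap product_wrap")),
but I can't figure out how to handle a container where the rating doesn't exist (and append "na" to the list for it instead).
Can someone point me in the right direction?
Thanks.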
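
One possible direction, as a minimal sketch rather than a definitive fix: instead of collecting every rating on the page and indexing into that list by position, search inside each container and fall back to "na" when nothing is found there. That way a game with no rating cannot shift the later entries. The sample HTML below is made up so the snippet runs without a network request; only the class names product_wrap and maturity_rating come from the question. The product_title class, the label/data spans, and the helper names text_or_na and scrape are assumptions for illustration. It uses the built-in "html.parser" rather than html5lib only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page. Only the class names
# "product_wrap" and "maturity_rating" are taken from the question.
SAMPLE_HTML = """
<div class="wrap product_wrap">
  <h3 class="product_title"><a href="#">Game A</a></h3>
  <ul class="more_stats">
    <li class="stat maturity_rating">
      <span class="label">Rating:</span>
      <span class="data">T</span>
    </li>
  </ul>
</div>
<div class="wrap product_wrap">
  <h3 class="product_title"><a href="#">Game B</a></h3>
  <ul class="more_stats"></ul>
</div>
"""


def text_or_na(container, tag, css_class, label=""):
    """Return the cleaned text of the first match inside container, or "na"."""
    # Searching from the container (not the whole soup) keeps the lookup local.
    node = container.find(tag, class_=css_class)
    if node is None:
        return "na"
    # Collapse newlines and runs of spaces into single spaces.
    text = " ".join(node.get_text().split())
    # Drop a leading label such as "Rating:" if one is present.
    if label and text.startswith(label):
        text = text[len(label):].strip()
    return text or "na"


def scrape(html):
    """Build parallel lists of names and ratings, one entry per container."""
    soup = BeautifulSoup(html, "html.parser")
    names, ratings = [], []
    for container in soup.find_all("div", class_="product_wrap"):
        names.append(text_or_na(container, "h3", "product_title"))
        ratings.append(text_or_na(container, "li", "maturity_rating", "Rating:"))
    return names, ratings


names, ratings = scrape(SAMPLE_HTML)
print(list(zip(names, ratings)))
```

Because every container contributes exactly one entry to each list, names and ratings stay the same length and line up by index. The same helper could be reused for the publisher, genre, and other fields if they also live inside each container.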