Trying to figure out how to extract the game name with BeautifulSoup
I think the problem is in how I'm handling the HTML.
Here is what I have so far:
from requests import get
url = 'https://howlongtobeat.com/game.php?id=38050'
response = get(url)
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(response.text, 'html.parser')
game_length = html_soup.find_all('div', class_='game_times')
length = (game_length[-1].find_all({'li': ' short time_100 shadow_box'})[-1].contents[3].get_text())
print(length)
game_name = html_soup.find_all('div', class_='profile_header_game')
game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
print(game)
I get the length but not the game name. Why?
print(length) prints:
31 Hours
But print(game) gives:
game_name = html_soup.find_all('div', class_='profile_header_game')
game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
  File "", line 1
    game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
                      ^
SyntaxError: invalid syntax
print(game)
Traceback (most recent call last):
  File "", line 1, in
NameError: name 'game' is not defined
What am I doing wrong?
Answer 0 (score: 1)
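The SyntaxError comes from the empty square brackets in game_name[], the [] after find(...), and contents[]: Python needs an index or slice inside subscript brackets. Because that line never runs, game is never assigned, which is why the following print(game) raises NameError. Here is a corrected version: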
import requests
from bs4 import BeautifulSoup

url = 'https://howlongtobeat.com/game.php?id=38050'
response = requests.get(url)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
html_soup = BeautifulSoup(response.text, 'html.parser')

# Collect (category, time) pairs from every <li> inside the game_times block
game_times_tag = html_soup.find('div', class_='game_times')
game_time_list = []
for li_tag in game_times_tag.find_all('li'):
    title = li_tag.find('h5').text.strip()
    play_time = li_tag.find('div').text.strip()
    game_time_list.append((title, play_time))

for game_time in game_time_list:
    print(game_time)

# The game name is the text of the div whose class attribute is "profile_header shadow_text"
profile_header_tag = html_soup.find("div", {"class": "profile_header shadow_text"})
game_name = profile_header_tag.text.strip()
print(game_name)
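To try the same find()/find_all() logic without depending on the live page, here is a minimal offline sketch. The HTML string and the title "Example Game" are hypothetical, modelled only on the class names used in this answer; the real page markup may differ. It also shows the distinction behind the original error: find_all() returns a list-like ResultSet that needs a real index such as [0], while find() returns the first matching tag directly, or None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names above; the live page may differ.
SAMPLE_HTML = """
<div class="profile_header_game">
  <div class="profile_header shadow_text">
    Example Game
  </div>
</div>
<div class="game_times">
  <ul>
    <li class="short time_100"><h5>Main Story</h5><div>31 Hours</div></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a list-like ResultSet, so it needs a real index such as [0] ...
name_via_find_all = soup.find_all("div", class_="profile_header")[0].get_text(strip=True)

# ... while find() returns the first matching Tag directly (or None if nothing matches).
header = soup.find("div", class_="profile_header")
name_via_find = header.get_text(strip=True) if header is not None else None

# (category, time) pairs, built the same way as in the answer's loop
times = [
    (li.find("h5").get_text(strip=True), li.find("div").get_text(strip=True))
    for li in soup.find("div", class_="game_times").find_all("li")
]

print(name_via_find_all)  # Example Game
print(name_via_find)      # Example Game
print(times)              # [('Main Story', '31 Hours')]
```

class_="profile_header" matches the inner div because BeautifulSoup treats class as a multi-valued attribute and matches any single class token; it does not match the outer "profile_header_game" div.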
Answer 1 (score: 0)
Shorter version:
# Assumes html_soup was built as in the question
game_length = html_soup.select('div.game_times li div')[-1].text.strip()
game_name = html_soup.select_one('div.profile_header').text.strip()
developer = html_soup.find('strong', string='\nDeveloper:\n').next_sibling
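A minimal offline sketch of the shorter version, run against a hypothetical HTML snippet (the markup, "Example Game", "Example Studio" and the profile_info wrapper are made up to fit these selectors; the live page may differ). It shows that the string= argument must match the tag's text exactly, newlines included, and that next_sibling is the text node that follows the <strong> label.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the selectors used above; the live page may differ.
SAMPLE_HTML = """
<div class="profile_header shadow_text">Example Game</div>
<div class="game_times">
  <ul>
    <li><h5>Main Story</h5><div>31 Hours</div></li>
    <li><h5>Completionist</h5><div>40 Hours</div></li>
  </ul>
</div>
<div class="profile_info"><strong>
Developer:
</strong>
Example Studio
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS selector: every <div> inside an <li> under div.game_times; [-1] takes the last one
game_length = soup.select("div.game_times li div")[-1].get_text(strip=True)

# select_one() returns the first match (or None), like find()
game_name = soup.select_one("div.profile_header").get_text(strip=True)

# string= matches the tag's exact text; the developer is the text node right after the label
developer_label = soup.find("strong", string="\nDeveloper:\n")
developer = developer_label.next_sibling.strip() if developer_label is not None else None

print(game_length)  # 40 Hours
print(game_name)    # Example Game
print(developer)    # Example Studio
```

Matching on the exact string '\nDeveloper:\n' is brittle: any change in whitespace around the label makes find() return None, hence the None check.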