Question

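I'd like to know the best way to scrape data for multiple URLs from howlongtobeat.com.

I'm trying to put a spreadsheet together and need this data.

My idea is to use python3, BeautifulSoup and Selenium, but I'm not sure of the best way to go about it.

I'm using the Linux (Ubuntu 18.04) command console and could use some tips (I'm very new to this).

Here is my code so far: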

from requests import get
from bs4 import BeautifulSoup

url = 'https://howlongtobeat.com/game.php?id=38050'
response = get(url)

html_soup = BeautifulSoup(response.text, 'html.parser')
type(html_soup)

game_containers = html_soup.find_all('div', class_ = 'li.short:nth-of-type(2)')

first_game = game_containers[0]
first_game.text

This gives an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

After that, it returns:

'\nGod of War (2018) '

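What I want is the "30 1/2 Hours" from the page (ideally as 30.5, but I think I can handle that in Excel unless there is a way to do it at this stage).

Please let me know how to do this.

Do I need Selenium?

Thanks,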

Answer 1

game_containers = html_soup.find_all('div', class_='game_times')

This returns a ResultSet of the game_times stats tables.

Use [-1] to get the last item, then get its text:

print(game_containers[-1].find_all('li')[-1].contents[3].get_text())

Prints:
30½ Hours
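
The question also asked for a number like 30.5 instead of the text, and for times from several game pages in one spreadsheet. Here is a minimal sketch of both. It assumes the pages still have the game_times structure used above, and that each time is written as digits plus an optional ¼, ½ or ¾ followed by "Hours". The names parse_hours, scrape_time and write_times are made up for this example, and the CSV column names are arbitrary. It uses requests and BeautifulSoup only, no Selenium.

```python
import csv
import re
import unicodedata


def parse_hours(text):
    """Turn text such as '30½ Hours' into a float, or None if no time is found."""
    match = re.search(r"(\d+)?\s*([¼½¾])?\s*Hours?", text)
    if match is None:
        return None
    digits, fraction = match.groups()
    if digits is None and fraction is None:
        return None  # the word "Hours" appeared without a number in front of it
    hours = float(digits) if digits else 0.0
    if fraction:
        # unicodedata.numeric('½') is 0.5, and likewise for ¼ and ¾
        hours += unicodedata.numeric(fraction)
    return hours


def scrape_time(url):
    """Fetch one game page and return the time text (assumed page layout)."""
    # Third-party imports live here so parse_hours can be used without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    containers = soup.find_all("div", class_="game_times")
    # Same lookup as in the answer: last stats block, last <li>, fourth child.
    return containers[-1].find_all("li")[-1].contents[3].get_text()


def write_times(urls, path, fetch=scrape_time):
    """Write one CSV row per URL: the URL, the raw text, and the parsed hours."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "time_text", "hours"])
        for url in urls:
            text = fetch(url)
            writer.writerow([url, text, parse_hours(text)])


print(parse_hours("30½ Hours"))  # 30.5
```

Calling write_times(['https://howlongtobeat.com/game.php?id=38050'], 'times.csv') would write one row per URL in the list, and the resulting CSV opens directly in Excel or LibreOffice. The fetch parameter is only there so the parsing and CSV steps can be tried with a stand-in function, without hitting the site.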

Data scraping howlongtobeat.com with python3, Beautiful Soup and Selenium (maybe)
