I am scraping information from the following page: "http://www.mobygames.com/game/wheelman/view-moby-score". Here is my code:
import requests
from bs4 import BeautifulSoup

url_credit = "http://www.mobygames.com/game/wheelman/view-moby-score"
# headers is defined elsewhere (not shown in the question)
response = requests.get(url_credit, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find("table", class_="reviewList table table-striped table-condensed table-hover").select('tr[valign="top"]')
for row in table[1:]:
    print(row)
# select() returns a list of tags, so calling .get() on it raises AttributeError
x = soup.select('td[class="left"]').get("colspan")
The output I want looks like this:
platform       total_votes  rating_category  score  total_score
PlayStation 3  None         None             None   None
Windows        6            Acting           4.2    4.1
Windows        6            AI               3.7    4.1
Windows        6            Gameplay         4.0    4.1
The main problem is filling the platform column with the platform name that belongs to each row. How can I get it?
Answer 0 (score: 1)
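Rows that start a new platform have 3 columns, while the rating rows that follow have 2. You can use this difference to detect when the platform changes.
Rows such as PlayStation 3, which have no score, contain a <td> tag with colspan="2" class="center". Use that to handle cases like PlayStation 3.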
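The row shapes described above can be checked offline on a small hand-written fragment. This is only a minimal sketch: the HTML below is a hypothetical imitation of the table structure, not markup copied from the site, and row_kind is a helper name invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the three row shapes (not copied from the site).
SAMPLE_HTML = """
<table>
  <tr valign="top"><td>PlayStation 3</td><td colspan="2" class="center">no score</td></tr>
  <tr valign="top"><td>Windows</td><td>6</td><td>4.1</td></tr>
  <tr valign="top"><td>Acting</td><td>4.2</td></tr>
</table>
"""


def row_kind(row):
    """Classify a <tr> as 'unrated_platform', 'platform', or 'rating'."""
    if row.find("td", colspan="2", class_="center"):
        return "unrated_platform"
    if len(row.find_all("td")) == 3:
        return "platform"
    return "rating"


# html.parser is used here so the sketch does not depend on lxml.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
kinds = [row_kind(tr) for tr in soup.select('tr[valign="top"]')]
print(kinds)
```

The colspan check comes first because an unrated platform row has only 2 cells and would otherwise be mistaken for a rating row.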
Code:
import requests
from bs4 import BeautifulSoup

url_credit = "http://www.mobygames.com/game/wheelman/view-moby-score"
# Assumed placeholder: the question does not show how headers is defined.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url_credit, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find("table", class_="reviewList table table-striped table-condensed table-hover").select('tr[valign="top"]')

platform = ''
total_votes, total_score = None, None
for row in table[1:]:
    # handle platforms without a score, such as PlayStation 3
    if row.find('td', colspan='2', class_='center'):
        platform = row.find('td').text
        total_score, total_votes = None, None
        print('{} | {} | {} | {} | {}'.format(platform, total_votes, None, None, total_score))
        continue

    cols = row.find_all('td')
    # a 3-column row starts a new platform: remember its values and move on
    if len(cols) == 3:
        platform = cols[0].text
        total_votes = cols[1].text
        total_score = cols[2].text
        continue

    # a 2-column row is a rating category that belongs to the current platform
    print('{} | {} | {} | {} | {}'.format(platform, total_votes, cols[0].text, cols[1].text, total_score))
Output:
PlayStation 3 | None | None | None | None
Windows | 6 | Acting | 4.2 | 4.1
Windows | 6 | AI | 3.7 | 4.1
Windows | 6 | Gameplay | 4.0 | 4.1
Windows | 6 | Graphics | 4.2 | 4.1
Windows | 6 | Personal Slant | 4.3 | 4.1
Windows | 6 | Sound / Music | 4.3 | 4.1
Windows | 6 | Story / Presentation | 3.8 | 4.1
Xbox 360 | 5 | Acting | 3.8 | 3.5
Xbox 360 | 5 | AI | 3.2 | 3.5
Xbox 360 | 5 | Gameplay | 3.4 | 3.5
Xbox 360 | 5 | Graphics | 3.6 | 3.5
Xbox 360 | 5 | Personal Slant | 3.6 | 3.5
Xbox 360 | 5 | Sound / Music | 3.4 | 3.5
Xbox 360 | 5 | Story / Presentation | 3.8 | 3.5
Note: where I call print(), you would save the values in whatever list or data frame you are using. I only use print() to show how to update the platform variable when needed.
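As a rough illustration of that note, the same loop can append to a list of dicts instead of printing. This is a sketch under stated assumptions: the HTML is a hypothetical stand-in for the real table (it has no header row, so nothing is skipped), and collect_records and FIELDS are names invented here.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the table layout (not copied from the site).
SAMPLE_HTML = """
<table>
  <tr valign="top"><td>PlayStation 3</td><td colspan="2" class="center">no score</td></tr>
  <tr valign="top"><td>Windows</td><td>6</td><td>4.1</td></tr>
  <tr valign="top"><td>Acting</td><td>4.2</td></tr>
  <tr valign="top"><td>AI</td><td>3.7</td></tr>
</table>
"""

FIELDS = ("platform", "total_votes", "rating_category", "score", "total_score")


def collect_records(rows):
    """Return one dict per output line, carrying the platform values forward."""
    records = []
    platform = total_votes = total_score = None
    for row in rows:
        cols = [td.get_text(strip=True) for td in row.find_all("td")]
        if row.find("td", colspan="2", class_="center"):
            # Platform without a score: emit a single row of None values.
            platform, total_votes, total_score = cols[0], None, None
            records.append(dict(zip(FIELDS, (platform, None, None, None, None))))
        elif len(cols) == 3:
            # Platform header row: remember its values for the rows that follow.
            platform, total_votes, total_score = cols
        else:
            # Rating row: combine it with the remembered platform values.
            values = (platform, total_votes, cols[0], cols[1], total_score)
            records.append(dict(zip(FIELDS, values)))
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = collect_records(soup.select('tr[valign="top"]'))
for record in records:
    print(record)
```

If pandas is available, passing the list to pandas.DataFrame(records) gives a data frame with these five columns.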