Why is my web scraping not returning anything?

Posted: 2019-06-28 05:26:30

Tags: python web-scraping containers

I'm trying to use Python to scrape a table on a public website. I've checked that it connects to the site: the command page_soup.p returns the items with a "p" tag.

When I check that grabbing the tag works with the command containers[0], I get:

Traceback (most recent call last):
  File "", line 1, in <module>
IndexError: list index out of range
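findAll (like find_all) always returns a list (a ResultSet), and that list is empty when nothing matches. So an IndexError on containers[0] means the search found zero rows, not that the indexing itself is wrong. A small helper can make that failure explicit; first_match is a hypothetical name used for illustration, not part of BeautifulSoup:

```python
def first_match(containers, what):
    """Return the first element of a find_all()/findAll() result.

    Raises LookupError with a readable message instead of a bare IndexError
    when the search matched nothing. `what` is only used in the message.
    """
    if not containers:
        # An empty result usually means the element is absent from the HTML
        # that was downloaded, not that the index is wrong.
        raise LookupError(f'no elements matched {what}')
    return containers[0]
```

With the question's code this would be called as first_match(containers, 'tr.Table-row').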

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://overwatchleague.com/en-us/stats'

# open the connection and grab the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each player
containers = page_soup.findAll("tr",{"class":"Table-row"})

That tag should have about 183 rows, so 0 is clearly not what I expect. Any insight into what I'm doing wrong?
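One way to check whether the rows exist in the downloaded HTML at all, independently of BeautifulSoup, is to count matching rows with the standard library's html.parser. If the count is 0 for the downloaded page while the browser's inspector shows the rows, the table is most likely built by JavaScript after the page loads. The two HTML snippets below are hypothetical stand-ins, and RowCounter/count_rows are illustrative names; against the real page you would pass page_html.decode('utf-8', errors='replace').

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> start tags whose class list contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != 'tr':
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get('class') or '').split()
        if self.class_name in classes:
            self.count += 1


def count_rows(html, class_name='Table-row'):
    parser = RowCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical "app shell" page: the table would be filled in later by
# JavaScript, so the raw HTML contains no rows at all.
shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(count_rows(shell))  # 0

# Hypothetical rendered table, as a browser inspector might show it.
rendered = ('<table><tr class="Table-row"><td>Ado</td></tr>'
            '<tr class="Table-row odd"><td>Adora</td></tr></table>')
print(count_rows(rendered))  # 2
```

Splitting the class attribute on whitespace mirrors how a CSS class selector matches, so a row with class="Table-row odd" still counts.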

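1 Answer

Answer 0 (score: 2)

The data is loaded as JSON. To find the correct URL, look at which network connections the page makes, for example in the Firefox Developer Tools: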

import requests
from datetime import timedelta

# JSON endpoint found by watching the page's network requests
url = 'https://api.overwatchleague.com/stats/players?stage_id=regular_season&season=2019'

data = requests.get(url).json()

# header row and separator (three columns of width 12 plus one of width 20)
print('{:^12}{:^12}{:^12}{:^20}'.format('Name', 'Team', 'Deaths', 'Time Played'))
print('-' * (12*3+20))
for row in data['data']:
    print('{:^12}'.format(row['name']), end='')
    print('{:^12}'.format(row['team']), end='')
    print('{:^12.2f}'.format(row['deaths_avg_per_10m']), end='')
    # total time played is in seconds; timedelta renders it as H:MM:SS.ffffff
    t = timedelta(seconds=float(row['time_played_total']))
    print('{:>20}'.format(str(t)))

Prints:

    Name        Team       Deaths       Time Played     
--------------------------------------------------------
    Ado         WAS         5.47         15:23:08.217194
   Adora        HZS         3.72          9:08:57.586787
 Agilities      VAL         5.27         17:16:59.668653
    Aid         TOR         5.08          8:02:19.102897
   AimGod       BOS         4.69         17:04:31.769137
    aKm         DAL         4.64         16:57:14.261245
   alemao       BOS         4.99          2:36:25.171021
   ameng        CDH         6.24         16:06:12.084212
   Anamo        NYE         2.36         17:33:31.143450
 Architect      SFS         4.33          3:18:45.065564
   ArHaN        HOU         6.39          1:54:10.439213
    ArK         WAS         2.50          9:32:57.421203

...and so on.
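If you would rather stay with urllib as in the question, the same approach can be sketched with the standard library only. The endpoint URL and the data/name/team/deaths_avg_per_10m/time_played_total keys come from the answer above; the function names and the User-Agent header are hypothetical, and the 2019 endpoint may no longer respond. Keeping parsing and formatting separate from the network call lets you test them offline.

```python
import json
from datetime import timedelta
from urllib.request import Request, urlopen

# Endpoint taken from the answer; it may no longer be served
STATS_URL = 'https://api.overwatchleague.com/stats/players?stage_id=regular_season&season=2019'


def parse_stats(raw):
    """Decode a JSON response body (bytes or str) and return the player rows."""
    # The answer reads the rows from the top-level "data" key
    return json.loads(raw)['data']


def fetch_stats(url=STATS_URL, timeout=10):
    """Download and parse the stats JSON (needs network access)."""
    # Hypothetical header: some servers reject urllib's default User-Agent
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req, timeout=timeout) as resp:
        return parse_stats(resp.read())


def format_rows(rows):
    """Return the answer's fixed-width table as a list of lines."""
    lines = ['{:^12}{:^12}{:^12}{:^20}'.format('Name', 'Team', 'Deaths', 'Time Played'),
             '-' * (12 * 3 + 20)]
    for row in rows:
        # time_played_total is treated as seconds, as in the answer
        played = timedelta(seconds=float(row['time_played_total']))
        lines.append('{:^12}{:^12}{:^12.2f}{:>20}'.format(
            row['name'], row['team'], float(row['deaths_avg_per_10m']), str(played)))
    return lines
```

Usage against the live endpoint would be: for line in format_rows(fetch_stats()): print(line). Note that str(timedelta) switches to the form "1 day, H:MM:SS" once a total passes 24 hours, which is wider than the 20-character column.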