Scraping a JSON web page

Time: 2017-10-20 20:46:25

Tags: python json web-scraping python-requests

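I'm very new to web scraping and am having trouble scraping some NBA player data from nba.com. I first tried scraping the page with bs4, but ran into a problem that, after some research, I believe is caused by the "XHR" requests described in the articles I read. I was able to find the URL that serves the data as JSON, but my Python program seems to hang and never loads the data. Again, I'm very new to web scraping, so I wanted to check whether I'm way off here... Any suggestions? Thanks! (Code below.)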

import requests
import json

url = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="

resp = requests.get(url=url)
data = json.loads(resp.text)
print(data)

2 answers:

Answer 0 (score: 1)

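Give this a try. It prints the categories from that page under the labels I defined. By the way, you got no response on your first attempt because the site expects a User-Agent header in the request, to make sure the request comes from a real browser rather than a bot. I faked one and got the response.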

import requests

url = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
resp = requests.get(url,headers={'User-Agent':'Mozilla/5.0'})
data = resp.json()

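storage = data['resultSets']
for elem in storage:
    # Each result set keeps its rows under 'rowSet'; every row is a list of values.
    all_list = elem['rowSet']

    for item in all_list:
        # The first four columns are read as the player and team identifiers.
        player_id = item[0]
        player_name = item[1]
        team_id = item[2]
        team_abbr = item[3]
        print("Player_Id: {} Player_name: {} Team_Id: {} Team_abbr: {}".format(
            player_id, player_name, team_id, team_abbr))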
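
Indexing rows by position works, but it breaks silently if the column order changes. A minimal sketch of an alternative, assuming each result set also carries a 'headers' list with one column name per value in a row (the thread does not show the payload, so treat that key and the column names below as assumptions). The sample data is made up and only mimics that assumed shape:

```python
def rows_as_dicts(result_set):
    """Pair every row of a result set with its column names."""
    headers = result_set["headers"]  # assumed key: one name per column
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


# Hypothetical sample shaped like the assumed response; all values are made up.
sample = {
    "resultSets": [
        {
            "headers": ["PLAYER_ID", "PLAYER_NAME", "TEAM_ID", "TEAM_ABBREVIATION"],
            "rowSet": [
                [1, "Player One", 10, "AAA"],
                [2, "Player Two", 20, "BBB"],
            ],
        }
    ]
}

for result_set in sample["resultSets"]:
    for player in rows_as_dicts(result_set):
        # Columns are looked up by name instead of by position.
        print(player["PLAYER_NAME"], player["TEAM_ABBREVIATION"])
```

With the real response, passing each element of data['resultSets'] to rows_as_dicts should give one dict per player, provided the 'headers' assumption holds.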

Answer 1 (score: 0)

Just realized it was because of the User-Agent header... once I added it, it worked.
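
Since the original symptom was a program that hangs forever, it may also help to set a timeout so a stalled request raises an error instead of blocking. A rough sketch using requests; the helper names make_session and fetch_json and the 10-second value are illustrative choices, not something from the thread:

```python
import requests


def make_session(user_agent="Mozilla/5.0"):
    """Return a session that sends a browser-like User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_json(session, url, timeout=10):
    """GET the URL and return the decoded JSON body.

    requests raises requests.exceptions.Timeout if the server stalls longer
    than `timeout` seconds, and raise_for_status() raises on 4xx/5xx replies.
    """
    resp = session.get(url, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
```

Calling fetch_json(make_session(), url) with the URL from the question would then either return the parsed data or fail with a clear exception.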