NHL API: scraping player statistics

Time: 2019-01-02 11:09:59

Tags: python python-3.x loops web-scraping

I am trying to scrape each player's statistics from a game, but I am running into trouble with the .get() method.

Here is the API: https://statsapi.web.nhl.com/api/v1/game/2017020002/feed/live

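To scrape player stats, I first need the players' IDs. The first for loop does this by reaching 'skaters', getting the IDs, and putting them in the list player_id.

The next two for loops are meant to get the stats for home/away and for each ID in player_id. A player ID is a plain number, e.g. 8474756. However, the key needed to reach a player's stats is named 'ID' followed by the player number, so every key has a unique name, which is why the for loop is there. But I can't figure out how to make it work. Thanks for your help!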

import numpy as np
import pandas as pd
import requests
import json
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

results = []
player_id = []
for game_id in range(2018020001, 2018020002, 1):
    url = 'https://statsapi.web.nhl.com/api/v1/game/{}/feed/live'.format(game_id)
    r = requests.get(url)
    game_data = r.json()

for homeaway in ['home','away']:
    player_dict = game_data.get('liveData').get('boxscore').get('teams').get(homeaway).get('skaters')
    player_id.append(player_dict)

for homeaway in ['home', 'away']:
    for playerID in player_id:
        play_dict = game_data.get('liveData').get('boxscore').get('teams').get('homeaway').get('players').get('ID'+player_id).get('person')

This is the code I already have for scraping game data; I would like to get output of the same kind for players.

import numpy as np
import pandas as pd
import requests
import json
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

results = []
for game_id in range(2018020598, 2018020650, 1):
    url = 'https://statsapi.web.nhl.com/api/v1/game/{}/boxscore'.format(game_id)
    r = requests.get(url)
    game_data = r.json()

    for homeaway in ['home','away']:
        game_dict = game_data.get('teams').get(homeaway).get('teamStats').get('teamSkaterStats')
        game_dict['team'] = game_data.get('teams').get(homeaway).get('team').get('name')
        game_dict['homeaway'] = homeaway
        game_dict['game_id'] = game_id
        results.append(game_dict)


df = pd.DataFrame(results)

Here is a sample table showing how I would like the dataset to look:

PlayerID   Team   Won/lost   opponent   game_id     metric1   metric2 metric_n
  1          LA      1          CAP       0001         10       10        10

1 Answer:

Answer 0 (score: 0)

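Your player_id is a list of lists, so when you run for playerID in player_id: you are actually iterating over the sublists, not over the player IDs. Try modifying the code like this: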

player_id = {}
results = []

for homeaway in ['home','away']:
    player_dict = game_data.get('liveData').get('boxscore').get('teams').get(homeaway).get('skaters')
    player_id[homeaway] = player_dict

Now player_id is a dictionary, like {'home': [ID_1, ID_2, ...], 'away': [ID_3, ID_4, ...]}
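
To make the difference concrete, here is a small sketch that compares the two approaches without calling the API. The skaters dictionary is a made-up stand-in for the two 'skaters' lists, and every ID in it other than 8474756 is invented for illustration.

```python
# Made-up skater lists standing in for the API's 'skaters' arrays.
skaters = {'home': [8474756, 8470000], 'away': [8471111]}

# Original approach: append() adds each whole list as a single element,
# so the result is a list of lists.
as_list = []
for homeaway in ['home', 'away']:
    as_list.append(skaters[homeaway])
print(as_list)

# Dictionary approach: each side maps to its own flat list of IDs.
as_dict = {}
for homeaway in ['home', 'away']:
    as_dict[homeaway] = skaters[homeaway]

# The inner loop now yields individual IDs, which can be turned into keys.
for homeaway in as_dict:
    for playerID in as_dict[homeaway]:
        print(homeaway, 'ID' + str(playerID))
```

With the list version, each loop item is itself a list, so 'ID' + item cannot produce a valid key. With the dictionary, each item is a single number.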

for homeaway in player_id:
    for playerID in player_id[homeaway]:
        play_dict = game_data.get('liveData').get('boxscore').get('teams').get(homeaway).get('players').get('ID' + str(playerID)).get('person')
        results.append(play_dict)
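
If the goal is one row per player, as in the sample table from the question, the loop above could be extended along the following lines. This is only a sketch under some assumptions: the helper name player_rows is hypothetical, the per-player metrics are assumed to sit under 'stats' and then 'skaterStats', and the sample data is hand-made rather than a real API response. The .get() calls use {} as a default so that a missing key yields an empty result instead of an AttributeError on None.

```python
def player_rows(game_data, game_id):
    """Return one flat dict per skater for a single game (hypothetical helper)."""
    teams = game_data.get('liveData', {}).get('boxscore', {}).get('teams', {})
    rows = []
    for homeaway in ['home', 'away']:
        side = teams.get(homeaway, {})
        players = side.get('players', {})
        for player_id in side.get('skaters', []):
            # Keys under 'players' are 'ID' followed by the numeric player ID.
            player = players.get('ID' + str(player_id), {})
            # Assumption: the metrics live under 'stats' -> 'skaterStats'.
            row = dict(player.get('stats', {}).get('skaterStats', {}))
            row['PlayerID'] = player_id
            row['name'] = player.get('person', {}).get('fullName')
            row['team'] = side.get('team', {}).get('name')
            row['homeaway'] = homeaway
            row['game_id'] = game_id
            rows.append(row)
    return rows


# Hand-made sample in the assumed shape; names and numbers are invented.
sample = {'liveData': {'boxscore': {'teams': {
    'home': {
        'team': {'name': 'Home Team'},
        'skaters': [8474756],
        'players': {'ID8474756': {
            'person': {'id': 8474756, 'fullName': 'Example Skater'},
            'stats': {'skaterStats': {'goals': 1, 'assists': 2}},
        }},
    },
    'away': {
        'team': {'name': 'Away Team'},
        'skaters': [8471111],
        'players': {'ID8471111': {
            'person': {'id': 8471111, 'fullName': 'Another Skater'},
            'stats': {'skaterStats': {'goals': 0, 'assists': 1}},
        }},
    },
}}}}

rows = player_rows(sample, 2017020002)
for row in rows:
    print(row)
```

Collecting the rows from every game into one list and passing it to pd.DataFrame, as the question's team-level code does with results, should then give one row per player per game.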