Web scraping multiple URLs

Time: 2017-11-19 07:18:11

Tags: python json web-scraping

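I have the code I need to get the content I want, but I want to go through every gameId played so far, not just the one in the URL. I want to change 2017020001 and have it go up to 2017021272, or until the end of the season, which I believe is around game 1272. How can I do this with the code below?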

import csv
import requests
import os

req = requests.get('https://statsapi.web.nhl.com/api/v1/game/2017020001/feed/live?site=en_nhl')
data = req.json()

my_data = []
pk = data['gameData']['game']['pk']
for item in data['liveData']['plays']['allPlays']:
    players = item.get('players')
    if players:
        player_a = players[0]['player']['fullName'] if len(players) > 0 else None
        player_b = players[1]['player']['fullName'] if len(players) > 1 else None
    else:
        player_a, player_b = None, None
    event = item['result']['event']
    time = item['about']['periodTime']
    triCode = item.get('team', {}).get('triCode')
    coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
    my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]

with open("NHL_2017020001.csv", "a", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(my_data)
f.close()

2 Answers:

Answer 0 (score: 2)

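If the game IDs are numbered sequentially, it is as simple as nesting all of your code inside a for loop that iterates over every game ID, using str.format() to give the number the leading zeros it needs. The changed code: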

import csv
import requests

headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]

for i in range(1, 1273):
    # Zero-pad the game number to four digits: 1 -> 2017020001, 1272 -> 2017021272
    url = 'https://statsapi.web.nhl.com/api/v1/game/201702{:04d}/feed/live?site=en_nhl'.format(i)
    req = requests.get(url)
    req.raise_for_status()  # stop with an error if the request failed
    data = req.json()

    my_data = []
    pk = data['gameData']['game']['pk']
    for item in data['liveData']['plays']['allPlays']:
        players = item.get('players')
        if players:
            player_a = players[0]['player']['fullName'] if len(players) > 0 else None
            player_b = players[1]['player']['fullName'] if len(players) > 1 else None
        else:
            player_a, player_b = None, None
        # These fields apply to every play, not only to plays without players
        event = item['result']['event']
        time = item['about']['periodTime']
        triCode = item.get('team', {}).get('triCode')
        coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
        my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

    # One file per game; "w" avoids a duplicated header if the script is re-run
    with open("NHL_201702{:04d}.csv".format(i), "w", newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(my_data)

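A final correction: because the file is opened with with ... as, it is closed automatically, so you don't need to call f.close() explicitly. You can find more information about using str.format() here.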
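Both points can be checked in isolation. The sketch below assumes, from the question's URL, that a game ID is the prefix "201702" (2017 season, game type "02") followed by a four-digit game number; `game_id` is a hypothetical helper, not part of the NHL API. It also shows that the file is already closed once the with block ends.

```python
import csv
import os
import tempfile


def game_id(number: int, prefix: str = "201702") -> str:
    """Hypothetical helper: zero-pad the game number to four digits after the prefix."""
    # Assumption: IDs look like the question's 2017020001 = "201702" + "0001".
    return "{}{:04d}".format(prefix, number)


print(game_id(1))         # 2017020001
print(game_id(1272))      # 2017021272
print(f"201702{42:04d}")  # 2017020042 (same format spec as an f-string)

# The with statement closes the file on exit, so no f.close() is needed.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "NHL_{}.csv".format(game_id(1)))
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["pk", "event"])
    print(f.closed)  # True
```

Because the padded strings are plain digits, looping over range(1, 1273) and padding produces the same IDs as looping directly over range(2017020001, 2017021273).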

Answer 1 (score: 1)

You should wrap your code in a for loop that iterates over the game IDs.

Something like this should work:

import csv
import requests

headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]

# Open the output file once and write the header once; every game's plays go below it
with open("NHL_2017.csv", "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)

    for x in range(2017020001, 2017021273):  # game IDs 2017020001 .. 2017021272
        req = requests.get('https://statsapi.web.nhl.com/api/v1/game/%s/feed/live?site=en_nhl' % x)
        req.raise_for_status()  # stop with an error if the request failed
        data = req.json()

        pk = data['gameData']['game']['pk']
        for item in data['liveData']['plays']['allPlays']:
            players = item.get('players')
            if players:
                player_a = players[0]['player']['fullName'] if len(players) > 0 else None
                player_b = players[1]['player']['fullName'] if len(players) > 1 else None
            else:
                player_a, player_b = None, None
            event = item['result']['event']
            time = item['about']['periodTime']
            triCode = item.get('team', {}).get('triCode')
            coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
            writer.writerow([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])
# No f.close() needed: the with block closes the file
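If some game IDs in the range cannot be fetched (for example games not yet played, assuming the API reports those as errors), a single failure aborts either loop above. A minimal sketch, not the answerer's code, splits the per-play logic into a hypothetical `play_to_row` function and a hypothetical `write_games` loop that skips failing IDs and returns them for a later retry. The `fetch` argument is an assumed callable that returns the decoded feed for one ID; in practice it might call requests.get(url % gid), then raise_for_status() and .json(). The usage at the bottom runs offline on made-up data.

```python
import csv
import io
from typing import Callable, Iterable, List, Optional, TextIO

HEADERS = ["pk", "player_a", "player_b", "event", "time", "triCode",
           "coordinates_x", "coordinates_y"]


def play_to_row(pk: int, item: dict) -> List[Optional[object]]:
    """Hypothetical helper: turn one entry of liveData.plays.allPlays into a CSV row."""
    players = item.get("players") or []
    player_a = players[0]["player"]["fullName"] if len(players) > 0 else None
    player_b = players[1]["player"]["fullName"] if len(players) > 1 else None
    coords = item.get("coordinates", {})  # tolerate plays without coordinates
    return [pk, player_a, player_b,
            item["result"]["event"], item["about"]["periodTime"],
            item.get("team", {}).get("triCode"),
            coords.get("x"), coords.get("y")]


def write_games(game_ids: Iterable[int], fetch: Callable[[int], dict],
                out: TextIO) -> List[int]:
    """Write all plays of all games under one header; return the IDs that failed."""
    writer = csv.writer(out)
    writer.writerow(HEADERS)
    skipped = []
    for gid in game_ids:
        try:
            data = fetch(gid)  # assumed: returns the decoded feed for this ID
        except Exception:      # e.g. HTTP or JSON errors; collect and move on
            skipped.append(gid)
            continue
        pk = data["gameData"]["game"]["pk"]
        for item in data["liveData"]["plays"]["allPlays"]:
            writer.writerow(play_to_row(pk, item))
    return skipped


# Offline usage with a fake one-play feed (made-up data, not real API output).
FAKE_FEED = {
    "gameData": {"game": {"pk": 2017020001}},
    "liveData": {"plays": {"allPlays": [{
        "players": [{"player": {"fullName": "A"}}],
        "result": {"event": "Faceoff"},
        "about": {"periodTime": "00:00"},
        "team": {"triCode": "TOR"},
        "coordinates": {"x": 0.0, "y": 0.0},
    }]}},
}


def fake_fetch(gid: int) -> dict:
    """Stand-in for a real HTTP fetch: only game 2017020001 'exists'."""
    if gid != 2017020001:
        raise ValueError("no feed for %d" % gid)
    return FAKE_FEED


print(write_games([2017020001, 2017020002], fake_fetch, io.StringIO()))  # [2017020002]
```

Catching a broad Exception keeps the sketch short; a real script would more likely catch only the errors its fetch function can raise.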