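I want to get scores from stats.nba.com. All of the data is delivered as JSON. After reading the JSON and loading it into a DataFrame, I have the following sample data: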
Season  Date        Team         PTS  Game_id
2015    07/02/2015  Chicago      107  0021400758
2015    07/02/2015  New Orleans   72  0021400758
2015    07/02/2015  Brooklyn      77  0021400759
2015    07/02/2015  Washington   114  0021400759
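My goal is to combine the rows that share the same Game_id and rename the columns, e.g. PTS becomes HOME_PTS and AWAY_PTS.
So far I have used dictionaries to get everything into the right shape.
My code: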
import requests
import json
import datetime
from dateutil.parser import parse
import pandas as pd
def read_json(json_url, retries=5):
    # Browser-like User-agent, split across two lines with implicit string concatenation
    headers = {'User-agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
                             'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'}
    errors = 0
    json_data = None
    while errors <= retries:
        r = requests.get(json_url, headers=headers)
        json_data = r.json()
        if json_data:
            break
        errors += 1  # count the failed attempt so the loop cannot run forever
    return json_data
def get_team_status(x):
    global group_by_game_id
    game_id = x["GAME_ID"]
    team = x["TEAM_CITY_NAME"]
    # Position of the team inside its game group: first row is away, second is home
    team_index = group_by_game_id.get_group(game_id)["TEAM_CITY_NAME"].tolist().index(team)
    status_dict = {0: "Away", 1: "Home"}
    team_status = status_dict[team_index]
    return team_status
def parse_score(day):
    global group_by_game_id
    json_url = "http://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=" + day
    json_data = read_json(json_url)
    # List comprehensions instead of filter()/map() so this also works on Python 3
    linescores = [x for x in json_data["resultSets"] if x["name"] == "LineScore"][0]
    rowsets = linescores["rowSet"]
    headers = linescores["headers"]
    data_dict = [dict(zip(headers, x)) for x in rowsets]
    df = pd.DataFrame(data_dict)
    # Find Home, Away
    group_by_game_id = df.groupby("GAME_ID")
    df["TEAM_STATUS"] = df.apply(get_team_status, axis=1)
    # Fix date
    df["GAME_DATE_EST"] = df.apply(lambda x: parse(x["GAME_DATE_EST"]).strftime("%d/%m/%Y"), axis=1)
    # Create new columns based on TEAM_STATUS
    columns = df.columns.tolist()
    data = []
    df_values = df.values.tolist()
    for row in df_values:
        new_columns = []
        if "Home" in row:
            for column in columns:
                if column != "TEAM_STATUS":
                    if "TEAM" in column:
                        new_columns.append("HOME" + column)
                    elif "PTS" in column:
                        new_columns.append("HOME_" + column)
                    else:
                        new_columns.append(column)
        elif "Away" in row:
            for column in columns:
                if column != "TEAM_STATUS":
                    if "TEAM" in column:
                        new_columns.append("AWAY" + column)
                    elif "PTS" in column:
                        new_columns.append("AWAY_" + column)
                    else:
                        new_columns.append(column)
        # TEAM_STATUS is the last column, so zip() drops it when pairing names with values
        row_dict = dict(zip(new_columns, row))
        data.append(row_dict)
    # Merge same matches on game_id
    new_data = []
    for index, row in enumerate(data):
        game_id = row["GAME_ID"]
        for index2, row2 in enumerate(data):
            game_id2 = row2["GAME_ID"]
            if game_id == game_id2 and index2 > index:
                row.update(row2)
                new_data.append(row)
    df_data = pd.DataFrame(new_data)
    return df_data
df_data = parse_score("02/07/2015")
Answer 0 (score: 0)
You can do the following.
Given your sample DataFrame:
df = pd.DataFrame({'Date': {0: '07/02/2015', 1: '07/02/2015', 2: '07/02/2015', 3: '07/02/2015'},
                   'Game_id': {0: 21400758, 1: 21400758, 2: 21400759, 3: 21400759},
                   'PTS': {0: 107, 1: 72, 2: 77, 3: 114},
                   'Season': {0: 2015, 1: 2015, 2: 2015, 3: 2015},
                   'Team': {0: 'Chicago', 1: 'NewOrleans', 2: 'Brooklyn', 3: 'Washington'}})
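Then create two frames (away and home) by grouping on Game_id and taking the first row of each group as the away team and the last row as the home team.
Edit: added the Date column, as requested in the comments.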
awayTeams = df[['Date','Game_id','Team','PTS']].groupby('Game_id').first()
homeTeams = df[['Game_id','Team','PTS']].groupby('Game_id').last()
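The first()/last() approach relies on every Game_id having exactly two rows, listed away team first. A minimal sketch for checking the row count before merging is below; the helper name `incomplete_games` is hypothetical, and Game_id is kept as a string here (an assumption, to preserve the leading zeros shown in the question).

```python
import pandas as pd

# Sample data from the question; Game_id as strings keeps the leading zeros.
df = pd.DataFrame({
    "Date": ["07/02/2015"] * 4,
    "Game_id": ["0021400758", "0021400758", "0021400759", "0021400759"],
    "Team": ["Chicago", "New Orleans", "Brooklyn", "Washington"],
    "PTS": [107, 72, 77, 114],
})


def incomplete_games(frame, expected_rows=2):
    """Return the Game_id values that do not have exactly `expected_rows` rows."""
    counts = frame.groupby("Game_id").size()
    return counts[counts != expected_rows].index.tolist()


# An empty list means first()/last() will pick one away and one home row per game.
problem_ids = incomplete_games(df)
print(problem_ids)
```

If the list is not empty, those games are probably better dropped or inspected before taking first() and last().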
Then merge() them:
pd.merge(awayTeams,homeTeams,left_index=True,right_index=True,suffixes=['_away','_home'])
which gives you:
                Date Team_away  PTS_away   Team_home  PTS_home
Game_id
21400758  07/02/2015   Chicago       107  NewOrleans        72
21400759  07/02/2015  Brooklyn        77  Washington       114
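As an alternative to building two frames, the same wide layout can probably be reached with a single reshape. The sketch below is not part of the original answer: it assumes rows are ordered away then home within each game, and the `status` column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Game_id": ["0021400758", "0021400758", "0021400759", "0021400759"],
    "Team": ["Chicago", "New Orleans", "Brooklyn", "Washington"],
    "PTS": [107, 72, 77, 114],
})

# Number the rows inside each game: 0 is the first (away) row, 1 the second (home) row.
df["status"] = df.groupby("Game_id").cumcount().map({0: "away", 1: "home"})

# One row per game, with a (value, status) column pair for each combination.
wide = df.pivot(index="Game_id", columns="status", values=["Team", "PTS"])

# Flatten the two-level column index into names like "PTS_home".
wide.columns = [f"{col}_{status}" for col, status in wide.columns]
print(wide)
```

Depending on the pandas version, pivoting text and numbers together may leave the PTS columns with object dtype, so a conversion such as `astype(int)` may be needed afterwards.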
If you don't like the column names after the merge, you can always change them with rename(). For example:
yourdataframe.rename(columns={'Team_away' : 'whatever',
'Team_home' : 'whatever2'},inplace=True)
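Putting the steps together, here is a rough end-to-end sketch that produces the HOME_PTS and AWAY_PTS names asked for in the question. The function name `combine_games` and the AWAY_TEAM/HOME_TEAM column names are assumptions for illustration, and it again assumes the away row comes first within each game.

```python
import pandas as pd


def combine_games(frame):
    """Collapse two rows per Game_id into one row with away/home columns."""
    grouped = frame.groupby("Game_id")
    away = grouped[["Date", "Team", "PTS"]].first()  # first row of each game
    home = grouped[["Team", "PTS"]].last()           # last row of each game
    merged = away.merge(home, left_index=True, right_index=True,
                        suffixes=("_away", "_home"))
    # rename() without inplace returns a new frame and leaves `merged` untouched.
    renamed = merged.rename(columns={
        "Team_away": "AWAY_TEAM", "PTS_away": "AWAY_PTS",
        "Team_home": "HOME_TEAM", "PTS_home": "HOME_PTS",
    })
    return renamed.reset_index()  # turn Game_id back into a regular column


df = pd.DataFrame({
    "Date": ["07/02/2015"] * 4,
    "Game_id": ["0021400758", "0021400758", "0021400759", "0021400759"],
    "Team": ["Chicago", "New Orleans", "Brooklyn", "Washington"],
    "PTS": [107, 72, 77, 114],
})
result = combine_games(df)
print(result)
```

Only Team and PTS receive suffixes because they are the only columns present in both frames; Date comes through unchanged.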