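How do I combine rows and rename headers using Pandas?

Time: 2015-02-09 02:42:06

Tags: python json pandas

I want to get scores from stats.nba.com. All the data is delivered as JSON. After reading the JSON and loading it into a DataFrame, I have the following sample data: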

Season     Date      Team        PTS   Game_id
 2015   07/02/2015  Chicago      107  0021400758
 2015   07/02/2015  New Orleans   72  0021400758
 2015   07/02/2015  Brooklyn      77  0021400759
 2015   07/02/2015  Washington   114  0021400759

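My goal is to combine the rows that share the same Game_id and rename the columns, for example PTS to HOME_PTS and AWAY_PTS.

So far I have used dictionaries to get everything into the right shape.

My code: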

import requests
import json
import datetime
from dateutil.parser import parse
import pandas as pd

def read_json(json_url, retries=5):
    # Fetch the URL and decode the JSON body, trying again while the result is empty
    headers = {'User-agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
                             'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'}
    errors = 0
    json_data = None
    while errors <= retries:
        r = requests.get(json_url, headers=headers)
        json_data = r.json()

        if json_data:
            break
        errors += 1  # count the failed attempt so the loop terminates
    return json_data

def get_team_status(x):
    global group_by_game_id
    game_id = x["GAME_ID"]
    team = x["TEAM_CITY_NAME"]
    team_index = group_by_game_id.get_group(game_id)["TEAM_CITY_NAME"].tolist().index(team)
    status_dict = {0: "Away", 1: "Home"}
    team_status = status_dict[team_index]
    return team_status

def parse_score(day):
    global group_by_game_id
    json_url = "http://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=" + day 
    json_data = read_json(json_url)

    # Pick the "LineScore" result set and turn each row into a dict keyed by header
    linescores = [x for x in json_data["resultSets"] if x["name"] == "LineScore"][0]
    rowsets = linescores["rowSet"]
    headers = linescores["headers"]
    data_dict = [dict(zip(headers, x)) for x in rowsets]

    df = pd.DataFrame(data_dict)
    # Find Home, Away
    group_by_game_id = df.groupby("GAME_ID")
    df["TEAM_STATUS"] = df.apply(get_team_status, axis=1)

    # Fix date
    df["GAME_DATE_EST"] = df.apply(lambda x: parse(x["GAME_DATE_EST"]).strftime("%d/%m/%Y"), axis=1)

    # Create new columns based on TEAM_STATUS
    columns = df.columns.tolist()
    data = []
    df_values = df.values.tolist()
    for row in df_values:
        new_columns = []
        if "Home" in row:
            for column in columns:
                if column != "TEAM_STATUS":
                    if "TEAM" in column:
                        new_columns.append("HOME" + column)
                    elif "PTS" in column:
                        new_columns.append("HOME_" + column)
                    else:
                        new_columns.append(column)
        elif "Away" in row:
            for column in columns:
                if column != "TEAM_STATUS":
                    if "TEAM" in column:
                        new_columns.append("AWAY" + column)
                    elif "PTS" in column:
                        new_columns.append("AWAY_" + column)
                    else:
                        new_columns.append(column)
        row_dict = dict(zip(new_columns, row))
        data.append(row_dict)

    # Merge same matches on game_id
    new_data = []
    for index, row in enumerate(data):
        game_id = row["GAME_ID"]
        for index2, row2 in enumerate(data):
            game_id2 = row2["GAME_ID"]
            if game_id == game_id2 and index2 > index:
                row.update(row2)
                new_data.append(row)

    df_data = pd.DataFrame(new_data)
    return df_data

df_data = parse_score("02/07/2015")

1 Answer:

Answer 0 (score: 0)

You can do the following.

Given your sample DataFrame:

df = pd.DataFrame({'Date': {0: '07/02/2015', 1: '07/02/2015', 2: '07/02/2015', 3: '07/02/2015'},
 'Game_id': {0: 21400758, 1: 21400758, 2: 21400759, 3: 21400759},
 'PTS': {0: 107, 1: 72, 2: 77, 3: 114},
 'Season': {0: 2015, 1: 2015, 2: 2015, 3: 2015},
 'Team': {0: 'Chicago', 1: 'NewOrleans', 2: 'Brooklyn', 3: 'Washington'}})

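Then create two frames (away and home) by grouping on Game_id and taking the first row of each group as the away team and the last row as the home team.

Edit: added the Date column, as requested in the comments.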

awayTeams = df[['Date','Game_id','Team','PTS']].groupby('Game_id').first()
homeTeams = df[['Game_id','Team','PTS']].groupby('Game_id').last()
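The first()/last() step relies on each Game_id having exactly two rows, with the away team listed first (the same ordering the question's status_dict assumes). A minimal sketch of a sanity check for the two-row part of that assumption; the names rows_per_game and incomplete are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Game_id": [21400758, 21400758, 21400759, 21400759],
    "Team": ["Chicago", "NewOrleans", "Brooklyn", "Washington"],
    "PTS": [107, 72, 77, 114],
})

# Number of rows per game; first()/last() only pair up cleanly when this is 2.
rows_per_game = df.groupby("Game_id").size()

# Games that break the assumption (hypothetical helper name).
incomplete = rows_per_game[rows_per_game != 2]
if not incomplete.empty:
    raise ValueError(f"Games without exactly two rows: {list(incomplete.index)}")
```

Also note that GroupBy.first() and GroupBy.last() return the first and last non-null value per column, so rows with missing values may be worth inspecting before grouping.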

Then merge() them:

pd.merge(awayTeams,homeTeams,left_index=True,right_index=True,suffixes=['_away','_home'])

which will give you

            Date        Team_away   PTS_away    Team_home   PTS_home
Game_id
21400758    07/02/2015  Chicago     107         NewOrleans  72
21400759    07/02/2015  Brooklyn    77          Washington  114
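As an alternative to building two frames and merging them, the same wide layout can probably be reached in one reshape. This is only a sketch, not the method used above: it labels each row by its position within the game using cumcount() (again assuming the away team comes first) and then unstacks that label. The column name side and the variable wide are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Game_id": [21400758, 21400758, 21400759, 21400759],
    "Team": ["Chicago", "NewOrleans", "Brooklyn", "Washington"],
    "PTS": [107, 72, 77, 114],
})

# Label each row by its position within its game: 0 -> away, 1 -> home.
# This ordering is an assumption carried over from the question.
df["side"] = df.groupby("Game_id").cumcount().map({0: "away", 1: "home"})

# Move the "side" label from the row index into the columns.
wide = df.set_index(["Game_id", "side"])[["Team", "PTS"]].unstack("side")

# Flatten the two-level column index into names such as "PTS_home".
wide.columns = [f"{value}_{side}" for value, side in wide.columns]
```

The result is indexed by Game_id and has the columns Team_away, Team_home, PTS_away and PTS_home, so it can be renamed in the same way as the merged frame.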

If you don't like the column names after the merge, you can always change them with rename(). For example:

yourdataframe.rename(columns={'Team_away' : 'whatever',
                              'Team_home' : 'whatever2'},inplace=True)
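Putting the steps together, here is a self-contained sketch that ends with the HOME_PTS / AWAY_PTS names asked for in the question. Two things are my own assumptions rather than part of the answer: Game_id is stored as a string so the leading zeros survive, and the names AWAY_TEAM / HOME_TEAM are just one possible choice.

```python
import pandas as pd

# Sample data from the question; Game_id is kept as a string here
# (an assumption) so that the leading zeros are preserved.
df = pd.DataFrame({
    "Season": [2015, 2015, 2015, 2015],
    "Date": ["07/02/2015"] * 4,
    "Team": ["Chicago", "New Orleans", "Brooklyn", "Washington"],
    "PTS": [107, 72, 77, 114],
    "Game_id": ["0021400758", "0021400758", "0021400759", "0021400759"],
})

grouped = df.groupby("Game_id")

# First row of each game is treated as the away team, last row as the home team.
away = grouped[["Season", "Date", "Team", "PTS"]].first()
home = grouped[["Team", "PTS"]].last()

# Join on the Game_id index; overlapping column names get the suffixes.
games = away.merge(home, left_index=True, right_index=True,
                   suffixes=("_away", "_home"))

# Rename to the names wanted in the question and turn Game_id back into a column.
games = games.rename(columns={
    "Team_away": "AWAY_TEAM", "PTS_away": "AWAY_PTS",
    "Team_home": "HOME_TEAM", "PTS_home": "HOME_PTS",
}).reset_index()
```

Using rename() without inplace=True returns a new frame, which makes it easy to chain with reset_index() as done here.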