Adding up a specific team's scores in pandas

Time: 2017-12-20 10:05:24

Tags: python python-3.x pandas

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use("fivethirtyeight")

df_2010=pd.read_csv("c:/users/ashub/downloads/documents/MLB 2010.csv",index_col=0)
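# Take an explicit copy so the column assignments below do not trigger SettingWithCopyWarning
df_new=df_2010[["Home Score","Away Score","Home Team","Away Team","Home Hits","Away Hits","Home Err","Away Err"]].copy()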
#print(df_2010)

flag=df_2010["Home Score"]>df_2010["Away Score"]
df_new["Home Score Index"]= flag.astype(int)
df_new["Away Score Index"]= (~flag).astype(int)

flag1=df_2010["Home Score"]/df_2010["Home Hits"]
df_new["Home to Hits Index"]= flag1.astype(float)

flag1=df_2010["Away Score"]/df_2010["Away Hits"]
df_new["Away to Hits Index"]= flag1.astype(float)

flag1=df_2010["Home Err"]/df_2010["Home Score"]
df_new["Home Error Factor"]= flag1.astype(float)

flag1=df_2010["Away Err"]/df_2010["Away Score"]
df_new["Away Error Factor"]= flag1.astype(float)

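# 0/0 yields NaN, which is replaced with 0 here; a nonzero error count divided by a zero score yields inf and is left unchanged
df_new["Home Error Factor"]=df_new["Home Error Factor"].fillna(0)
df_new["Away Error Factor"]=df_new["Away Error Factor"].fillna(0)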


wins_home=df_new["Home Score Index"].sum()
total_games=len(df_2010)
prob_win_at_home=wins_home/total_games
prob_lose_at_home=1-prob_win_at_home
print(prob_win_at_home)
print(prob_lose_at_home)


df_new.to_html("c:/users/ashub/desktop/ashu.html")

Sample Data

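Now I want to count a specific team's wins when it plays at home, along with the number of games it played at home, and I also want to count its away wins. How should I approach this?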

2 answers:

Answer 0: (score: 2)

I think this is one way to do it?

import pandas as pd

games = pd.DataFrame(data = {"home" : ["A", "B", "A", "A", "B"],
                             "away" : ["B", "C", "C", "B", "A"],
                             "homescore" : [0, 1, 4, 3, 0],
                             "awayscore" : [1, 2, 2, 1, 1]})

games["homewin"] = games.apply(lambda row: 1 if row.homescore > row.awayscore else 0, axis=1)

g = games.groupby(by=["home", "homewin"]).size().reset_index(name="games")
g["homewin"] = g.apply(lambda row: row.homewin*row.games, axis=1)
g = g.groupby(by=["home"]).sum()
g["homewinratio"] = g["homewin"]/g["games"]

    g
    Out[105]: 
          homewin  games  homewinratio
    home                              
    A           2      3      0.666667
    B           0      2      0.000000

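Although I'm sure there is a better way, and I'm curious about it too.

Answer 1: (score: 2)

When you use groupby, you can aggregate in several ways at once. Using erocoar's example DataFrame: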

import pandas as pd

games = pd.DataFrame(data={'Home': ['A', 'B', 'A', 'A', 'B'],
                           'Away': ['B', 'C', 'C', 'B', 'A'],
                           'Home Score': [0, 1, 4, 3, 0],
                           'Away Score': [1, 2, 2, 1, 1]})

games['Home Win'] = (games['Home Score'] > games['Away Score']).astype(int)

summary = games.groupby('Home').agg({'Home Win': 'sum',
                                     'Home': 'count'})

summary['Home Win Ratio'] = summary['Home Win'] / summary['Home']

This will give you the output:

      Home Win  Home  Home Win Ratio
Home                                
A            2     3        0.666667
B            0     2        0.000000
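
The question also asks for away wins, which neither answer shows. The following is a minimal sketch of one way to extend the same groupby idea, assuming the sample frame above; the names `record`, `home`, `away`, and the output column labels are made up for illustration and are not from the original post.

```python
import pandas as pd

# Same sample data as in the answer above.
games = pd.DataFrame({
    "Home": ["A", "B", "A", "A", "B"],
    "Away": ["B", "C", "C", "B", "A"],
    "Home Score": [0, 1, 4, 3, 0],
    "Away Score": [1, 2, 2, 1, 1],
})

# 1 if the home (or away) side won the game, else 0.
home_win = (games["Home Score"] > games["Away Score"]).astype(int)
away_win = (games["Away Score"] > games["Home Score"]).astype(int)

# Wins and games played per team, once as the home side and once as the away side.
home = games.assign(win=home_win).groupby("Home")["win"].agg(["sum", "count"])
home.columns = ["Home Wins", "Home Games"]
away = games.assign(win=away_win).groupby("Away")["win"].agg(["sum", "count"])
away.columns = ["Away Wins", "Away Games"]

# Outer join keeps teams that appear only at home or only away;
# their missing counts become 0.
record = home.join(away, how="outer").fillna(0).astype(int)
record.index.name = "Team"  # hypothetical label for the combined index

print(record)
```

A single team's row can then be selected with `record.loc["A"]`. The outer join is a deliberate choice here: in the sample data team C never plays at home, so an inner join would drop it.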