What is the most elegant way to evaluate sports match predictions in Python?

Time: 2013-12-29 18:56:53

Tags: python

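I want to evaluate predictions for sports matches, in my case football (i.e. soccer) matches. I want to use Python.

Basically, there are always four values: the team_home result, the team_away result, estimate_home and estimate_away. For example, if the game ended 1:0 and the estimate was 0:0, this should return wrong.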

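There are only four possible cases and outcomes:

  1. wrong: as in the example above
  2. tendency: the winner was estimated correctly, but not the goal difference (e.g. 3:0 for a 1:0 result)
  3. goal difference: the goal difference was estimated correctly (e.g. 2:1 for a 1:0 result)
  4. right: the exact score was estimated correctly

What is the most elegant way to handle estimates and results in Python?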
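The four rules can be checked from the most specific to the least specific one. A minimal sketch, assuming integer goal counts; `Outcome`, `sign` and `classify` are hypothetical names, not part of the question. Note that a correctly predicted draw with the wrong score (e.g. 1:1 for a 2:2 result) always lands in "goal difference", because both differences are 0, so "tendency" can only ever apply to wins.

```python
from enum import Enum


class Outcome(Enum):
    WRONG = "wrong"
    TENDENCY = "tendency"
    GOAL_DIFFERENCE = "goal difference"
    RIGHT = "right"


def sign(diff: int) -> int:
    """Map a goal difference to 1 (home win), -1 (away win) or 0 (draw)."""
    return (diff > 0) - (diff < 0)


def classify(team_home: int, team_away: int,
             estimate_home: int, estimate_away: int) -> Outcome:
    # Most specific rule first: the exact score.
    if (team_home, team_away) == (estimate_home, estimate_away):
        return Outcome.RIGHT
    actual_diff = team_home - team_away
    estimated_diff = estimate_home - estimate_away
    if actual_diff == estimated_diff:
        return Outcome.GOAL_DIFFERENCE
    # Equal signs here imply two wins for the same side: two draws
    # would already have matched the goal-difference rule above.
    if sign(actual_diff) == sign(estimated_diff):
        return Outcome.TENDENCY
    return Outcome.WRONG


print(classify(1, 0, 3, 0))  # Outcome.TENDENCY
```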

4 answers:

Answer 0 (score: 2)

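Another answer, reflecting my own idea of elegance (a rather subjective matter, I agree). I like my objects to be defined by classes and built with OOP, with an ORM managing the relationships between them. This brings many advantages and cleaner code.

I'm using Pony ORM here, but there are many other excellent options (possibly with more permissive licenses), such as SQLAlchemy or Django's ORM.

Here is a complete example. First we define the models: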

from pony.orm import *

# The Database object must exist before any class that inherits from db.Entity
db = Database('sqlite', ':memory:')  # you may store it in a file if you like


class Player(db.Entity):
    """A player is somebody who places bets, identified by name."""
    name = Required(str)
    score = Required(int, default=0)
    bets = Set('Bet', reverse='player')
    # any other player info can be stored here


class Match(db.Entity):
    """A Match is a game, played or not yet played."""

    ended = Required(bool, default=False)
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)

    bets = Set('Bet', reverse='match')


class Bet(db.Entity):
    """Stores one player's bet on a specific match."""

    match = Required(Match, reverse='bets')
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)
    player = Required(Player, reverse='bets')


@db_session
def calculate_wins(match):
    bets = select(b for b in Bet if b.match == match)[:]
    for bet in bets:
        if (match.home_score == bet.home_score) and (match.visitors_score == bet.visitors_score):
            bet.player.score += 3  # exact score
        elif (match.home_score - match.visitors_score) == (bet.home_score - bet.visitors_score):
            bet.player.score += 2  # goal difference
        elif ((match.home_score > match.visitors_score) == (bet.home_score > bet.visitors_score)) and \
                (match.home_score != match.visitors_score) and (bet.home_score != bet.visitors_score):
            bet.player.score += 1  # tendency
        else:
            bet.player.score += 0  # wrong

With these classes you can create and update your database of matches, players and bets. If you want statistics and data aggregation/sorting, you can query the database as needed.

db.generate_mapping(create_tables=True)

# Pony requires entities to be created and modified inside a db_session;
# the nested @db_session on calculate_wins then simply joins this session.
with db_session:
    player1 = Player(name='furins')
    player2 = Player(name='Martin')

    match1 = Match()

    furins_bet = Bet(match=match1, player=player1, home_score=0, visitors_score=0)
    martin_bet = Bet(match=match1, player=player2, home_score=3, visitors_score=0)

    # the game begins ...
    match1.home_score = 1
    match1.visitors_score = 0
    # the game ended ...
    match1.ended = True

    commit()  # let's update the database

    calculate_wins(match1)

    print("furins score: %d" % player1.score)  # prints 0
    print("Martin score: %d" % player2.score)  # prints 1

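If you like, you could eventually add very complex time-series data analysis with numpy, as Carst suggested, but I believe these additions, while very interesting, are a bit overkill for your original question.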
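One possible refinement, sketched under the assumption that you want to test the scoring rules without a database: move the point values out of `calculate_wins` into a plain function. `bet_points` and `POINTS` are hypothetical names; the 3/2/1/0 values are the ones `calculate_wins` uses.

```python
# Hypothetical point table, mirroring the values in calculate_wins
POINTS = {"right": 3, "goal difference": 2, "tendency": 1, "wrong": 0}


def bet_points(home_score: int, visitors_score: int,
               bet_home: int, bet_visitors: int) -> int:
    """Points for one bet, following the same rules as calculate_wins."""
    if (home_score, visitors_score) == (bet_home, bet_visitors):
        return POINTS["right"]
    if home_score - visitors_score == bet_home - bet_visitors:
        return POINTS["goal difference"]
    # 1 = home win, -1 = away win, 0 = draw
    actual_winner = (home_score > visitors_score) - (home_score < visitors_score)
    predicted_winner = (bet_home > bet_visitors) - (bet_home < bet_visitors)
    # Same winner, and not a draw (draws are excluded, as in calculate_wins)
    if actual_winner == predicted_winner != 0:
        return POINTS["tendency"]
    return POINTS["wrong"]


print(bet_points(1, 0, 0, 0), bet_points(1, 0, 3, 0))  # 0 1
```

Inside `calculate_wins`, the whole `if`/`elif` chain would then reduce to `bet.player.score += bet_points(match.home_score, match.visitors_score, bet.home_score, bet.visitors_score)`, which keeps the ORM code and the game rules separate.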

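Answer 1 (score: 1)

First, I'd ask you to think about what kind of questions you will have, i.e.:

  • Do you want to report to each player a list of his estimates versus the actual results?
  • Do you want to rank the players?
  • Do you want to do more statistical work? (e.g. player x is better at estimating matches involving team y)

I'll assume you want to do at least the first two!

I tried to keep the code readable/simple, but in many ways it is considerably more complex than the other answers. In return it gives you a complete toolbox that can process large amounts of data very quickly. So just consider it another option :)

Basically, with pandas you can do a lot more statistical work later on. But really, these questions do affect the answer to your question (or rather: which answer here fits best).

I'm assuming you have a database (relational / MongoDB / whatever); here I fake it with lists. Even though I use pandas here, most of what you describe can be done very simply in a relational database. But pandas rocks ;) so this works too. If you are doing this with friends in Excel or CSV files, you can also import those directly with pandas read_csv or read_excel.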
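A minimal sketch of the CSV route, assuming the spreadsheet is exported with the same column names as `bet_list`; the CSV text is made up and stands in for a real file path such as a hypothetical `bets.csv`:

```python
import io

import pandas as pd

# Made-up CSV content with the same columns as bet_list.
# With a real file you would pass its path instead, e.g. pd.read_csv("bets.csv").
csv_text = """playerid,game,date,home_team,away_team,home_goals,away_goals
1,1,1,Bayern,VfL,3,5
2,1,1,Bayern,VfL,2,1
3,1,1,Bayern,VfL,1,0
4,1,1,Bayern,VfL,0,0
"""

bet_df = pd.read_csv(io.StringIO(csv_text))
print(bet_df.dtypes)  # pandas infers integer dtypes for the numeric columns
```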

import pandas as pd

# game is a unique id (like a combination of date, home_team and away_team)
bet_list = [
    {'playerid': 1, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
    {'playerid': 1, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},   
    {'playerid': 1, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0}  
]

result_list = [
    {'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 4},
    {'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 2},
    {'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
]

def calculate_result(input_df):
    input_df['result'] = 0
    # home wins (result 1); .loc avoids chained assignment, which may not write back
    mask = input_df['home_goals'] > input_df['away_goals']
    input_df.loc[mask, 'result'] = 1
    # away wins (result 2)
    mask = input_df['home_goals'] < input_df['away_goals']
    input_df.loc[mask, 'result'] = 2
    # draws (result 3)
    mask = input_df['home_goals'] == input_df['away_goals']
    input_df.loc[mask, 'result'] = 3
    # goal difference
    input_df['goal_difference'] = input_df['home_goals'] - input_df['away_goals']
    return input_df

# so what were the expectations?
bet_df = pd.DataFrame(bet_list)
bet_df = calculate_result(bet_df)
# if you want to look at the results
bet_df

# what were the actuals
result_df = pd.DataFrame(result_list)
result_df = calculate_result(result_df)
# if you want to look at the results
result_df

# now let's compare them!
# I take a subset of the result df and link the results on the game
combi_df = pd.merge(left=bet_df, right=result_df[['game', 'home_goals', 'away_goals', 'result', 'goal_difference']], left_on='game', right_on='game', how='inner', suffixes=['_bet', '_actual'])
# look at the data
combi_df

def calculate_bet_score(input_df):
    '''
    Notice that I'm keeping extra columns, because those are nice for comparative
    analytics in the future. Think: "you had this right, just like x% of all the people"
    '''
    input_df['bet_score'] = 0
    # now look at where people have correctly predicted the result
    input_df['result_estimation'] = 0
    mask = input_df['result_bet'] == input_df['result_actual']
    input_df.loc[mask, 'result_estimation'] = 1  # correct result
    input_df.loc[mask, 'bet_score'] = 1  # bet score for a correct result
    # now look at where people have correctly predicted the goal difference, given a correct result
    input_df['goal_difference_estimation'] = 0
    bet_mask = input_df['bet_score'] == 1
    score_mask = input_df['goal_difference_bet'] == input_df['goal_difference_actual']
    input_df.loc[bet_mask & score_mask, 'goal_difference_estimation'] = 1  # correct goal difference
    input_df.loc[bet_mask & score_mask, 'bet_score'] = 2  # bet score for a correct goal difference
    # now look at where people have correctly predicted the exact goals
    input_df['goal_exact_estimation'] = 0
    bet_mask = input_df['bet_score'] == 2
    home_mask = input_df['home_goals_bet'] == input_df['home_goals_actual']
    away_mask = input_df['away_goals_bet'] == input_df['away_goals_actual']
    input_df.loc[bet_mask & home_mask & away_mask, 'goal_exact_estimation'] = 1  # exact score
    input_df.loc[bet_mask & home_mask & away_mask, 'bet_score'] = 3  # bet score for an exact score
    return input_df

combi_df = calculate_bet_score(combi_df)

# now look at the results
combi_df

# and you can do nifty stuff like making a top player list like this:
combi_df.groupby('playerid')['bet_score'].sum().sort_values(ascending=False)
# player 4 is way ahead!
# which game was the best estimated game?
combi_df.groupby('game')['bet_score'].mean().sort_values(ascending=False)
# game 3! though abysmal predictions in general ;)

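As I said, this is mainly meant to give a different perspective on the data-manipulation possibilities in Python. Once you get serious about large amounts of data, this (vectorized / numpy / pandas) approach will be the fastest, but you have to ask yourself which logic you want to run inside the database and which outside it, and so on.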
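To illustrate the vectorized style, the three masking steps in `calculate_bet_score` could be collapsed into a single `numpy.select` call. This is a sketch, not the answerer's code; `score_bets` is a hypothetical name, and the inputs are plain integer sequences with one entry per bet (with pandas you would pass the `*_bet` and `*_actual` columns of `combi_df`).

```python
import numpy as np


def score_bets(home_actual, away_actual, home_bet, away_bet):
    """Score many bets at once: 3 exact, 2 goal difference, 1 tendency, 0 wrong."""
    home_actual, away_actual = np.asarray(home_actual), np.asarray(away_actual)
    home_bet, away_bet = np.asarray(home_bet), np.asarray(away_bet)

    diff_actual = home_actual - away_actual
    diff_bet = home_bet - away_bet

    exact = (home_actual == home_bet) & (away_actual == away_bet)
    same_difference = diff_actual == diff_bet
    # np.sign is 1 for a home win, -1 for an away win and 0 for a draw;
    # a draw predicted for a draw is already caught by same_difference
    same_winner = np.sign(diff_actual) == np.sign(diff_bet)

    # np.select takes the first condition that is True, so list order is priority
    return np.select([exact, same_difference, same_winner], [3, 2, 1], default=0)


# Game 1 of the sample data: result 3:4, bets 3:5, 2:1, 1:0 and 0:0
print(score_bets([3, 3, 3, 3], [4, 4, 4, 4], [3, 2, 1, 0], [5, 1, 0, 0]))  # [1 0 0 0]
```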

Hope this helps!

Answer 2 (score: 0)

Here is a solution, although not a very elegant one:

def evaluation(team_home, team_away, estimate_home, estimate_away):
    delta_result = team_home - team_away
    delta_estimate = estimate_home - estimate_away

    if delta_result == delta_estimate:
        if team_home != estimate_home:
            print("goal difference")
        else:
            print("right")
    elif delta_result > 0 and delta_estimate > 0:
        print("tendency")
    elif delta_result < 0 and delta_estimate < 0:
        print("tendency")
    else:
        print("wrong")

evaluation(2, 1, 2, 1)  # right
evaluation(2, 1, 1, 0)  # goal difference
evaluation(2, 1, 3, 0)  # tendency
evaluation(2, 1, 0, 0)  # wrong

evaluation(2, 2, 2, 2)  # right
evaluation(2, 2, 1, 1)  # goal difference
evaluation(2, 2, 0, 0)  # goal difference
evaluation(2, 2, 1, 0)  # wrong

evaluation(0, 1, 0, 1)  # right
evaluation(0, 1, 1, 2)  # goal difference
evaluation(0, 1, 0, 2)  # tendency
evaluation(0, 1, 0, 0)  # wrong
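Because `evaluation` prints its verdict, the result cannot be reused or checked automatically. A small variant, sketched here as an assumption rather than the answerer's code, returns the label instead, so the example calls above become a data-driven test table (`CASES` is a hypothetical name):

```python
def evaluation(team_home, team_away, estimate_home, estimate_away):
    """Same rules as above, but returns the label instead of printing it."""
    delta_result = team_home - team_away
    delta_estimate = estimate_home - estimate_away

    if delta_result == delta_estimate:
        # equal differences plus equal home goals means equal away goals too
        return "right" if team_home == estimate_home else "goal difference"
    if delta_result > 0 and delta_estimate > 0:
        return "tendency"
    if delta_result < 0 and delta_estimate < 0:
        return "tendency"
    return "wrong"


# (result_home, result_away, estimate_home, estimate_away, expected label)
CASES = [
    (2, 1, 2, 1, "right"), (2, 1, 1, 0, "goal difference"),
    (2, 1, 3, 0, "tendency"), (2, 1, 0, 0, "wrong"),
    (2, 2, 2, 2, "right"), (2, 2, 1, 1, "goal difference"),
    (2, 2, 0, 0, "goal difference"), (2, 2, 1, 0, "wrong"),
    (0, 1, 0, 1, "right"), (0, 1, 1, 2, "goal difference"),
    (0, 1, 0, 2, "tendency"), (0, 1, 0, 0, "wrong"),
]

for *scores, expected in CASES:
    actual = evaluation(*scores)
    assert actual == expected, (scores, actual, expected)
print("all", len(CASES), "cases pass")
```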

Answer 3 (score: 0)

Here is a more compact and more symmetric function. Is this what you mean by "elegant"?

def evaluate(team_home, team_away, estimate_home, estimate_away):
    if (team_home == estimate_home) and (team_away == estimate_away):
        return 'right'
    if (team_home - team_away) == (estimate_home - estimate_away):
        return 'goal difference'
    if ((team_home > team_away) == (estimate_home > estimate_away)) and \
       (team_home != team_away) and (estimate_home != estimate_away):
        return 'tendency'
    return 'wrong'
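Since `evaluate` returns a string, its results are easy to aggregate. For instance, a sketch that counts how often each case occurred for one player with `collections.Counter`; the `season` data is made up for illustration, and `evaluate` is repeated unchanged so the block runs on its own:

```python
from collections import Counter


def evaluate(team_home, team_away, estimate_home, estimate_away):
    if (team_home == estimate_home) and (team_away == estimate_away):
        return 'right'
    if (team_home - team_away) == (estimate_home - estimate_away):
        return 'goal difference'
    if ((team_home > team_away) == (estimate_home > estimate_away)) and \
       (team_home != team_away) and (estimate_home != estimate_away):
        return 'tendency'
    return 'wrong'


# Made-up season for one player: (result_home, result_away, estimate_home, estimate_away)
season = [
    (1, 0, 0, 0),
    (1, 0, 3, 0),
    (2, 1, 1, 0),
    (2, 2, 2, 2),
    (0, 1, 0, 2),
]

summary = Counter(evaluate(*game) for game in season)
print(summary.most_common())  # the most frequent case comes first
```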