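I want to evaluate estimates of sports games, in my case football (i.e. soccer) games. I would like to use Python.
Basically, there is always a team_home result, a team_away result, an estimate_home and an estimate_away. For example, if a game ends 1:0 and the estimate was 0:0, this would return wrong.
There are only four possible cases and outcomes:
wrong: as in the situation above
tendency: the estimate of the winner was right, but not the goal difference (e.g. 3:0)
goal difference: the goal difference was right, but not the exact score (e.g. 2:1)
right: the estimate was exactly right
What is the most elegant way to handle estimates and results in Python?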
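Answer 0 (score: 2)
Another answer, reflecting my view of elegance (a rather subjective parameter, I agree). I like to have my objects defined by classes, built with OOP in mind, and with an ORM managing the relationships between the objects. This brings many advantages and clearer code.
I'm using Pony ORM here, but there are many other excellent options (possibly with more permissive licenses), such as SQLAlchemy or Django's ORM.
Here is a complete example. First we define the models: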
from pony.orm import Database, Required, Set, commit, db_session, select

db = Database()  # bound to a concrete database further down


class Player(db.Entity):
    """A player is somebody who places a bet, identified by name."""
    name = Required(str)
    score = Required(int, default=0)
    bets = Set('Bet', reverse='player')
    # any other player info can be stored here


class Match(db.Entity):
    """A Match is a game, played or not yet played."""
    ended = Required(bool, default=False)
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)
    bets = Set('Bet', reverse='match')


class Bet(db.Entity):
    """A class that stores a bet for a specific game."""
    match = Required(Match, reverse="bets")
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)
    player = Required(Player, reverse="bets")


@db_session
def calculate_wins(match):
    """Add points to every player who placed a bet on the given match."""
    bets = select(b for b in Bet if b.match == match)[:]
    for bet in bets:
        if (match.home_score == bet.home_score) and (match.visitors_score == bet.visitors_score):
            bet.player.score += 3  # exact
        elif (match.home_score - match.visitors_score) == (bet.home_score - bet.visitors_score):
            bet.player.score += 2  # goal difference
        elif ((match.home_score > match.visitors_score) == (bet.home_score > bet.visitors_score)) and \
                (match.home_score != match.visitors_score) and (bet.home_score != bet.visitors_score):
            bet.player.score += 1  # tendency
        else:
            bet.player.score += 0  # wrong

With these classes you can create and update your database of matches, players and bets. If you want statistics and data aggregation or sorting, you can query the database as needed.

db.bind(provider='sqlite', filename=':memory:')  # you may store it in a file if you like
db.generate_mapping(create_tables=True)

# entities must be created and modified inside a db_session
with db_session:
    player1 = Player(name='furins')
    player2 = Player(name='Martin')
    match1 = Match()
    furins_bet = Bet(match=match1, player=player1, home_score=0, visitors_score=0)
    martin_bet = Bet(match=match1, player=player2, home_score=3, visitors_score=0)
    # the game begins ...
    match1.home_score = 1
    match1.visitors_score = 0
    # the game ended ...
    match1.ended = True
    commit()  # let's update the database
    calculate_wins(match1)
    print("furins score: %d" % player1.score)  # prints 0
    print("Martin score: %d" % player2.score)  # prints 1
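If you like, you may eventually integrate very complex time-series data analysis using numpy, as Carst suggested, but I believe those additions, although very interesting, go somewhat beyond the scope of your original question.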
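Answer 1 (score: 1)
First of all, I would ask you to think about what kind of questions you are going to have, i.e.:
I would assume you want to do at least the first two!
I tried to keep the code readable and simple. In many ways it is more complicated than the other answers, but it also gives you a complete toolbox with which you can process large amounts of data very quickly. So just see it as another option :)
Basically, with pandas you can do much more statistical work in the future. But really, those questions do affect the answer to your question (or rather: which of the answers here fits best).
I'm assuming you have a database (relational / MongoDB / whatever), which I fake here by using lists. Even though I use pandas here, most of what you describe can also be done in a very simple way in a relational database. But pandas rocks ;) so this works too. If you do this with friends using Excel or CSV files, you can also import those files directly with pandas read_csv or read_excel.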
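As a small illustration of the CSV import mentioned above, here is a minimal sketch using pandas.read_csv. The CSV text and its column names are made up for this example (they mirror the dictionaries used in this answer); in practice you would pass a file path instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real file path can be passed to read_csv instead.
csv_text = """playerid,game,home_goals,away_goals
1,1,3,5
2,1,2,1
"""

# read_csv accepts a path or any file-like object
csv_bet_df = pd.read_csv(io.StringIO(csv_text))
print(csv_bet_df)
```

The resulting DataFrame has one row per bet, which is the same shape as the one built from the list of dictionaries.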
import pandas as pd
# game is a unique id (like a combination of date, home_team and away_team)
bet_list = [
{'playerid': 1, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
{'playerid': 1, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
{'playerid': 1, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0}
]
result_list = [
{'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 4},
{'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 2},
{'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
]
def calculate_result(input_df):
    """Add 'result' (1 home win, 2 away win, 3 draw) and 'goal_difference' columns."""
    input_df['result'] = 0
    # home wins (result 1)
    mask = input_df['home_goals'] > input_df['away_goals']
    input_df.loc[mask, 'result'] = 1
    # away wins (result 2)
    mask = input_df['home_goals'] < input_df['away_goals']
    input_df.loc[mask, 'result'] = 2
    # draws (result 3)
    mask = input_df['home_goals'] == input_df['away_goals']
    input_df.loc[mask, 'result'] = 3
    # goal difference
    input_df['goal_difference'] = input_df['home_goals'] - input_df['away_goals']
    return input_df

# so what were the expectations?
bet_df = pd.DataFrame(bet_list)
bet_df = calculate_result(bet_df)
# if you want to look at the results
bet_df

# what were the actuals?
result_df = pd.DataFrame(result_list)
result_df = calculate_result(result_df)
# if you want to look at the results
result_df

# now let's compare them!
# I take a subset of the result df and link the results on the game
combi_df = pd.merge(
    left=bet_df,
    right=result_df[['game', 'home_goals', 'away_goals', 'result', 'goal_difference']],
    on='game', how='inner', suffixes=('_bet', '_actual'))
# look at the data
combi_df

def calculate_bet_score(input_df):
    '''
    Notice that I'm keeping extra columns, because those are nice for
    comparative analytics in the future. Think: "you had this right, just
    like x% of all the people"
    '''
    input_df['bet_score'] = 0

    # look at where people have correctly predicted the result
    input_df['result_estimation'] = 0
    mask = input_df['result_bet'] == input_df['result_actual']
    input_df.loc[mask, 'result_estimation'] = 1  # correct result
    input_df.loc[mask, 'bet_score'] = 1  # bet score for a correct result

    # look at where people have correctly predicted the goal difference,
    # given that they already predicted the result correctly
    input_df['goal_difference_estimation'] = 0
    bet_mask = input_df['bet_score'] == 1
    score_mask = input_df['goal_difference_bet'] == input_df['goal_difference_actual']
    input_df.loc[bet_mask & score_mask, 'goal_difference_estimation'] = 1  # correct goal difference
    input_df.loc[bet_mask & score_mask, 'bet_score'] = 2  # bet score for a correct goal difference

    # look at where people have correctly predicted the exact goals
    input_df['goal_exact_estimation'] = 0
    bet_mask = input_df['bet_score'] == 2
    home_mask = input_df['home_goals_bet'] == input_df['home_goals_actual']
    away_mask = input_df['away_goals_bet'] == input_df['away_goals_actual']
    input_df.loc[bet_mask & home_mask & away_mask, 'goal_exact_estimation'] = 1  # exact score
    input_df.loc[bet_mask & home_mask & away_mask, 'bet_score'] = 3  # bet score for an exact score
    return input_df

combi_df = calculate_bet_score(combi_df)
# now look at the results
combi_df

# and you can do nifty stuff like making a top player list like this:
combi_df.groupby('playerid')['bet_score'].sum().sort_values(ascending=False)
# player 4 is way ahead!

# which game was the best estimated game?
combi_df.groupby('game')['bet_score'].mean().sort_values(ascending=False)
# game 3! though abysmal predictions in general ;)
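As I said, this is mainly meant to give a different view of, and some ideas about, what is possible with data manipulation in Python. Once you get serious with large amounts of data, this (vectorized / numpy / pandas) approach will be the fastest, but you have to ask yourself which logic you want to run inside the database and which outside of it, and so on.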
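To make the vectorized idea concrete, here is a compact sketch that computes the same 3/2/1/0 score in one pass with numpy.select. The function name score_bets and the example data are assumptions made for illustration; the column names follow the _bet / _actual suffixes produced by the merge in this answer.

```python
import numpy as np
import pandas as pd


def score_bets(df):
    """Return bet scores: 3 exact, 2 goal difference, 1 tendency, 0 wrong."""
    diff_bet = df['home_goals_bet'] - df['away_goals_bet']
    diff_actual = df['home_goals_actual'] - df['away_goals_actual']
    exact = ((df['home_goals_bet'] == df['home_goals_actual'])
             & (df['away_goals_bet'] == df['away_goals_actual']))
    same_diff = diff_bet == diff_actual
    # the sign of the goal difference encodes home win / draw / away win
    same_result = np.sign(diff_bet) == np.sign(diff_actual)
    # np.select takes, per row, the value of the first condition that holds
    scores = np.select([exact, same_diff, same_result], [3, 2, 1], default=0)
    return pd.Series(scores, index=df.index, name='bet_score')


# Hypothetical example rows: exact, goal difference, tendency, wrong
example = pd.DataFrame({
    'home_goals_bet':    [0, 1, 3, 0],
    'away_goals_bet':    [0, 1, 5, 0],
    'home_goals_actual': [0, 2, 3, 1],
    'away_goals_actual': [0, 2, 4, 0],
})
print(score_bets(example).tolist())  # [3, 2, 1, 0]
```

Because the conditions are checked in order, a draw predicted with the wrong score counts as a correct goal difference, which matches the step-by-step scoring in this answer.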
Hope this helps!
Answer 2 (score: 0)
Here is a solution, albeit not a very elegant one:
def evaluation(team_home, team_away, estimate_home, estimate_away):
    delta_result = team_home - team_away
    delta_estimate = estimate_home - estimate_away
    if delta_result == delta_estimate:
        # same goal difference: either the exact score or just the difference
        if team_home != estimate_home:
            print("goal difference")
        else:
            print("right")
    elif delta_result > 0 and delta_estimate > 0:
        print("tendency")
    elif delta_result < 0 and delta_estimate < 0:
        print("tendency")
    else:
        print("wrong")

evaluation(2, 1, 2, 1)  # right
evaluation(2, 1, 1, 0)  # goal difference
evaluation(2, 1, 3, 0)  # tendency
evaluation(2, 1, 0, 0)  # wrong
evaluation(2, 2, 2, 2)  # right
evaluation(2, 2, 1, 1)  # goal difference
evaluation(2, 2, 0, 0)  # goal difference
evaluation(2, 2, 1, 0)  # wrong
evaluation(0, 1, 0, 1)  # right
evaluation(0, 1, 1, 2)  # goal difference
evaluation(0, 1, 0, 2)  # tendency
evaluation(0, 1, 0, 0)  # wrong
Answer 3 (score: 0)
Here is a more compact and more symmetric function. Is this what you mean by "elegant"?
def evaluate(team_home, team_away, estimate_home, estimate_away):
    if (team_home == estimate_home) and (team_away == estimate_away):
        return 'right'
    if (team_home - team_away) == (estimate_home - estimate_away):
        return 'goal difference'
    if ((team_home > team_away) == (estimate_home > estimate_away)) and \
            (team_home != team_away) and (estimate_home != estimate_away):
        return 'tendency'
    return 'wrong'
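Since the function returns a label rather than printing it, the label can be mapped to points. The sketch below repeats the function so that it runs on its own; the POINTS table and the total_points helper are assumptions for illustration, using the 3/2/1/0 scheme that appears in the other answers.

```python
def evaluate(team_home, team_away, estimate_home, estimate_away):
    if (team_home == estimate_home) and (team_away == estimate_away):
        return 'right'
    if (team_home - team_away) == (estimate_home - estimate_away):
        return 'goal difference'
    if ((team_home > team_away) == (estimate_home > estimate_away)) and \
            (team_home != team_away) and (estimate_home != estimate_away):
        return 'tendency'
    return 'wrong'


# Assumed point values per outcome; adjust them to your own rules.
POINTS = {'right': 3, 'goal difference': 2, 'tendency': 1, 'wrong': 0}


def total_points(result, estimates):
    """Sum the points of several (home, away) estimates for one (home, away) result."""
    return sum(POINTS[evaluate(*result, *estimate)] for estimate in estimates)


# The four cases from the question, for a game that ended 1:0
for estimate in [(0, 0), (3, 0), (2, 1), (1, 0)]:
    print(estimate, evaluate(1, 0, *estimate))
print(total_points((1, 0), [(0, 0), (3, 0), (2, 1), (1, 0)]))  # 6
```

Keeping the classification and the point values separate means the scoring rules can change without touching the comparison logic.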