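I have a CSV file containing all shots taken in the (football) Premier League. For each shot, I have calculated an "expectedGoal" (xG) ratio. Intuitively, this ratio can be interpreted as follows:
For more information, see the following link
Looking at the xG ratio from a Poisson perspective, I can treat it as a Bernoulli success probability. Basically, I can use a Poisson binomial distribution to find P(Goals = G) from a set of xG values (see the first part of the code section). The second part is the original script, on which I based my script.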
import numpy as np
import pandas as pd
from poibin import PoiBin
shots = pd.DataFrame()
shots = pd.read_csv('shots.csv', encoding='ISO-8859-1')
shots = shots.drop_duplicates(['match_id'], keep='first')
shots_df = shots[['match_id', 'Date', 'home', 'away', 'hgoals', 'agoals', 'xG_Home', 'xG_Away']].rename(
    columns={'hgoals': 'home_goals',
             'agoals': 'away_goals',
             'xG_Home': 'home_xG',
             'xG_Away': 'away_xG'})

def score_prob(dataset, max_goals=5):
    # Find P(Goals=G) from a set of xGs with the usage of a Poisson-Binomial distribution
    for row in dataset.itertuples():
        home_xG = row.home_xG
        away_xG = row.away_xG
        pb = PoiBin(home_xG)
        score_prob_home = [pb.pmf(x) for x in range(0, max_goals+1)]
        pb = PoiBin(away_xG)
        score_prob_away = [pb.pmf(x) for x in range(0, max_goals + 1)]
    return(np.outer(score_prob_home, score_prob_away))

if __name__ == '__main__':
    score_prob = score_prob(shots_df, max_goals=5)
# Second part: the original script
# Assumed import: poisson here is taken to be scipy.stats.poisson
from scipy.stats import poisson

def simulate_match(foot_model, homeTeam, awayTeam, max_goals=10):
    # Expected goals for each side, predicted by the fitted model
    home_goals_avg = foot_model.predict(pd.DataFrame(data={'team': homeTeam,
                                                           'opponent': awayTeam, 'home': 1},
                                                     index=[1])).values[0]
    away_goals_avg = foot_model.predict(pd.DataFrame(data={'team': awayTeam,
                                                           'opponent': homeTeam, 'home': 0},
                                                     index=[1])).values[0]
    # P(goals = i) for i = 0..max_goals, once per team
    team_pred = [[poisson.pmf(i, team_avg) for i in range(0, max_goals+1)] for team_avg in [home_goals_avg, away_goals_avg]]
    return(np.outer(np.array(team_pred[0]), np.array(team_pred[1])))
Unfortunately, I get the following error:
File "C:\Users\HJA\Desktop\Betting\understatV0.01\poibin.py", line 183, in check_input_prob
    "Input must be an one-dimensional array or a list.")
ValueError: Input must be an one-dimensional array or a list.
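The documentation states the following:
Consider n independent and non-identically distributed random variables, and let p be a list/NumPy array of the corresponding Bernoulli success probabilities. To create the Poisson binomial distribution, use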
from poibin import PoiBin
pb = PoiBin(p)
Let x be a list/NumPy array of different numbers of successes. Use the following methods to obtain the corresponding quantities:
Probability mass function
pb.pmf(x)
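To make the expected input concrete, here is a minimal pure-Python sketch of a Poisson binomial PMF. It does not use poibin; the helpers poisson_binomial_pmf and score_matrix, and the per-shot xG numbers, are hypothetical and only illustrate that the distribution is built from one probability per shot, not from a single total per team.

```python
def poisson_binomial_pmf(probs, max_goals=5):
    """Return [P(Goals = 0), ..., P(Goals = max_goals)] for independent shots.

    probs holds one success probability (xG) per shot.
    """
    # dist[k] is the probability of exactly k goals from the shots seen so far.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this shot is missed
            nxt[k + 1] += mass * p      # this shot is scored
        dist = nxt
    # Scoring more goals than there were shots is impossible: pad with zeros,
    # then truncate to max_goals + 1 entries.
    dist += [0.0] * (max_goals + 1 - len(dist))
    return dist[:max_goals + 1]


def score_matrix(home_shots, away_shots, max_goals=5):
    """Outer product of the two PMFs: entry [i][j] is P(home = i, away = j)."""
    home = poisson_binomial_pmf(home_shots, max_goals)
    away = poisson_binomial_pmf(away_shots, max_goals)
    return [[h * a for a in away] for h in home]


# Hypothetical per-shot xG values for one match (illustrative numbers only).
home_shots = [0.5, 0.2]
away_shots = [0.1]
matrix = score_matrix(home_shots, away_shots, max_goals=3)
```

Each inner list plays the role that the poisson.pmf() list plays in the original script, but it is computed from a list of per-shot probabilities.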
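So my question is basically this: at the point where pb is defined in my script, I only have a single xG value per team. This differs from the poisson.pmf() part of the original script. How can I adjust the script so that it works?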