Including probabilities in a loop over a probability mass function

Time: 2018-12-23 10:28:31

Tags: python poisson

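I have a CSV file containing all shots taken in the English Premier League (football). For each shot I have calculated an "expectedGoal" (xG) ratio. Intuitively, the ratio can be interpreted as follows: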

  • An xG value of 1 corresponds to a certain goal
  • An xG value of 0 corresponds to a certain miss

For more information, see the following link.

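Looking at the xG ratio from a Poisson perspective, I can treat it as the success probability of a Bernoulli trial. Essentially, I can use a Poisson-binomial distribution to find P(Goals = G) from a set of xGs (see the first part of the code below). The second part is the original script on which I based mine.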

import numpy as np
import pandas as pd
from poibin import PoiBin


shots = pd.DataFrame()
shots = pd.read_csv('shots.csv', encoding='ISO-8859-1')
shots = shots.drop_duplicates(['match_id'], keep='first')
shots_df = shots[['match_id', 'Date', 'home', 'away', 'hgoals', 'agoals', 'xG_Home', 'xG_Away']].rename(columns={'hgoals' : 'home_goals',
                                                                                                             'agoals' : 'away_goals',
                                                                                                             'xG_Home': 'home_xG',
                                                                                                             'xG_Away': 'away_xG'})

def score_prob(dataset, max_goals=5):
# Find P(Goals=G) from a set of xGs with the usage of a Poisson-Binomial distribution
   for row in dataset.itertuples():
       home_xG = row.home_xG
       away_xG = row.away_xG
       pb = PoiBin(home_xG)
       score_prob_home = [pb.pmf(x) for x in range(0, max_goals+1)]
       pb = PoiBin(away_xG)
       score_prob_away = [pb.pmf(x) for x in range(0, max_goals + 1)]
       return(np.outer(score_prob_home, score_prob_away))


if __name__ == '__main__':
    score_prob = score_prob(shots_df, max_goals=5)

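# Second part: the original script this is based on.
# It additionally needs `from scipy.stats import poisson`.
def simulate_match(foot_model, homeTeam, awayTeam, max_goals=10):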
   home_goals_avg = foot_model.predict(pd.DataFrame(data={'team': homeTeam, 
                                                        'opponent': awayTeam,'home':1},
                                                  index=[1])).values[0]
   away_goals_avg = foot_model.predict(pd.DataFrame(data={'team': awayTeam, 
                                                        'opponent': homeTeam,'home':0},
                                                  index=[1])).values[0]
   team_pred = [[poisson.pmf(i, team_avg) for i in range(0, max_goals+1)] for team_avg in [home_goals_avg, away_goals_avg]]
   return(np.outer(np.array(team_pred[0]), np.array(team_pred[1])))

Unfortunately, I get the following error:

  

File "C:\Users\HJA\Desktop\Betting\understatV0.01\poibin.py", line 183, in check_input_prob
    "Input must be a one-dimensional array or a list.")
ValueError: Input must be a one-dimensional array or a list.

The documentation states the following:

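Consider n independent and non-identically distributed random variables, and let p be a list/NumPy array of the corresponding Bernoulli success probabilities. To create the Poisson binomial distribution, use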

from poibin import PoiBin

pb = PoiBin(p)

Let x be a list/NumPy array of different numbers of successes. Use the following methods to obtain the corresponding quantities:

Probability mass function

pb.pmf(x)
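
To make the quoted requirement concrete, here is a minimal pure-Python sketch of what a Poisson-binomial PMF computes for a list p of success probabilities. The function name `poisson_binomial_pmf` and the sample xG values are illustrative assumptions and are not part of poibin. The sketch only mirrors the idea that the input holds one probability per trial (here, one xG per shot).

```python
def poisson_binomial_pmf(probs):
    """Return [P(K=0), ..., P(K=n)] for n independent Bernoulli trials.

    `probs` holds one success probability per trial (e.g. one xG per shot).
    Hypothetical helper for illustration; not the poibin implementation.
    """
    pmf = [1.0]  # with zero trials, P(K=0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: count stays k
            nxt[k + 1] += mass * p      # this trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


# Made-up xG values for three shots by one team in one match.
shot_xgs = [0.1, 0.5, 0.3]
goal_probs = poisson_binomial_pmf(shot_xgs)
print(goal_probs)

# A single xG value has to be wrapped in a list to count as one trial.
single = poisson_binomial_pmf([0.4])
print(single)
```

With a one-element list the distribution reduces to a plain Bernoulli trial, which is why a single scalar xG per team carries no more information than "goal or no goal" for one shot.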

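So my problem is basically that, at the point where pb is defined in my script, I only have a single xG value. This differs from the poisson.pmf() part of the original script. How can I adjust my script so that it works?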
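
One possible direction, sketched under assumptions: if per-shot xG values are available, collect each team's values per match into a list and build the score matrix from the two resulting PMFs. The record layout, the names `shots`, `goals_pmf`, and `score_matrix`, and the sample values below are all hypothetical. The helpers are pure Python so the sketch runs on its own; in the script from the question, the per-team list would be the input handed to `PoiBin`, and `np.outer` would replace the nested list comprehension.

```python
from collections import defaultdict

# Hypothetical shot-level records: (match_id, side, xG of that shot).
shots = [
    (1, "home", 0.10), (1, "home", 0.50), (1, "away", 0.30),
    (2, "home", 0.75), (2, "away", 0.20), (2, "away", 0.05),
]


def goals_pmf(xgs, max_goals):
    """P(Goals = g) for g = 0..max_goals, one Bernoulli trial per shot.

    Mass for more than max_goals goals is dropped, so the result may
    sum to less than 1 when a team has more shots than max_goals.
    """
    pmf = [1.0] + [0.0] * max_goals
    for p in xgs:
        # Walk downwards so pmf[g - 1] still holds the value before this shot.
        for g in range(max_goals, 0, -1):
            pmf[g] = pmf[g] * (1.0 - p) + pmf[g - 1] * p
        pmf[0] *= 1.0 - p
    return pmf


def score_matrix(home_xgs, away_xgs, max_goals=5):
    """Rows index home goals, columns index away goals (independence assumed)."""
    home = goals_pmf(home_xgs, max_goals)
    away = goals_pmf(away_xgs, max_goals)
    return [[h * a for a in away] for h in home]


# Group xG values per match and side, so each PMF receives a list, not a scalar.
xgs_by_match = defaultdict(lambda: {"home": [], "away": []})
for match_id, side, xg in shots:
    xgs_by_match[match_id][side].append(xg)

matrices = {
    match_id: score_matrix(sides["home"], sides["away"])
    for match_id, sides in xgs_by_match.items()
}
print(matrices[1][0][0])  # probability of a 0-0 result in match 1
```

Note that this returns one matrix per match, whereas the `return` inside the `for` loop in the question's `score_prob` would stop after the first row.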

0 answers:

No answers