Building a predictive model with the dpois function in R

Asked: 2018-06-17 18:25:46

Tags: r machine-learning regression prediction poisson

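Hi! I'm in the early stages of building (and learning how to build!) predictive models for sports, specifically using NHL statistics. I have the results of every NHL game since 1990, and I would like to use the number of goals to predict the outcome of future games (for now based on goals only).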

Below is an excerpt of my dataset; the full dataset is available at this Git link:

https://github.com/papelr/nhldatar/blob/master/nhldatar/data/NHL_outcomes.rda

           Date               Visitor GVisitor                  Home GHome Att.
1    1990-10-04   Philadelphia Flyers        1         Boston Bruins     4 <NA>
2    1990-10-04    Montreal Canadiens        3        Buffalo Sabres     3 <NA>
3    1990-10-04     Vancouver Canucks        2        Calgary Flames     3 <NA>
4    1990-10-04      New York Rangers        3    Chicago Blackhawks     4 <NA>
5    1990-10-04      Quebec Nordiques        3      Hartford Whalers     3 <NA>
6    1990-10-04    New York Islanders        1     Los Angeles Kings     4 <NA>
7    1990-10-04       St. Louis Blues        3 Minnesota North Stars     2 <NA>
8    1990-10-04     Detroit Red Wings        3     New Jersey Devils     3 <NA>
9    1990-10-04   Toronto Maple Leafs        1         Winnipeg Jets     7 <NA>
10   1990-10-05   Pittsburgh Penguins        7   Washington Capitals     4 <NA>
11   1990-10-06      Quebec Nordiques        1         Boston Bruins     7 <NA>
12   1990-10-06   Toronto Maple Leafs        1        Calgary Flames     4 <NA>
13   1990-10-06         Winnipeg Jets        3       Edmonton Oilers     3 <NA>
14   1990-10-06      New York Rangers        4      Hartford Whalers     5 <NA>
15   1990-10-06     Vancouver Canucks        6     Los Angeles Kings     3 <NA>
16   1990-10-06    New York Islanders        2 Minnesota North Stars     4 <NA>
17   1990-10-06        Buffalo Sabres        5    Montreal Canadiens     6 <NA>
18   1990-10-06   Philadelphia Flyers        1     New Jersey Devils     3 <NA>
19   1990-10-06    Chicago Blackhawks        5       St. Louis Blues     2 <NA>
20   1990-10-06     Detroit Red Wings        4   Washington Capitals     6 <NA>
21   1990-10-07    New York Islanders        4    Chicago Blackhawks     2 <NA>
22   1990-10-07   Toronto Maple Leafs        2       Edmonton Oilers     3 <NA>
23   1990-10-07     Detroit Red Wings        2   Philadelphia Flyers     7 <NA>
24   1990-10-07     New Jersey Devils        4   Pittsburgh Penguins     7 <NA>
25   1990-10-07         Boston Bruins        5      Quebec Nordiques     2 <NA>
26   1990-10-08      Hartford Whalers        3    Montreal Canadiens     5 <NA>
27   1990-10-08 Minnesota North Stars        3      New York Rangers     6 <NA>
28   1990-10-08        Calgary Flames        4         Winnipeg Jets     3 <NA>
29   1990-10-09 Minnesota North Stars        2     New Jersey Devils     5 <NA>
30   1990-10-09   Pittsburgh Penguins        3       St. Louis Blues     4 <NA>
31   1990-10-09     Los Angeles Kings        6     Vancouver Canucks     2 <NA>
32   1990-10-10        Calgary Flames        5     Detroit Red Wings     6 <NA>
33   1990-10-10        Buffalo Sabres        3      Hartford Whalers     4 <NA>
34   1990-10-10   Washington Capitals        2      New York Rangers     4 <NA>
35   1990-10-10      Quebec Nordiques        8   Toronto Maple Leafs     5 <NA>
36   1990-10-10         Boston Bruins        4         Winnipeg Jets     2 <NA>
37   1990-10-11   Pittsburgh Penguins        1    Chicago Blackhawks     4 <NA>
38   1990-10-11       Edmonton Oilers        5     Los Angeles Kings     5 <NA>
39   1990-10-11         Boston Bruins        3 Minnesota North Stars     3 <NA>
40   1990-10-11     New Jersey Devils        4   Philadelphia Flyers     7 <NA>

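Here is the prediction model I have come up with so far. I have not been able to get the score matrix that the simulate_game call at the bottom is supposed to return. Any help would be great.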

# Using number of goals for prediction model
library(magrittr)  # provides %>%

# Stack home and visitor rows into one long table (one row per team per game),
# then fit a Poisson regression of goals on home advantage, team and opponent
model_one <-
  rbind(
    data.frame(goals = outcomes$GHome,
               team = outcomes$Home,
               opponent = outcomes$Visitor,
               home = 1),
    data.frame(goals = outcomes$GVisitor,
               team = outcomes$Visitor,
               opponent = outcomes$Home,
               home = 0)) %>%
  glm(goals ~ home + team + opponent,
      family = poisson(link = log), data = .)
summary(model_one)

# Probability function / matrix
simulate_game <- function(stat_model, homeTeam, awayTeam, max_goals = 10) {

  # Expected goals for the home team
  home_goals <- predict(stat_model,
                        data.frame(home = 1,
                                   team = homeTeam,
                                   opponent = awayTeam),
                        type = "response")

  # Expected goals for the away team
  away_goals <- predict(stat_model,
                        data.frame(home = 0,
                                   team = awayTeam,
                                   opponent = homeTeam),
                        type = "response")

  # Intended to return the score-probability matrix; this is where it fails
  dpois(0:max_goals, home_goals) %>%
    dpois(0:max_goals, away_goals)
}

simulate_game(model_one, "Nashville Predators", "Chicago Blackhawks",
              max_goals = 10)

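I fully understand that a Poisson model is not the best choice for sports prediction, but I am rebuilding a model I found for the EPL, for learning and practice, and adapting it to the NHL (the model is David Sheehan's, https://dashee87.github.io/data%20science/football/r/predicting-football-results-with-statistical-modelling/).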

Any tips would be great, because at the moment this model returns a bunch of warnings:

There were 11 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In dpois(., 0:max_goals, away_goals_avg) : non-integer x = 0.062689
2: In dpois(., 0:max_goals, away_goals_avg) : non-integer x = 0.173621
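
One possible reading of these warnings, offered as a sketch rather than a fix verified against the full dataset: `lhs %>% dpois(0:max_goals, away_goals)` expands to `dpois(lhs, 0:max_goals, away_goals)`, so the home-team probabilities end up as the `x` argument, which is what "non-integer x" complains about. For two independent Poisson goal counts, the score matrix is instead the outer product of the two probability vectors (`%o%` or `outer()`). The helper name `score_matrix` and the rates 3.1 and 2.7 below are made up for illustration; inside `simulate_game` the two rates would be the results of the two `predict()` calls.

```r
# Hypothetical helper (not from the original post): joint score
# probabilities for two independent Poisson goal counts.
score_matrix <- function(home_rate, away_rate, max_goals = 10) {
  goals <- 0:max_goals
  # Outer product: entry [i, j] = P(home scores i - 1) * P(away scores j - 1)
  probs <- dpois(goals, home_rate) %o% dpois(goals, away_rate)
  dimnames(probs) <- list(home = as.character(goals),
                          away = as.character(goals))
  probs
}

# Made-up expected goals, standing in for the two predict() results
m <- score_matrix(home_rate = 3.1, away_rate = 2.7)

# Rows are home goals, so "home > away" is the part below the diagonal
home_win <- sum(m[lower.tri(m)])
tie      <- sum(diag(m))
away_win <- sum(m[upper.tri(m)])
```

Under this assumption, the last expression of `simulate_game` would become `dpois(0:max_goals, home_goals) %o% dpois(0:max_goals, away_goals)`. The three sums add up to slightly less than 1 because the matrix is truncated at `max_goals`.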

0 answers:

No answers yet