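Hello! I am in the early stages of building (and learning!) how to make predictive models for sports, specifically using NHL stats. I have every NHL game result since 1990, and I want to use the number of goals to predict the outcome of future games (for now based on goals only).
Below is an excerpt of my dataset; the full dataset can be found at this Git link: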
https://github.com/papelr/nhldatar/blob/master/nhldatar/data/NHL_outcomes.rda
Date Visitor GVisitor Home GHome Att.
1 1990-10-04 Philadelphia Flyers 1 Boston Bruins 4 <NA>
2 1990-10-04 Montreal Canadiens 3 Buffalo Sabres 3 <NA>
3 1990-10-04 Vancouver Canucks 2 Calgary Flames 3 <NA>
4 1990-10-04 New York Rangers 3 Chicago Blackhawks 4 <NA>
5 1990-10-04 Quebec Nordiques 3 Hartford Whalers 3 <NA>
6 1990-10-04 New York Islanders 1 Los Angeles Kings 4 <NA>
7 1990-10-04 St. Louis Blues 3 Minnesota North Stars 2 <NA>
8 1990-10-04 Detroit Red Wings 3 New Jersey Devils 3 <NA>
9 1990-10-04 Toronto Maple Leafs 1 Winnipeg Jets 7 <NA>
10 1990-10-05 Pittsburgh Penguins 7 Washington Capitals 4 <NA>
11 1990-10-06 Quebec Nordiques 1 Boston Bruins 7 <NA>
12 1990-10-06 Toronto Maple Leafs 1 Calgary Flames 4 <NA>
13 1990-10-06 Winnipeg Jets 3 Edmonton Oilers 3 <NA>
14 1990-10-06 New York Rangers 4 Hartford Whalers 5 <NA>
15 1990-10-06 Vancouver Canucks 6 Los Angeles Kings 3 <NA>
16 1990-10-06 New York Islanders 2 Minnesota North Stars 4 <NA>
17 1990-10-06 Buffalo Sabres 5 Montreal Canadiens 6 <NA>
18 1990-10-06 Philadelphia Flyers 1 New Jersey Devils 3 <NA>
19 1990-10-06 Chicago Blackhawks 5 St. Louis Blues 2 <NA>
20 1990-10-06 Detroit Red Wings 4 Washington Capitals 6 <NA>
21 1990-10-07 New York Islanders 4 Chicago Blackhawks 2 <NA>
22 1990-10-07 Toronto Maple Leafs 2 Edmonton Oilers 3 <NA>
23 1990-10-07 Detroit Red Wings 2 Philadelphia Flyers 7 <NA>
24 1990-10-07 New Jersey Devils 4 Pittsburgh Penguins 7 <NA>
25 1990-10-07 Boston Bruins 5 Quebec Nordiques 2 <NA>
26 1990-10-08 Hartford Whalers 3 Montreal Canadiens 5 <NA>
27 1990-10-08 Minnesota North Stars 3 New York Rangers 6 <NA>
28 1990-10-08 Calgary Flames 4 Winnipeg Jets 3 <NA>
29 1990-10-09 Minnesota North Stars 2 New Jersey Devils 5 <NA>
30 1990-10-09 Pittsburgh Penguins 3 St. Louis Blues 4 <NA>
31 1990-10-09 Los Angeles Kings 6 Vancouver Canucks 2 <NA>
32 1990-10-10 Calgary Flames 5 Detroit Red Wings 6 <NA>
33 1990-10-10 Buffalo Sabres 3 Hartford Whalers 4 <NA>
34 1990-10-10 Washington Capitals 2 New York Rangers 4 <NA>
35 1990-10-10 Quebec Nordiques 8 Toronto Maple Leafs 5 <NA>
36 1990-10-10 Boston Bruins 4 Winnipeg Jets 2 <NA>
37 1990-10-11 Pittsburgh Penguins 1 Chicago Blackhawks 4 <NA>
38 1990-10-11 Edmonton Oilers 5 Los Angeles Kings 5 <NA>
39 1990-10-11 Boston Bruins 3 Minnesota North Stars 3 <NA>
40 1990-10-11 New Jersey Devils 4 Philadelphia Flyers 7 <NA>
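Here is the prediction model I have come up with so far. I have not been able to get the score matrix that the simulated-game lines below are supposed to produce. Any help would be great.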
# Using number of goals for prediction model
library(magrittr)  # provides the %>% pipe used below

# Stack home and away rows so that each game contributes two observations
model_one <-
  rbind(
    data.frame(goals = outcomes$GHome,
               team = outcomes$Home,
               opponent = outcomes$Visitor,
               home = 1),
    data.frame(goals = outcomes$GVisitor,
               team = outcomes$Visitor,
               opponent = outcomes$Home,
               home = 0)) %>%
  glm(goals ~ home + team + opponent,
      family = poisson(link = log), data = .)
summary(model_one)
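As a small illustration of what this model predicts, the sketch below fits the same formula to a tiny made-up data set (the teams "A", "B", "C" and their scores are hypothetical, not taken from the NHL data) and checks that, with a log link, the predicted goals are the exponential of the summed coefficients.

```r
# Hypothetical toy results in the same long format (two rows per game)
toy <- data.frame(
  goals    = c(3, 1, 2, 2, 4, 0, 1, 3, 2, 2, 5, 1),
  team     = c("A", "B", "B", "C", "C", "A", "A", "C", "B", "A", "C", "B"),
  opponent = c("B", "A", "C", "B", "A", "C", "C", "A", "A", "B", "B", "C"),
  home     = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
)

fit <- glm(goals ~ home + team + opponent,
           family = poisson(link = "log"), data = toy)

# Expected goals for team A at home against team B
new_game   <- data.frame(home = 1, team = "A", opponent = "B")
by_predict <- predict(fit, new_game, type = "response")

# Same number by hand: team A is the baseline level, so it has no coefficient
b       <- coef(fit)
by_hand <- exp(b["(Intercept)"] + b["home"] + b["opponentB"])

all.equal(unname(by_predict), unname(by_hand))
```

Under this reading, exp(coef(fit)["home"]) acts as a multiplicative home-ice factor on expected goals.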
# Probability function / matrix
simulate_game <- function(stat_model, homeTeam, awayTeam, max_goals = 10) {
  # Expected goals for each side, taken from the model passed in as stat_model
  home_goals <- predict(stat_model,
                        data.frame(home = 1,
                                   team = homeTeam,
                                   opponent = awayTeam),
                        type = "response")
  away_goals <- predict(stat_model,
                        data.frame(home = 0,
                                   team = awayTeam,
                                   opponent = homeTeam),
                        type = "response")
  dpois(0:max_goals, home_goals) %>%
    dpois(0:max_goals, away_goals)
}
simulate_game(model_one, "Nashville Predators", "Chicago Blackhawks",
              max_goals = 10)
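One likely reading of the last line of simulate_game is that `%>%` passes the home-goal probabilities into the second `dpois()` call as its first argument, so no matrix is ever built. Below is a minimal sketch of one way to get a score matrix instead, assuming the intent is the outer product of the two Poisson probability vectors. The helper name `score_matrix` and the expected-goal values 3.1 and 2.7 are hypothetical placeholders for the two `predict()` results.

```r
# Sketch: joint probability of every scoreline, assuming the two teams'
# goal counts are independent Poisson variables
score_matrix <- function(home_rate, away_rate, max_goals = 10) {
  home_probs <- dpois(0:max_goals, home_rate)
  away_probs <- dpois(0:max_goals, away_rate)
  m <- home_probs %o% away_probs            # outer product: rows = home goals
  dimnames(m) <- list(home = 0:max_goals, away = 0:max_goals)
  m
}

m <- score_matrix(3.1, 2.7)   # hypothetical expected goals

# Rows index home goals, so entries below the diagonal are home wins
home_win <- sum(m[lower.tri(m)])
draw     <- sum(diag(m))
away_win <- sum(m[upper.tri(m)])
c(home_win = home_win, draw = draw, away_win = away_win)
```

The three probabilities sum to slightly less than 1 because scorelines above max_goals are cut off.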
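I fully understand that a Poisson model is not the best choice for sports prediction, but I am rebuilding a model I found for the EPL, for learning/practice, and adapting it to the NHL (the model is David Sheehan's, https://dashee87.github.io/data%20science/football/r/predicting-football-results-with-statistical-modelling/).
Any tips would be great, because at the moment this model returns a bunch of warnings: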
There were 11 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In dpois(., 0:max_goals, away_goals_avg) : non-integer x = 0.062689
2: In dpois(., 0:max_goals, away_goals_avg) : non-integer x = 0.173621
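The "non-integer x" message can be reproduced in isolation. This small sketch, using the first value from the warnings above and an arbitrary rate of 1, shows that `dpois()` warns and returns 0 whenever its first argument is not a whole number, which is what happens if a vector of probabilities is supplied as `x`.

```r
# A probability such as 0.062689 is not a valid goal count
msg <- tryCatch(dpois(0.062689, 1),
                warning = function(w) conditionMessage(w))
msg

# With the warning silenced, the density at a non-integer count is 0
val <- suppressWarnings(dpois(0.062689, 1))
val
```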