Covariance matrix in R

Asked: 2016-12-15 01:34:46

Tags: r covariance

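I have a data frame of player statistics. What I want to do is build a covariance matrix of the MB statistic between players, to see which players tend to perform well together and which tend to affect each other's output.

Note that not every player plays in every game.

I am hoping to get something like the following, where 'x' stands for the relevant covariance value.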

               Player.Name Damian Lillard C.J. McCollum Allen Crabbe Noah Vonleh  etc, etc
1           Damian Lillard              x             x            x           x
2            C.J. McCollum              x             x            x           x
3             Allen Crabbe              x             x            x           x
4              Noah Vonleh              x             x            x           x
5                 Ed Davis              x             x            x           x
6          Al-Farouq Aminu              x             x            x           x
7              Evan Turner              x             x            x           x
8         Maurice Harkless              x             x            x           x
9           Meyers Leonard              x             x            x           x
10           Mason Plumlee              x             x            x           x
11          Shabazz Napier              x             x            x           x

> df
          Player.Name  Tm   MB    DS     Game
1      Damian Lillard POR 54.8 59.50 20161025
11      C.J. McCollum POR 30.9 32.50 20161025
16       Allen Crabbe POR 24.1 28.25 20161025
19        Noah Vonleh POR 14.2 15.25 20161025
22           Ed Davis POR 17.9 18.00 20161025
26    Al-Farouq Aminu POR 16.3 18.25 20161025
34        Evan Turner POR 20.5 19.25 20161025
64   Maurice Harkless POR  4.7  5.25 20161025
65     Meyers Leonard POR  2.7  2.25 20161025
68      Mason Plumlee POR  4.7  4.00 20161025
290  Maurice Harkless POR 35.6 35.75 20161027
295     Mason Plumlee POR 36.6 36.75 20161027
299    Damian Lillard POR 41.5 44.25 20161027
309     C.J. McCollum POR 26.8 27.50 20161027
318      Allen Crabbe POR 17.2 16.25 20161027
349       Noah Vonleh POR  5.0  4.75 20161027
358       Evan Turner POR 10.7 10.50 20161027
359          Ed Davis POR  5.6  5.50 20161027
364    Shabazz Napier POR  0.0  0.00 20161027
369   Al-Farouq Aminu POR 13.6 13.25 20161027
545    Damian Lillard POR 56.5 58.25 20161029
557     C.J. McCollum POR 49.5 51.25 20161029
610     Mason Plumlee POR 22.9 22.50 20161029
611      Allen Crabbe POR 22.6 22.75 20161029
637       Evan Turner POR 15.6 16.75 20161029
649   Al-Farouq Aminu POR 27.9 28.25 20161029
673          Ed Davis POR  8.9  9.50 20161029
704       Noah Vonleh POR  4.8  5.00 20161029
719  Maurice Harkless POR  9.6 11.00 20161029
723    Meyers Leonard POR  6.2  6.25 20161029
728    Shabazz Napier POR  0.0  0.00 20161029

Data

structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum", 
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu", 
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee", 
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum", 
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier", 
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee", 
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis", 
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", 
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1, 
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8, 
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9, 
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18, 
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25, 
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75, 
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L, 
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L, 
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName", 
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")

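2 Answers:

Answer 0 (score: 1)

You can use the cov() function for this. For example, with the data frame in x (this transposes the MB and DS columns, so each row of x is treated as a variable):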

cov_mat <- cov(t(x[,3:4]))
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName


> cov_mat[1:3,1:3]
               Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard        11.0450          3.76      9.75250
C.J. McCollum          3.7600          1.28      3.32000
Allen Crabbe           9.7525          3.32      8.61125

If you want correlations instead, just replace cov() with cor().
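For reference, cov() treats columns as variables and rows as observations, so the orientation of the data decides what gets compared. A minimal sketch, using the MB values of two players over the three games in the question (the matrix `mb` is built by hand here purely for illustration and is not part of the original answer):

```r
# MB for two players across the three games in the question;
# rows are games, columns are players (hand-built for illustration)
mb <- cbind(
  "Damian Lillard" = c(54.8, 41.5, 56.5),
  "C.J. McCollum"  = c(30.9, 26.8, 49.5)
)
rownames(mb) <- c("20161025", "20161027", "20161029")

cov_mb <- cov(mb)  # 2 x 2 covariance between the player columns
cor_mb <- cor(mb)  # same layout, scaled to correlations

# cov2cor() converts a covariance matrix into the matching correlation matrix
all.equal(cor_mb, cov2cor(cov_mb))
```

With players in columns and games in rows, the off-diagonal entry is the player-to-player covariance across games.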

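Answer 1 (score: 1)

I think the first thing you need to do is reshape the data so that each row is a game and each column holds one player's MB for that game. Assuming the data is in dat: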

dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB"         "Game"      

#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
        timevar = 'PlayerName')

dat.wide[, 1:4]
       Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1  20161025              54.8             30.9            24.1
11 20161027              41.5             26.8            17.2
21 20161029              56.5             49.5            22.6
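
As an alternative sketch of the same long-to-wide step, tapply() can build a game-by-player matrix directly, which keeps the plain player names as column names. The small data frame `long` below is a hand-typed subset of the question's data, used only for illustration:

```r
# Hand-typed subset of the question's data (illustration only)
long <- data.frame(
  PlayerName = c("Damian Lillard", "Meyers Leonard",
                 "Damian Lillard",
                 "Damian Lillard", "Meyers Leonard"),
  MB   = c(54.8, 2.7, 41.5, 56.5, 6.2),
  Game = c(20161025L, 20161025L, 20161027L, 20161029L, 20161029L)
)

# One cell per (game, player); combinations with no row become NA.
# Each pair occurs at most once here, so sum() just returns that value.
wide <- tapply(long$MB, list(long$Game, long$PlayerName), sum)
wide
```

The result is a numeric matrix with games as row names, so it can be passed to cov() without dropping a Game column first.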

#compute using cov function; pairwise handling copes with players who missed games
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')
cov_m[1:4,1:4]

                  MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard          67.46333         71.10833          28.370          17.23
MB.C.J. McCollum           71.10833        146.34333          20.495         -23.61
MB.Allen Crabbe            28.37000         20.49500          13.170          12.75
MB.Noah Vonleh             17.23000        -23.61000          12.750          28.84
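
The `use` argument matters here because some players miss games, which leaves NA cells in the wide data. A small sketch of the difference, with a hand-built matrix `m` holding three players' MB values from the question (Meyers Leonard has no row for game 20161027; the matrix itself is an illustration, not part of the original answer):

```r
# Rows are the three games, columns are players; NA marks a missed game
m <- cbind(
  "Damian Lillard" = c(54.8, 41.5, 56.5),
  "C.J. McCollum"  = c(30.9, 26.8, 49.5),
  "Meyers Leonard" = c(2.7, NA, 6.2)
)

# Default (use = "everything"): any pair involving a column with NA is NA
cov_default <- cov(m)

# Pairwise: each pair uses only the games in which both players appear
cov_pairwise <- cov(m, use = "pairwise.complete.obs")

cov_default["Damian Lillard", "Meyers Leonard"]   # NA
cov_pairwise["Damian Lillard", "Meyers Leonard"]  # based on the 2 shared games
```

Because each pair can be estimated from a different subset of games, a pairwise matrix is not guaranteed to be positive semi-definite, which is worth keeping in mind if it feeds into later modelling.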