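I have a data frame of player stats, and I want to build a covariance matrix of the MB stat between players, to see which players tend to perform well together and which tend to take production away from each other.
Note that not every player appears in every game.
I am hoping for something like the layout below, where each 'x' stands for the relevant covariance value.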
Player.Name Damian Lillard C.J. McCollum Allen Crabbe Noah Vonleh etc, etc
1 Damian Lillard x x x x
2 C.J. McCollum x x x x
3 Allen Crabbe x x x x
4 Noah Vonleh x x x x
5 Ed Davis x x x x
6 Al-Farouq Aminu x x x x
7 Evan Turner x x x x
8 Maurice Harkless x x x x
9 Meyers Leonard x x x x
10 Mason Plumlee x x x x
11 Shabazz Napier x x x x
> df
Player.Name Tm MB DS Game
1 Damian Lillard POR 54.8 59.50 20161025
11 C.J. McCollum POR 30.9 32.50 20161025
16 Allen Crabbe POR 24.1 28.25 20161025
19 Noah Vonleh POR 14.2 15.25 20161025
22 Ed Davis POR 17.9 18.00 20161025
26 Al-Farouq Aminu POR 16.3 18.25 20161025
34 Evan Turner POR 20.5 19.25 20161025
64 Maurice Harkless POR 4.7 5.25 20161025
65 Meyers Leonard POR 2.7 2.25 20161025
68 Mason Plumlee POR 4.7 4.00 20161025
290 Maurice Harkless POR 35.6 35.75 20161027
295 Mason Plumlee POR 36.6 36.75 20161027
299 Damian Lillard POR 41.5 44.25 20161027
309 C.J. McCollum POR 26.8 27.50 20161027
318 Allen Crabbe POR 17.2 16.25 20161027
349 Noah Vonleh POR 5.0 4.75 20161027
358 Evan Turner POR 10.7 10.50 20161027
359 Ed Davis POR 5.6 5.50 20161027
364 Shabazz Napier POR 0.0 0.00 20161027
369 Al-Farouq Aminu POR 13.6 13.25 20161027
545 Damian Lillard POR 56.5 58.25 20161029
557 C.J. McCollum POR 49.5 51.25 20161029
610 Mason Plumlee POR 22.9 22.50 20161029
611 Allen Crabbe POR 22.6 22.75 20161029
637 Evan Turner POR 15.6 16.75 20161029
649 Al-Farouq Aminu POR 27.9 28.25 20161029
673 Ed Davis POR 8.9 9.50 20161029
704 Noah Vonleh POR 4.8 5.00 20161029
719 Maurice Harkless POR 9.6 11.00 20161029
723 Meyers Leonard POR 6.2 6.25 20161029
728 Shabazz Napier POR 0.0 0.00 20161029
structure(list(PlayerName = c("Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Ed Davis", "Al-Farouq Aminu",
"Evan Turner", "Maurice Harkless", "Meyers Leonard", "Mason Plumlee",
"Maurice Harkless", "Mason Plumlee", "Damian Lillard", "C.J. McCollum",
"Allen Crabbe", "Noah Vonleh", "Evan Turner", "Ed Davis", "Shabazz Napier",
"Al-Farouq Aminu", "Damian Lillard", "C.J. McCollum", "Mason Plumlee",
"Allen Crabbe", "Evan Turner", "Al-Farouq Aminu", "Ed Davis",
"Noah Vonleh", "Maurice Harkless", "Meyers Leonard", "Shabazz Napier"
), TM = c("POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR", "POR",
"POR", "POR", "POR", "POR", "POR"), MB = c(54.8, 30.9, 24.1,
14.2, 17.9, 16.3, 20.5, 4.7, 2.7, 4.7, 35.6, 36.6, 41.5, 26.8,
17.2, 5, 10.7, 5.6, 0, 13.6, 56.5, 49.5, 22.9, 22.6, 15.6, 27.9,
8.9, 4.8, 9.6, 6.2, 0), DS = c(59.5, 32.5, 28.25, 15.25, 18,
18.25, 19.25, 5.25, 2.25, 4, 35.75, 36.75, 44.25, 27.5, 16.25,
4.75, 10.5, 5.5, 0, 13.25, 58.25, 51.25, 22.5, 22.75, 16.75,
28.25, 9.5, 5, 11, 6.25, 0), Game = c(20161025L, 20161025L, 20161025L,
20161025L, 20161025L, 20161025L, 20161025L, 20161025L, 20161025L,
20161025L, 20161027L, 20161027L, 20161027L, 20161027L, 20161027L,
20161027L, 20161027L, 20161027L, 20161027L, 20161027L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L, 20161029L, 20161029L,
20161029L, 20161029L, 20161029L, 20161029L)), .Names = c("PlayerName",
"TM", "MB", "DS", "Game"), row.names = c(NA, -31L), class = "data.frame")
Answer 0 (score: 1)
You can use the cov() function for this, for example:
cov_mat <- cov(t(x[,3:4]))
rownames(cov_mat) <- x$PlayerName
colnames(cov_mat) <- x$PlayerName
> cov_mat[1:3,1:3]
Damian Lillard C.J. McCollum Allen Crabbe
Damian Lillard 11.0450 3.76 9.75250
C.J. McCollum 3.7600 1.28 3.32000
Allen Crabbe 9.7525 3.32 8.61125
If you want correlations instead, just replace cov() with cor().
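As a small illustration of the cov()/cor() swap, here is a minimal sketch using the MB values of three players across the three games in the question, laid out with games as rows and players as columns. The object names (mb, cov_mb, cor_mb) and the shortened column labels are made up for this example.

```r
# MB values for three players across the three games in the question
# (rows = games, columns = players); names are illustrative only
mb <- cbind(
  Lillard  = c(54.8, 41.5, 56.5),
  McCollum = c(30.9, 26.8, 49.5),
  Crabbe   = c(24.1, 17.2, 22.6)
)
rownames(mb) <- c("20161025", "20161027", "20161029")

cov_mb <- cov(mb)  # covariance between the player columns
cor_mb <- cor(mb)  # same call shape, but scaled to the range [-1, 1]

# cov2cor() rescales a covariance matrix into the matching correlation matrix,
# so the two results should agree up to floating-point tolerance
all.equal(cov2cor(cov_mb), cor_mb)
```

Correlation is often easier to compare across player pairs because it does not depend on how large each player's typical MB is.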
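Answer 1 (score: 1)
I think the first thing you need to do is reshape the data so that each row is a game and each column holds one player's MB for that game. Assuming our data is in dat: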
dat <- dat[,-c(2,4)] #remove team name and DS
#names left in data.frame
names(dat)
[1] "PlayerName" "MB" "Game"
#reshape from long to wide
dat.wide <- reshape(dat, direction = 'wide',idvar = 'Game',
timevar = 'PlayerName')
dat.wide[, 1:4]
Game MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe
1 20161025 54.8 30.9 24.1
11 20161027 41.5 26.8 17.2
21 20161029 56.5 49.5 22.6
#compute using cov function
cov_m <- cov(dat.wide[,-1], use = 'pairwise.complete.obs')
cov_m[1:4,1:4]
MB.Damian Lillard MB.C.J. McCollum MB.Allen Crabbe MB.Noah Vonleh
MB.Damian Lillard 67.46333 71.10833 28.370 17.23
MB.C.J. McCollum 71.10833 146.34333 20.495 -23.61
MB.Allen Crabbe 28.37000 20.49500 13.170 12.75
MB.Noah Vonleh 17.23000 -23.61000 12.750 28.84
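Because not every player appears in every game, it can help to see how many shared games stand behind each pairwise value. The following is a minimal sketch, not part of the answer above: it swaps reshape() for tapply() to build the wide table, uses a subset of the question's data (Meyers Leonard has no row for game 20161027), and the names wide, present and n_shared are invented for the example.

```r
# Long-format subset of the question's data; Meyers Leonard misses one game
dat <- data.frame(
  PlayerName = c("Damian Lillard", "C.J. McCollum", "Meyers Leonard",
                 "Damian Lillard", "C.J. McCollum",
                 "Damian Lillard", "C.J. McCollum", "Meyers Leonard"),
  MB   = c(54.8, 30.9, 2.7, 41.5, 26.8, 56.5, 49.5, 6.2),
  Game = c(20161025L, 20161025L, 20161025L, 20161027L, 20161027L,
           20161029L, 20161029L, 20161029L),
  stringsAsFactors = FALSE
)

# Games as rows, players as columns; player/game pairs with no row become NA
wide <- with(dat, tapply(MB, list(Game, PlayerName), sum))

# 1 where a player has a value for a game, 0 otherwise
present <- ifelse(is.na(wide), 0, 1)

# Number of games each pair of players has in common
n_shared <- crossprod(present)

# Each entry uses only the games in which both players have a value
cov_m <- cov(wide, use = "pairwise.complete.obs")

n_shared
cov_m
```

Entries backed by only two shared games rest on very little data, so n_shared is worth checking before reading much into a particular covariance.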