R - 如何根据另一个矩阵

时间:2015-05-12 20:24:17

标签: r if-statement matrix subset covariance

我在R中有以下协方差矩阵:

AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200
AB-2000 6.5 NA  -1.8    3.65    -17.96  -26.5
AB-2600 NA  7.18    NA  NA  NA  NA
AB-3500 -1.79   NA  5.4 NA  -4.63   NA
AC-0100 3.65    NA  NA  4.22    9.8 NA
AD-0100 -17.96  NA  -4.63   9.8 5.9 NA
AF-0200 -26.5   NA  NA  NA  NA  4.28

每列和每行对应一个足球运动员(即AB-2000)。所以AB-2000,AB-2000的交集给出了球员表现的差异。像AB-2000,AF-0200这样的行给出了两个球员表现的协方差。

目前,矩阵显示所有协方差值。但是,并非所有协方差值都很重要。事实上,唯一重要的是当两个玩家在那周玩同一个游戏时(在这种情况下,具有相同的游戏ID(GID))。

下表显示了某一周玩家的GID:

GID PLAYER
3467    AB-2000
3460    AB-2600
3463    AB-3500
3467    AC-0100
3458    AD-0100
3461    AF-0200

当两个玩家拥有相同的GID时,如何仅保留协方差矩阵中的值(例如,玩家AB-2000和AC-0100)?

感谢您的帮助!

1 个答案:

答案 0 :(得分:1)

I think this does what you're asking, if I'm interpreting the question correctly. I've given you a couple solutions, pick your poison. The first relies on a nested for loop which could be slow and further optimized if you knew for sure your matrix was symmetric.

> as.character(c(1,2))
[1] "1" "2"

Alternatively this solution does relies on the m <- read.table(header=T, stringsAsFactors=F, text=" AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200 AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5 AB-2600 NA 7.18 NA NA NA NA AB-3500 -1.79 NA 5.4 NA -4.63 NA AC-0100 3.65 NA NA 4.22 9.8 NA AD-0100 -17.96 NA -4.63 9.8 5.9 NA AF-0200 -26.5 NA NA NA NA 4.28 ") p <- read.table(header=T, stringsAsFactors=F, text=" GID PLAYER 3467 AB-2000 3460 AB-2600 3463 AB-3500 3467 AC-0100 3458 AD-0100 3461 AF-0200 ") m_t2 <- cm names(m_t2) <- row.names(m_t2) ## Replace names with GID: row_names <- p$GID[which(p$PLAYER == row.names(m_t2))] col_names <- p$GID[which(p$PLAYER == names(m_t2))] for (i in 1:nrow(m_t2)) { m_t2[i, col_names != row_names[i]] <- NA } m_t2 <- as.matrix(m_t2) and tidyr packages but it should be quite efficient for very large datasets:

dplyr

The solution in either case looks like this:

m <- cm
names(m) <- row.names(m)
m$row_names <- row.names(m)

library(tidyr)
library(dplyr)

d <- m %>% 
  gather(col_names, "cv", -row_names, convert=T) %>% 
  left_join(p, by = c("row_names" = "PLAYER")) %>% 
  mutate(GID_row = GID) %>% 
  select(-GID) %>% 
  left_join(p, by=c("col_names" = "PLAYER")) %>% 
  mutate(GID_col = GID) %>% 
  mutate(new_cv = ifelse((GID_row == GID_col), cv, NA)) %>%
  select(row_names, col_names, new_cv) %>% 
  spread(col_names, new_cv)

m_t <- as.matrix(d[,-1])
row.names(m_t) <- d[["row_names"]]