            VZ.Close CBOU.Close SBUX.Close   T.Close
VZ.Close   1.0000000  0.5804478  0.8872978 0.9480894
CBOU.Close 0.5804478  1.0000000  0.7876277 0.4988890
SBUX.Close 0.8872978  0.7876277  1.0000000 0.8143305
T.Close    0.9480894  0.4988890  0.8143305 1.0000000
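So, say I have these correlations between stock prices. I want to look at the first row and find the pair with the highest correlation, which would be VZ and T. Then I want to remove those two stocks from the candidates and find the pair with the highest correlation among the remaining stocks, and so on until all stocks are paired up. In this example the second pair is obviously CBOU and SBUX, because they are the only two left, but I want the code to handle any number of pairs.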
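Answer 0 (score: 4)
Here is a solution if you want to pick the maximum correlation at each step. Note that the first step therefore searches the whole matrix, not just the first row.
Example data: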
d <- matrix(runif(36),ncol=6,nrow=6)
rownames(d) <- colnames(d) <- LETTERS[1:6]
diag(d) <- 1
d
           A          B         C          D         E          F
A 1.00000000 0.65209204 0.8520392 0.26980214 0.5844000 0.69335143
B 0.73531603 1.00000000 0.5499431 0.60511580 0.7483990 0.14788134
C 0.56433218 0.27242769 1.0000000 0.07952776 0.2147628 0.03711562
D 0.91756919 0.04853523 0.5554490 1.00000000 0.4344089 0.23381447
E 0.06897889 0.80740821 0.7974340 0.87425643 1.0000000 0.74546072
F 0.19961474 0.61665231 0.2829632 0.58110694 0.7433924 1.00000000
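Code:
# Collect the selected pairs here
results <- data.frame(v1=character(0), v2=character(0), cor=numeric(0), stringsAsFactors=FALSE)
# Zero the diagonal so a variable is never paired with itself
diag(d) <- 0
# Repeat while more than one positive entry remains
while (sum(d>0)>1) {
  maxval <- max(d)
  # Row/column index of the first cell holding the current maximum
  idx <- which(d==maxval, arr.ind=TRUE)[1,]
  results <- rbind(results, data.frame(v1=rownames(d)[idx[1]], v2=colnames(d)[idx[2]], cor=maxval, stringsAsFactors=FALSE))
  # Zero out the rows and columns of both variables so neither can be chosen again
  d[idx,] <- 0
  d[,idx] <- 0
}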
This gives:
  v1 v2       cor
1  D  A 0.9175692
2  E  B 0.8074082
3  F  C 0.2829632
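As a rough sketch of how the same greedy idea could be applied to the question's four-stock matrix, the snippet below wraps the loop in a function. The names `cm`, `greedy_pairs` and `paired` are made up for this illustration and are not part of the answer above. It assumes a symmetric matrix, so it only looks at the upper triangle, and it marks used entries as NA rather than 0 so that negative correlations would still be considered.

```r
# Correlation matrix copied from the question
stocks <- c("VZ.Close", "CBOU.Close", "SBUX.Close", "T.Close")
cm <- matrix(c(1.0000000, 0.5804478, 0.8872978, 0.9480894,
               0.5804478, 1.0000000, 0.7876277, 0.4988890,
               0.8872978, 0.7876277, 1.0000000, 0.8143305,
               0.9480894, 0.4988890, 0.8143305, 1.0000000),
             nrow = 4, byrow = TRUE, dimnames = list(stocks, stocks))

# Hypothetical helper: repeatedly pick the most correlated remaining pair
greedy_pairs <- function(m) {
  # Assumes m is symmetric: keep the upper triangle only, drop the diagonal
  m[lower.tri(m, diag = TRUE)] <- NA
  out <- list()
  while (any(!is.na(m))) {
    # First cell holding the current maximum (NA cells are ignored by which)
    idx <- which(m == max(m, na.rm = TRUE), arr.ind = TRUE)[1, ]
    out[[length(out) + 1]] <- data.frame(v1 = rownames(m)[idx[1]],
                                         v2 = colnames(m)[idx[2]],
                                         cor = m[idx[1], idx[2]],
                                         stringsAsFactors = FALSE)
    # Remove both members of the pair from further consideration
    m[idx, ] <- NA
    m[, idx] <- NA
  }
  do.call(rbind, out)
}

paired <- greedy_pairs(cm)
paired
```

For the question's data this pairs VZ.Close with T.Close first (0.9480894) and then CBOU.Close with SBUX.Close (0.7876277). With an odd number of stocks, the last one is simply left unpaired.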
Answer 1 (score: 0)
I think this answers your question, but I can't be sure because the original question is a bit ambiguous...
# Construct toy example of symmetrical matrix
# nc is number of rows/columns in matrix, in the problem above it was 4, but let's try with 6
nc <- 6
mat <- diag( 1 , nc )
# Create toy correlation data for matrix
dat <- runif( ( (nc^2-nc)/2 ) )
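# Fill both triangles of the matrix with the same values
# (not exactly symmetric: both triangles are filled in column-major order, so mirrored cells generally get different values)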
mat[lower.tri( mat ) ] <- dat
mat[upper.tri( mat ) ] <- dat
# Create vector of random string names for row/column names
names <- replicate( nc , expr = paste( sample( c( letters , LETTERS ) , 3 , replace = TRUE ) , collapse = "" ) )
dimnames(mat) <- list( names , names )
# Sanity check
mat
      SXK   llq   xFL   RVW   oYQ   Seb
SXK 1.000 0.973 0.499 0.585 0.813 0.751
llq 0.973 1.000 0.075 0.533 0.794 0.826
xFL 0.499 0.099 1.000 0.099 0.481 0.968
RVW 0.075 0.813 0.620 1.000 0.620 0.307
oYQ 0.585 0.794 0.751 0.968 1.000 0.682
Seb 0.533 0.481 0.826 0.307 0.682 1.000
# Ok - to problem at hand , you can just substitute your matrix into these lines:
# Clearly the diagonal in a correlation matrix will be 1 so this is excluded as per your problem
diag( mat ) <- NA
# Now find the next highest correlation in each row and set this to NA
mat <- t( apply( mat , 1 , function(x) { x[ which.max(x) ] <- NA ; return(x) } ) )
# Another sanity check...!
mat
      SXK   llq   xFL   RVW   oYQ   Seb
SXK    NA    NA 0.499 0.585 0.813 0.751
llq    NA    NA 0.075 0.533 0.794 0.826
xFL 0.499 0.099    NA 0.099 0.481    NA
RVW 0.075    NA 0.620    NA 0.620 0.307
oYQ 0.585 0.794 0.751    NA    NA 0.682
Seb 0.533 0.481    NA 0.307 0.682    NA
# Now return the two remaining columns with greatest correlation in that row
res <- t( apply( mat , 1 , function(x) { y <- names( sort(x , TRUE ) )[1:2] ; return( y ) } ) )
res
    [,1]  [,2]
SXK "oYQ" "Seb"
llq "Seb" "oYQ"
xFL "SXK" "oYQ"
RVW "xFL" "oYQ"
oYQ "llq" "xFL"
Seb "oYQ" "SXK"
Does this answer your question?
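One caveat about the toy data: assigning the same vector to `lower.tri` and `upper.tri` does not produce a symmetric matrix, which is why the sanity-check output is not mirrored across the diagonal. A minimal sketch of one way to build a symmetric toy matrix is shown here. The name `sym` and the seed are arbitrary choices for this illustration.

```r
set.seed(1)   # arbitrary seed, only so the example is reproducible
nc  <- 6
sym <- diag(1, nc)
# Fill the lower triangle with random values
sym[lower.tri(sym)] <- runif((nc^2 - nc) / 2)
# Mirror the lower triangle into the upper triangle via the transpose
sym[upper.tri(sym)] <- t(sym)[upper.tri(sym)]
# Check that the result is symmetric
isSymmetric(sym)
```

The rest of the steps (setting `dimnames`, masking the diagonal with NA, and so on) can then be applied to `sym` in the same way as to `mat`.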