Question

Following the findCorrelation() documentation, I ran the official example shown below:

Code:

library(caret)

R1 <- structure(c(1, 0.86, 0.56, 0.32, 0.85, 0.86, 1, 0.01, 0.74, 0.32, 
                  0.56, 0.01, 1, 0.65, 0.91, 0.32, 0.74, 0.65, 1, 0.36,
                  0.85, 0.32, 0.91, 0.36, 1), 
                .Dim = c(5L, 5L))


colnames(R1) <- rownames(R1) <- paste0("x", 1:ncol(R1))

findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE,
                verbose = TRUE)

Result:

> findCorrelation(R1, cutoff = .6, exact = TRUE, names = TRUE, verbose = TRUE)
## Compare row 1  and column  5 with corr  0.85 
##   Means:  0.648 vs 0.545 so flagging column 1 
## Compare row 5  and column  3 with corr  0.91 
##   Means:  0.53 vs 0.49 so flagging column 5 
## Compare row 3  and column  4 with corr  0.65 
##   Means:  0.33 vs 0.352 so flagging column 4 
## All correlations <= 0.6 
## [1] "x1" "x5" "x4"

I don't understand how the computation works, i.e. why row 1 and column 5 are compared first and how the means are calculated, even after reading the source file.

I hope someone can explain the algorithm using my example.

Answer 1

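First, it determines the mean absolute correlation of each variable. Columns x1 and x5 have the highest means (mean(c(0.85, 0.56, 0.32, 0.86)) and mean(c(0.85, 0.91, 0.36, 0.32)) respectively), so it looks at removing one of them in the first step. x1's mean, 0.648, is the first number on the "Means" line of the output. It finds x1 to be the worse offender overall, so it removes x1.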
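The first step can be checked by hand in base R. This is a minimal sketch, not caret's own code: it assumes the second value on the "Means" line is the mean of the whole matrix with the other variable's row left out, an assumption that reproduces the 0.648 vs 0.545 printed in the question. The names A, avg, mn1 and mn2 are only illustrative.

```r
# Correlation matrix from the question (symmetric, so fill order does not matter).
R1 <- matrix(c(1, 0.86, 0.56, 0.32, 0.85,
               0.86, 1, 0.01, 0.74, 0.32,
               0.56, 0.01, 1, 0.65, 0.91,
               0.32, 0.74, 0.65, 1, 0.36,
               0.85, 0.32, 0.91, 0.36, 1),
             nrow = 5, dimnames = list(paste0("x", 1:5), paste0("x", 1:5)))

A <- abs(R1)
diag(A) <- NA   # ignore each variable's correlation with itself

# Mean absolute correlation per variable, largest first: x1, x5, x3, x4, x2.
avg <- sort(colMeans(A, na.rm = TRUE), decreasing = TRUE)
print(round(avg, 4))

# x1 and x5 come first in that order and their correlation (0.85) exceeds 0.6.
mn1 <- mean(A["x1", ], na.rm = TRUE)                  # x1 on its own
mn2 <- mean(A[rownames(A) != "x5", ], na.rm = TRUE)   # assumed: every row except x5's
print(c(mn1, mn2))   # about 0.648 vs 0.545, so x1 is flagged
```

The ordering explains why "row 1 and column 5" are compared first: they are simply the two variables with the largest mean absolute correlation.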

After that, it recomputes the means using the same procedure and compares x5 with x3.
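The recomputation can be sketched the same way. The snippet below is an illustration under the same assumption about the second mean, not caret's implementation: a flagged variable has its row and column blanked out before the next comparison. It also covers the third comparison from the verbose trace (x3 against x4), where the column variable is flagged instead of the row variable.

```r
R1 <- matrix(c(1, 0.86, 0.56, 0.32, 0.85,
               0.86, 1, 0.01, 0.74, 0.32,
               0.56, 0.01, 1, 0.65, 0.91,
               0.32, 0.74, 0.65, 1, 0.36,
               0.85, 0.32, 0.91, 0.36, 1),
             nrow = 5, dimnames = list(paste0("x", 1:5), paste0("x", 1:5)))
A <- abs(R1)
diag(A) <- NA

# x1 was flagged in step 1, so it no longer contributes to any mean.
A["x1", ] <- NA
A[, "x1"] <- NA

# Step 2: x5 vs x3 (correlation 0.91).
step2 <- c(mean(A["x5", ], na.rm = TRUE),
           mean(A[rownames(A) != "x3", ], na.rm = TRUE))
print(step2)   # about 0.53 vs 0.49: first is larger, so x5 is flagged

A["x5", ] <- NA
A[, "x5"] <- NA

# Step 3: x3 vs x4 (correlation 0.65).
step3 <- c(mean(A["x3", ], na.rm = TRUE),
           mean(A[rownames(A) != "x4", ], na.rm = TRUE))
print(step3)   # about 0.33 vs 0.352: first is smaller, so x4 is flagged instead
```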

It stops after removing three columns, because all remaining pairwise correlations are at or below your cutoff.
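Putting the pieces together, the whole procedure can be approximated with a short loop. This is a hedged base-R sketch written for this example only, with illustrative variable names; it is not the caret source, but on this matrix it flags the same variables in the same order as the verbose output.

```r
R1 <- matrix(c(1, 0.86, 0.56, 0.32, 0.85,
               0.86, 1, 0.01, 0.74, 0.32,
               0.56, 0.01, 1, 0.65, 0.91,
               0.32, 0.74, 0.65, 1, 0.36,
               0.85, 0.32, 0.91, 0.36, 1),
             nrow = 5, dimnames = list(paste0("x", 1:5), paste0("x", 1:5)))
cutoff <- 0.6

A <- abs(R1)
diag(A) <- NA
# Visit variables by decreasing mean absolute correlation.
ord <- order(colMeans(A, na.rm = TRUE), decreasing = TRUE)
A <- A[ord, ord]
flagged <- character(0)

for (i in seq_len(nrow(A) - 1)) {
  if (!any(A > cutoff, na.rm = TRUE)) break   # stop: nothing left above the cutoff
  for (j in (i + 1):nrow(A)) {
    # A[i, j] is NA once either variable has been flagged.
    if (is.na(A[i, j]) || A[i, j] <= cutoff) next
    mn1 <- mean(A[i, ], na.rm = TRUE)    # row variable's mean
    mn2 <- mean(A[-j, ], na.rm = TRUE)   # assumed: all rows except the column variable's
    drop <- if (mn1 > mn2) i else j
    flagged <- c(flagged, rownames(A)[drop])
    A[drop, ] <- NA
    A[, drop] <- NA
  }
}
print(flagged)

# Largest pairwise correlation among the variables that were kept.
keep <- setdiff(colnames(R1), flagged)
rest <- abs(R1[keep, keep])
print(max(rest[upper.tri(rest)]))
```

Only x2 and x3 remain, and their correlation of 0.01 is well under the cutoff, which is why the trace ends with "All correlations <= 0.6".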

