Correlations between variables and components using the pls package

Time: 2016-08-28 06:41:56

Tags: r

Using the data.frame below (source: http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/en_Tanagra_PLSR_Software_Comparison.pdf)

Data

df <- read.table(text = c("
diesel  twodoors    sportsstyle wheelbase   length  width   height  curbweight  enginesize  horsepower  horse_per_weight    conscity    price   symboling
0   1   0   97  172 66  56  2209    109 85  0.0385  8.7 7975    2
0   0   0   100 177 66  54  2337    109 102 0.0436  9.8 13950   2
0   0   0   116 203 72  57  3740    234 155 0.0414  14.7    34184   -1
0   1   1   103 184 68  52  3016    171 161 0.0534  12.4    15998   3
0   0   0   101 177 65  54  2765    164 121 0.0438  11.2    21105   0
0   1   0   90  169 65  52  2756    194 207 0.0751  13.8    34028   3
1   0   0   105 175 66  54  2700    134 72  0.0267  7.6 18344   0
0   0   0   108 187 68  57  3020    120 97  0.0321  12.4    11900   0
0   0   1   94  157 64  51  1967    90  68  0.0346  7.6 6229    1
0   1   0   95  169 64  53  2265    98  112 0.0494  9.0 9298    1
1   0   0   96  166 64  53  2275    110 56  0.0246  6.9 7898    0
0   1   0   100 177 66  53  2507    136 110 0.0439  12.4    15250   2
0   1   1   94  157 64  51  1876    90  68  0.0362  6.4 5572    1
0   0   0   95  170 64  54  2024    97  69  0.0341  7.6 7349    1
0   1   1   95  171 66  52  2823    152 154 0.0546  12.4    16500   1
0   0   0   103 175 65  60  2535    122 88  0.0347  9.8 8921    -1
0   0   0   113 200 70  53  4066    258 176 0.0433  15.7    32250   0
0   0   0   95  165 64  55  1938    97  69  0.0356  7.6 6849    1
1   0   0   97  172 66  56  2319    97  68  0.0293  6.4 9495    2
0   0   0   97  172 66  56  2275    109 85  0.0374  8.7 8495    2"), header = T)

and this

Code

library(plsdepot)
df.plsdepot = plsreg1(df[, 1:11], df[, 14, drop = FALSE], comps = 3)
data<-df.plsdepot$cor.xyt
data<-as.data.frame(data)

I get a data.frame of the correlations between the variables and the components

data
#                          t1          t2           t3
#diesel           -0.23513860 -0.38154681  0.439221649
#twodoors          0.71849247  0.45622386  0.055982798
#sportsstyle       0.51909329 -0.02381952 -0.672617464
#wheelbase        -0.86843937  0.34114664 -0.254589548
#length           -0.75311884  0.62404991 -0.085596033
#width            -0.67444970  0.62282146 -0.158675019
#height           -0.67228557 -0.14675385  0.317166599
#curbweight       -0.59305898  0.73532560 -0.241983833
#enginesize       -0.39475651  0.82353941 -0.252270394
#horsepower        0.04843256  0.96637015 -0.148407288
#horse_per_weight  0.50515322  0.81502376 -0.006045151
#symboling         0.64900253  0.23673633  0.346902434

I managed to plot them as follows

library(plsdepot)
df.plsdepot = plsreg1(df[, 1:11], df[, 14, drop = FALSE], comps = 3)
plot(df.plsdepot, comps = c(1, 2))


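I have to use the pls package instead of plsdepot. I need to obtain the correlations between the variables and the components and plot them.

With pls, I managed to plot the correlations between variables and components as follows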

library(pls)
Y <- as.matrix(df[,14])
X <- as.matrix(df[,1:11])
df.pls <- mvr(Y ~ X, ncomp = 3, method = "oscorespls", scale = TRUE)
plot(df.pls, "correlation")


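However, I cannot find a way to extract these values (the correlations between variables and components) with the pls package and convert them to a data.frame.

Any help on how to get these correlation values with the pls package would be highly appreciated.

1 Answer:

Answer 0: (score: 3)

Thanks to Bjørn-Helge Mevik (maintainer of the pls package) for the answer below.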

==========================================================================

If you look at the corrplot code:

> corrplot
function (object, comps = 1:2, labels, radii = c(sqrt(1/2), 1), 
    identify = FALSE, type = "p", xlab, ylab, ...) {
    nComps <- length(comps)
    if (nComps < 2) 
        stop("At least two components must be selected.")
    if (is.matrix(object)) {
        cl <- object[, comps, drop = FALSE]
        varlab <- colnames(cl)
    }
    else {
        S <- scores(object)[, comps, drop = FALSE]
        if (is.null(S)) 
            stop("`", deparse(substitute(object)), "' has no scores.")
        cl <- cor(model.matrix(object), S)
        varlab <- compnames(object, comps, explvar = TRUE)
    }

you will see that the relevant part is essentially

S <- scores(object)[, comps, drop = FALSE]
cl <- cor(model.matrix(object), S)

These two lines compute the correlation loadings. Using df.pls in place of object gives you the correlation loadings matrix.
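
Written as a formula, each entry of that matrix is an ordinary Pearson correlation between one predictor column and one score vector. The symbols here are notation introduced for illustration, not taken from the pls documentation: x_{ij} is the value of predictor j for observation i, t_{ik} is the score of observation i on component k, and n is the number of observations.

\[
r_{jk} = \operatorname{cor}(x_j, t_k)
       = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(t_{ik} - \bar{t}_k)}
              {\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\,\sqrt{\sum_{i=1}^{n} (t_{ik} - \bar{t}_k)^2}}
\]

A correlation is unchanged by centering or by positive rescaling of either argument, so centering or scaling the columns of the predictor matrix before calling cor would not change these values. Applied to df.pls from the question: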

S <- scores(df.pls)[, 1:2, drop = FALSE]
cl <- cor(model.matrix(df.pls), S)
df.cor <- as.data.frame(cl)
df.cor
#                      Comp 1      Comp 2
#diesel           -0.23513860 -0.38154681
#twodoors          0.71849247  0.45622386
#sportsstyle       0.51909329 -0.02381952
#wheelbase        -0.86843937  0.34114664
#length           -0.75311884  0.62404991
#width            -0.67444970  0.62282146
#height           -0.67228557 -0.14675385
#curbweight       -0.59305898  0.73532560
#enginesize       -0.39475651  0.82353941
#horsepower        0.04843256  0.96637015
#horse_per_weight  0.50515322  0.81502376
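
To get all three components in one data.frame (as in the plsdepot table in the question) and to redraw the plot by hand, the two lines above can be wrapped in a small function. The following is a minimal sketch, not part of the original answer: cor_loadings is a hypothetical helper name, and mtcars (shipped with R) stands in for the question's df so that the snippet runs on its own.

```r
# Sketch: correlation loadings from a pls fit as a data.frame, plus a manual plot.
# Assumption: mtcars stands in for the question's `df`.
library(pls)

fit <- mvr(mpg ~ ., data = mtcars, ncomp = 3,
           method = "oscorespls", scale = TRUE)

# cor_loadings() is a hypothetical helper name, not a function from pls.
cor_loadings <- function(object, comps = 1:2) {
  S <- scores(object)[, comps, drop = FALSE]    # X scores of the chosen components
  as.data.frame(cor(model.matrix(object), S))   # one row per predictor
}

df.cor <- cor_loadings(fit, comps = 1:3)
print(df.cor)

# Redraw the correlation plot for components 1 and 2 with base graphics:
# a unit circle, then one labelled point per predictor.
theta <- seq(0, 2 * pi, length.out = 200)
plot(cos(theta), sin(theta), type = "l", asp = 1,
     xlab = names(df.cor)[1], ylab = names(df.cor)[2])
points(df.cor[[1]], df.cor[[2]], pch = 19)
text(df.cor[[1]], df.cor[[2]], labels = rownames(df.cor), pos = 3, cex = 0.8)
```

With the question's model, cor_loadings(df.pls, comps = 1:3) should reproduce the two columns shown above and add a third for Comp 3.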