如何找到原始和旋转PCA加载矩阵之间的角度?

时间:2015-07-18 12:35:26

标签: r pca rotational-matrices psych

假设我有两个PCA加载矩阵loa.origloa.rot,我知道loa.rotloa.orig的轮换(手动或其他)。

loa.orig也可能已经通过varimax或其他东西进行正交旋转,但我认为不重要。)

我知道想知道loa.orig已被轮换到loa.rot角度

我从this comment on another question了解到“旋转不通勤”,因此,成对(平面)旋转的顺序也很重要

所以loa.rot 重现loa.orig我需要知道一系列必要的轮换,最好按照下面rots给出的顺序。

这是一个MWE:

library(psych) # this allows manual rotation

# suppose I have some ORIGINAL loadings matrix, from a principal components analysis, with three retained components
loa.orig <- cbind(c(0.6101496, 0.7114088, 0.3356003, 0.7318809, 0.5980133, 0.4102817, 0.7059148, 0.6080662, 0.5089014, 0.587025, 0.6166816, 0.6728603, 0.7482675, 0.5409658, 0.6415472, 0.3655053, 0.6313868), c(-0.205317, 0.3273207, 0.7551585, -0.1981179, -0.423377, -0.07281187, -0.04180098, 0.5003459, -0.504371, 0.1942334, -0.3285095, 0.5221494, 0.1850734, -0.2993066, -0.08715662, -0.02191772, -0.2002428), c(-0.4692407, 0.1581682, -0.04574932, -0.1189175, 0.2449018, -0.5283772, 0.02826476, 0.1703277, 0.2305158, 0.2135566, -0.2783354, -0.05187637, -0.104919, 0.5054129, -0.2403471, 0.5380329, -0.07999642))

# I then rotate 1-2 by 90°, and 1-3 by 45°
loa.rot <- factor.rotate(f = loa.orig, angle = 90, col1 = 1, col2 = 2)
loa.rot <- factor.rotate(f = loa.rot, angle = 45, col1 = 1, col2 = 3)

# predictably, loa.rot and loa.orig are now different
any(loa.rot == loa.orig) # are any of them the same?

显然,在这种情况下,我知道角度和顺序,但我们假设我没有。 另外,我们假设在实际用例中可能会保留和旋转许多组件,而不仅仅是三个。

我有点不确定报告组件对(平面)旋转角度的顺序的常规方法,但我想象的列表可能的组合(~~不是排列~~)应该这样做。

combs <- combn(x = ncol(loa.orig), m = 2, simplify = TRUE)  # find all possible combinations of factors
rots <- data.frame(t(combs), stringsAsFactors = FALSE)  # transpose
rots  # these rows give the *order* in which the rotations should be done

rots给出了这些排列。

很高兴知道如何从loa.rot loa.orig到达rots,旋转varimax中行所给出的组件对。

更新:根据以下答案尝试

根据以下答案,我尝试将一个函数组合在一起,并使用varimax旋转和真实数据集对其进行测试。 (library(psych) data("Harman74.cor") # notice the correlation matrix is called "cov", but doc says its a cor matrix vanilla <- principal(r = Harman74.cor$cov, nfactors = 4, rotate = "none", )$loadings # this is unrotated class(vanilla) <- NULL # print methods only causes confusion varimax <- principal(r = Harman74.cor$cov, nfactors = 4, rotate = "varimax")$loadings # this is rotated class(varimax) <- NULL # print methods only causes confusion find.rot.instr <- function(original, rotated) { # original <- vanilla$loadings # testing # rotated <- varimax$loadings # testing getAngle <- function(A, B) acos(sum(A*B) / (norm(A, "F") * norm(B, "F"))) * 180/pi rots <- combn(x = ncol(original), m = 2, simplify = FALSE) # find all possible combinations of factor pairs tmp <- original angles <- sapply(rots, function(cols) { angle <- getAngle(tmp[, cols], rotated[, cols]) tmp <<- factor.rotate(tmp, angle = angle, col1 = cols[1], col2 = cols[2]) return(angle) }) return(angles) } vanilla.to.varimax.instr <- find.rot.instr(original = vanilla, rotated = varimax) # these are the angles we would need to transform in this order rots <- combn(x = ncol(vanilla), m = 2, simplify = FALSE) # find all possible combinations of factor pairs # this is again, because above is in function # now let's implement the extracted "recipe" varimax.recreated <- vanilla # start with original loadings varimax.recreated == vanilla # confirm that it IS the same for (i in 1:length(rots)) { # loop over all combinations, starting from the top varimax.recreated <- factor.rotate(f = varimax.recreated, angle = vanilla.to.varimax.instr[i], col1 = rots[[i]][1], col2 = rots[[i]][2]) } varimax == varimax.recreated # test whether they are the same varimax - varimax.recreated # are the close? 没有特别的理由 - 我只是想要一些我们实际上不知道角度的旋转。)。

然后我测试是否可以使用提取的角度从香草载荷中重新创建varimax旋转。

> varimax == varimax.recreated  # test whether they are the same
PC1   PC3   PC2   PC4
VisualPerception       FALSE FALSE FALSE FALSE
Cubes                  FALSE FALSE FALSE FALSE
PaperFormBoard         FALSE FALSE FALSE FALSE
Flags                  FALSE FALSE FALSE FALSE
GeneralInformation     FALSE FALSE FALSE FALSE
PargraphComprehension  FALSE FALSE FALSE FALSE
SentenceCompletion     FALSE FALSE FALSE FALSE
WordClassification     FALSE FALSE FALSE FALSE
WordMeaning            FALSE FALSE FALSE FALSE
Addition               FALSE FALSE FALSE FALSE
Code                   FALSE FALSE FALSE FALSE
CountingDots           FALSE FALSE FALSE FALSE
StraightCurvedCapitals FALSE FALSE FALSE FALSE
WordRecognition        FALSE FALSE FALSE FALSE
NumberRecognition      FALSE FALSE FALSE FALSE
FigureRecognition      FALSE FALSE FALSE FALSE
ObjectNumber           FALSE FALSE FALSE FALSE
NumberFigure           FALSE FALSE FALSE FALSE
FigureWord             FALSE FALSE FALSE FALSE
Deduction              FALSE FALSE FALSE FALSE
NumericalPuzzles       FALSE FALSE FALSE FALSE
ProblemReasoning       FALSE FALSE FALSE FALSE
SeriesCompletion       FALSE FALSE FALSE FALSE
ArithmeticProblems     FALSE FALSE FALSE FALSE
> varimax - varimax.recreated  # are the close?
PC1         PC3          PC2       PC4
VisualPerception        0.2975463  1.06789735  0.467850675 0.7740766
Cubes                   0.2317711  0.91086618  0.361004861 0.4366521
PaperFormBoard          0.1840995  0.98694002  0.369663215 0.5496151
Flags                   0.4158185  0.82820078  0.439876777 0.5312143
GeneralInformation      0.8807097 -0.33385999  0.428455899 0.7537385
PargraphComprehension   0.7604679 -0.30162120  0.389727192 0.8329341
SentenceCompletion      0.9682664 -0.39302764  0.445263121 0.6673116
WordClassification      0.7714312  0.03747430  0.460461099 0.7643221
WordMeaning             0.8010876 -0.35125832  0.396077591 0.8201986
Addition                0.4236932 -0.32573100  0.204307400 0.6380764
Code                    0.1654224 -0.01757153  0.194533996 0.9777764
CountingDots            0.3585004  0.28032822  0.301148474 0.5929926
StraightCurvedCapitals  0.5313385  0.55251701  0.452293566 0.6859854
WordRecognition        -0.3157408 -0.13019630 -0.034647588 1.1235253
NumberRecognition      -0.4221889  0.10729098 -0.035324356 1.0963785
FigureRecognition      -0.3213392  0.76012989  0.158748259 1.1327322
ObjectNumber           -0.3234966 -0.02363732 -0.007830001 1.1804147
NumberFigure           -0.2033601  0.59238705  0.170467459 1.0831672
FigureWord             -0.0788080  0.35303097  0.154132395 0.9097971
Deduction               0.3423495  0.41210812  0.363022937 0.9181519
NumericalPuzzles        0.3573858  0.57718626  0.393958036 0.8206304
ProblemReasoning        0.3430690  0.39082641  0.358095577 0.9133117
SeriesCompletion        0.4933886  0.56821932  0.465602192 0.9062039
ArithmeticProblems      0.4835965 -0.03474482  0.332889805 0.9364874

不幸的是,它们不一样,甚至不相似:(

string.IsNullOrEmpty(row.Cells[clm.Index].Value.ToString())

很明显,我犯了一个错误。

2 个答案:

答案 0 :(得分:2)

编辑:任意维数的方法

我现在有一种方法可以找到旋转矩阵的任意数量维度的欧拉角的模拟(尽管随着尺寸的增加,它变得越来越计算密集)。此方法适用于样本数据集和varimax,适用于2到6个因子。我没有测试超过6.对于5和6因素,似乎第5列增加了反射 - 我不确定是什么决定哪些列被反射,因此这在目前的例子中是硬编码的。

该方法与以前一样,使用lm.fit生成旋转矩阵。然后使用yacas使用符号矩阵乘法计算复合旋转矩阵,以便在适当数量的维度中进行任意旋转。然后迭代地解决角度。矩阵中总有一个元素基于一个角度的sin,然后可以用来迭代计算其他角度的值。

输出包括输入中实际不同的列子集,使用线性模型生成的复合旋转/反射矩阵,根据角度和列的单独旋转,计算的复合旋转矩阵,反射/列某些示例所需的交换矩阵,以及计算的旋转和输入之间差异的平方和(对于所有示例,这是1e-20的顺序)。

与我原来的解决方案不同,这只提供了一种可能的旋转顺序组合。对于factorial(n * (n-1) / 2)维度,实际可能性数量(即使仅相当于Tait-Bryan anglesn,对于6维,约为1.3e12

library("psych")
library("magrittr")
library("stringr")
library("Ryacas")
rot_mat_nd <- function(dimensions, composite_var = NULL,
                       rot_order = combn(dimensions, 2, simplify = FALSE)) {
  d <- diag(dimensions)
  storage.mode(d) <- "character"
  mats <- lapply(seq(rot_order), function(i) {
    l <- paste0("a", i)
    cmb <- rot_order[[i]]
    d[cmb[1], cmb[1]] <- sprintf("cos(%s)", l)
    d[cmb[1], cmb[2]] <- sprintf("-sin(%s)", l)
    d[cmb[2], cmb[1]] <- sprintf("sin(%s)", l)
    d[cmb[2], cmb[2]] <- sprintf("cos(%s)", l)
    paste0("{{",
           paste(apply(d, 1, paste, collapse = ","), collapse = "},{"),
           "}}")
  })

  yac_statement <- paste0("Simplify(", paste(mats, collapse = "*"), ")")
  if (!is.null(composite_var)) {
    yac_statement <- paste0(composite_var, " := ", yac_statement)
  }
  output <- yacas(yac_statement)
  list(mats = mats, composite = output, rot_order = rot_order)
}


find_angles_nd <- function(input, rotated, reflect = NULL) {
  matched_cols <- sapply(1:ncol(input), function(i)
    isTRUE(all.equal(input[, i], rotated[, i])))
  dimensions <- sum(!matched_cols)
  theor_rot <- rot_mat_nd(dimensions, "r")
  rv <- yacas("rv := Concat @ r")
  swap_mat <- matrix(0, dimensions, dimensions)
  swap_mat[cbind(1:dimensions, match(colnames(input), colnames(rotated)))] <- 1
  if (!is.null(reflect)) {
    swap_mat[, reflect] <- -swap_mat[, reflect]
  }
  input_changed <- input[, !matched_cols]
  rotated_changed <- rotated[, !matched_cols]
  rotated_swapped <- rotated_changed %*% swap_mat
  rot_mat <- lm.fit(input_changed, rotated_swapped)$coef
  rot_mat_v <- c(t(rot_mat))

  known_angles <- numeric()
  angles_to_find <- nrow(rot_mat) * (nrow(rot_mat) - 1) / 2

  iterations <- 0L
  angles_found <- -1
  while(length(known_angles) < angles_to_find & angles_found != 0) {
    iterations <- iterations + 1L
    message(sprintf("Iteration %d; angles remaining at start %d",
                iterations, angles_to_find - length(known_angles)))
    yacas(sprintf("rvwv := WithValue({%s}, {%s}, rv)",
                  paste(names(known_angles), collapse = ", "),
                  paste(known_angles, collapse = ", ")
    ))
    var_num <- yacas("MapSingle(Length, MapSingle(VarList, rvwv))") %>%
      as.expression %>%
      eval %>%
      unlist
    angles_found <- 0L
    for (i in which(var_num == 1)) {
      var <- as.character(yacas(sprintf("VarList(rvwv[%d])[1]", i)))
      if (!(var %in% names(known_angles))) {
        to_solve <- as.character(yacas(sprintf("rvwv[%d]", i)))
        fun_var <- str_extract(to_solve, sprintf("(sin|cos)\\(%s\\)", var))
        fun_c <- substr(fun_var, 1, 3)
        if (fun_c == "sin") {
          to_solve_mod <- str_replace(to_solve, fixed(fun_var), "x")
          solved <- as.character(yacas(sprintf("Solve(%s == %0.15f, x)[1]", to_solve_mod, rot_mat_v[i])))
          answer <- asin(eval(parse(text = str_replace(solved, "x == ", ""))))
          known_angles <- c(known_angles, setNames(answer, var))
          angles_found <- angles_found + 1L
        }
      }
    }
    message(sprintf("- found %d", angles_found))
  }
  calc_rot_mat <-
    matrix(unlist(simplify2array(eval(
      as.expression(theor_rot$composite),
      as.list(known_angles)
    ))), dimensions, byrow = TRUE)

  ssd <- sum((input_changed %*% calc_rot_mat %*% swap_mat - rotated_changed) ^ 2)

  angles <- known_angles[paste0("a", 1:angles_to_find)] / pi * 180

  list(rot_mat = rot_mat, calc_rot_mat = calc_rot_mat, swap_mat = swap_mat, angles = angles,
       rot_order = theor_rot$rot_order, input_changed = input_changed,
       rotated_changed = rotated_changed, sum_square_diffs = ssd)
}

factor_rotate_multiple <- function(input_changed, angles, rot_order, swap_mat) {
  rotated <- input_changed
  for (i in 1:length(angles)) {
    rotated <- factor.rotate(rotated, angles[i], rot_order[[i]][1], rot_order[[i]][2])
  }
  rotated %*% swap_mat
}

2-6维度的示例

data("Harman74.cor")  # notice the correlation matrix is called "cov", but doc says its a cor matrix

example_nd <- function(dimensions, reflect = NULL) {
  find_angles_nd(
    unclass(principal(r = Harman74.cor$cov, nfactors = dimensions, rotate = "none")$loadings),
    unclass(principal(r = Harman74.cor$cov, nfactors = dimensions, rotate = "varimax")$loadings),
    reflect = reflect
  )
}

angles_2d <- example_nd(2)

angles_2d[c("angles", "rot_order", "sum_square_diffs")]
# shows the resultant angle in degrees, rotation order and the
# sum of the squares of the differences between the provided and calculated
# rotations

#$angles
#       a1 
#-39.88672 
#
#$rot_order
#$rot_order[[1]]
#[1] 1 2
#
#
#$sum_square_diffs
#[1] 8.704914e-20

angles_3d <- example_nd(2)

angles_3d[c("angles", "rot_order", "sum_square_diffs")]
#$angles
#       a1        a2        a3 
#-45.19881 -29.77423 -17.07210 
#
#$rot_order
#$rot_order[[1]]
#[1] 1 2
#
#$rot_order[[2]]
#[1] 1 3
#
#$rot_order[[3]]
#[1] 2 3
#
#
#$sum_square_diffs
#[1] 7.498253e-20

angles_4d <- example_nd(2)
angles_5d <- example_nd(2, reflect = 5)
angles_6d <- example_nd(2, reflect = 5)

原始

这个问题可以分为两个。第一部分是计算将输入与输出相关联的复合旋转矩阵。即在等式A %*% B == C中,从BA计算C。对于方形矩阵,可以使用solve来完成。然而,在这种情况下,行&gt;列最简单的方法是使用线性模型和lm.fit函数。

问题的第二部分是识别为了产生复合旋转矩阵而执行的旋转。存在无限多种可能的组合,但是通常,就围绕三个轴的一系列旋转(即使用欧拉角)工作(对于3列)是合理的。即使这样,轮换也有六种可能的顺序。对于两列,问题是微不足道的,因为只需要一次旋转,并且单个asinacos足以识别角度。对于超过3列,可能generalise the Euler angle method但更复杂。

这是R中使用线性模型找到复合旋转矩阵和RSpincalc包以找到欧拉角的完整方法。它假设旋转影响了3列,如给出的例子。

library("RSpincalc")
library("combinat")
find_rotations <- function(input, rotated) {
  matched_cols <- sapply(1:ncol(input), function(i) isTRUE(all.equal(input[, i], rotated[, i])))
  if (sum(!matched_cols) != 3) {
    stop("This method only works for rotations affecting 3 columns.")
  }
  rot_mat <- lm.fit(input[, !matched_cols], rotated[, !matched_cols])$coef
  rot_poss <- as.data.frame(do.call("rbind", permn(c("z", "y", "x"))), stringsAsFactors = FALSE)
  rot_poss$axes_for_EA <- apply(rot_poss, 1, function(x) paste(rev(x), collapse = ""))
  combo_cols <- as.data.frame(matrix(which(!matched_cols)[combn(3, 2)], 2))
  rot_poss[5:10] <- do.call("rbind", permn(combo_cols, unlist))
  names(rot_poss)[c(1:3, 5:10)] <- c(paste0("axis", 1:3),
    paste0("rot", rep(1:3, each = 2), "_c", rep(1:2, 3)))
  rot_poss[paste0("angle", 1:3)] <- t(round(sapply(rot_poss$axes, DCM2EA, DCM = rot_mat), 14)) * 180 / pi
  rot_poss[paste0("angle", 1:3)] <-
    lapply(1:3, function(i) {
      ifelse(rot_poss[, paste0("axis", i)] == "y", 1, -1) *
        rot_poss[, paste0("angle", 4 - i)]
    })
  rot_poss
}

对于OP的数据:

rot_poss <- find_rotations(loa.orig, loa.rot)

另一个更全面的演示:

set.seed(123)
input <- matrix(rnorm(65), 13)

library("magrittr")

rotated <- input %>%
  factor.rotate(33.5, 1, 3) %>%
  factor.rotate(63, 2, 3) %>%
  factor.rotate(-3, 1, 2)

rot_poss <- find_rotations(input, rotated)

rot_poss

#  axis1 axis2 axis3 axes_for_EA rot1_c1 rot1_c2 rot2_c1 rot2_c2 rot3_c1 rot3_c2
#1     z     y     x         xyz       1       2       1       3       2       3
#2     z     x     y         yxz       1       2       2       3       1       3
#3     x     z     y         yzx       2       3       1       2       1       3
#4     x     y     z         zyx       2       3       1       3       1       2
#5     y     x     z         zxy       1       3       2       3       1       2
#6     y     z     x         xzy       1       3       1       2       2       3
#     angle1    angle2   angle3
#1 -1.585361 30.816825 63.84410
#2 44.624426 50.431683 53.53631
#3 59.538985 26.581047 16.27152
#4 66.980038 14.511490 27.52974
#5 33.500000 63.000000 -3.00000
#6 30.826477 -1.361477 63.03177

possible_calls <- do.call("sprintf", c(list(fmt = "input %%>%%
  factor.rotate(%1$g, %4$g, %5$g) %%>%%
  factor.rotate(%2$g, %6$g, %7$g) %%>%%
  factor.rotate(%3$g, %8$g, %9$g)"),
  rot_poss[c(paste0("angle", 1:3),
    grep("^rot", names(rot_poss), value = TRUE))]))
cat(possible_calls, sep = "\n\n")

#input %>%
  #factor.rotate(-1.58536, 1, 2) %>%
  #factor.rotate(30.8168, 1, 3) %>%
  #factor.rotate(63.8441, 2, 3)

#input %>%
  #factor.rotate(44.6244, 1, 2) %>%
  #factor.rotate(50.4317, 2, 3) %>%
  #factor.rotate(53.5363, 1, 3)

#input %>%
  #factor.rotate(59.539, 2, 3) %>%
  #factor.rotate(26.581, 1, 2) %>%
  #factor.rotate(16.2715, 1, 3)

#input %>%
  #factor.rotate(66.98, 2, 3) %>%
  #factor.rotate(14.5115, 1, 3) %>%
  #factor.rotate(27.5297, 1, 2)

#input %>%
  #factor.rotate(33.5, 1, 3) %>%
  #factor.rotate(63, 2, 3) %>%
  #factor.rotate(-3, 1, 2)

#input %>%
  #factor.rotate(30.8265, 1, 3) %>%
  #factor.rotate(-1.36148, 1, 2) %>%
  #factor.rotate(63.0318, 2, 3)

lapply(possible_calls, function(cl) all.equal(eval(parse(text = cl)), rotated, tolerance = 1e-6))
# tolerance reduced because above calls have rounding to 6 significant figures

注意最后一位将使用可以重现旋转的magrittr管道输出调用。另请注意,每个角度有6种可能的旋转顺序。

为了使角度匹配,我不得不在y旋转时翻转标志。我也不得不翻转轮换顺序。

对于varimax更新,我的方法适用于nfactor = 3,但需要调整为4或更高。

答案 1 :(得分:1)

我认为如果你进行矩阵代数问题会更容易(我正在做正交旋转,而不是倾斜变换,但逻辑是一样的。)

首先,注意任何旋转t都会产生一个旋转的分量矩阵c,使c = Ct,其中C是未旋转的PCA解。

然后,由于C'C是原始相关矩阵R,我们可以通过将两边乘以C'然后乘以R的倒数来求解t。这导致

t = C'R ^ -1 c

对于您的示例,请

R <- Harman74.cor$cov
P <- principal(Harman74.cor$cov,nfactors=4,rotate="none")
p <- principal(Harman74.cor$cov,nfactors=4)  #the default is to do varimax
rotation <- t(P$loadings) %*% solve(R) %*% p$loadings

然后,看看这是否正确

P$loadings %*% rotation - p$loadings

此外,即将发布的psych现在会报告旋转矩阵,因此我们可以将旋转与p $ rot.mat进行比较

round(p$rot.mat,2)
round(rotation,2)

这些组件的顺序不同,但这是因为心理在旋转后对组件进行重新排序以反映旋转组件的平方和的大小。