Question

I have a data frame with time as rows and principal components (PC1 to PC10) as columns. An example can be found in the answer provided here: Rolling PCA

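For each row, I want to extract the number of PCs required to reach a cumulative sum of at least 0.90. In the example table, every row needs its first three columns added together to reach at least 0.90, so I would like to extract the number 3 into a separate column. In my particular case, the number of columns required to reach 0.9 varies from row to row.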

An example of my desired result is in the last column (PC_N).

Answer 1

Data: (you should have provided ready-to-use data)

set.seed(1337)    
df1 <- as.data.frame(matrix(runif(6*4), 6, 4))

Code:

df1$PC_N <-
    apply(df1[1:4], 1, function(x) {which(cumsum(x) >= .9)[1]})

Result:

#         V1        V2         V3        V4 PC_N
#1 0.8455612 0.5753591 0.04045594 0.1168015    2
#2 0.3623455 0.7868502 0.34512398 0.5304800    2
#3 0.9092146 0.5210399 0.48515698 0.2770135    1
#4 0.6730770 0.1798602 0.45335329 0.7649627    3
#5 0.3068619 0.3963743 0.98232933 0.9653852    3
#6 0.2104455 0.7860896 0.42140667 0.7954002    2

More details:

apply(        # apply a function over one margin of the data
  df1[1:4],   # use only the PC columns (columns 1 to 4)
  1,          # margin 1 = go row-wise
  function(x) {
    which(cumsum(x) >= .9)[1]  # first index where the cumulative sum is at least 0.9
  })          # the end

Please make sure you read up on the functions used, e.g. ?which, ?apply ...
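To see how the pieces fit together, here is a minimal sketch on a single row. The vector `x` is made up for illustration and is not taken from the data above. `cumsum` gives the running total, the comparison turns it into a logical vector, and `which(...)[1]` picks the first position that is TRUE. If the threshold is never reached, `which` returns an empty vector, so `[1]` yields NA instead of an error.

```r
# Hypothetical row of explained-variance proportions (illustration only)
x <- c(0.50, 0.30, 0.15, 0.05)

running <- cumsum(x)          # running totals: 0.50 0.80 0.95 1.00
reached <- running >= 0.9     # FALSE FALSE TRUE TRUE
n_pc    <- which(reached)[1]  # first position where the total is at least 0.9

# A row that never reaches the threshold gives NA rather than an error
n_none <- which(cumsum(c(0.2, 0.3)) >= 0.9)[1]
```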

Answer 2

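I would write a function that returns the number of elements of a vector needed to sum to at least 0.9, with na.rm = T, and then apply it row-wise to the appropriate columns of df: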

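get.length <- function(x) {
  x <- x[!is.na(x)]                          # ignore missing values (na.rm)
  if (length(x) == 0) return(NA)
  ind <- which.max(x)                        # position of the largest value
  total <- x[ind]
  while (total < .9 && length(ind) != length(x)) {
    rest <- setdiff(seq_along(x), ind)       # positions not used yet
    ind <- c(ind, rest[which.max(x[rest])])  # map back to an index into x
    total <- sum(x[ind])
  }
  if (total < .9) NA else length(ind)        # NA if 0.9 is never reached
}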

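The function finds the maximum of the vector; if that is less than .9, it adds the next-largest value and repeats. Once .9 is reached, it returns the number of elements needed to total at least 0.9. If the total is never reached, it returns NA.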

Note: although your PCs will be decreasing in value, this function works even if the elements are not in descending order.

You can apply the function to the relevant columns of your data frame df, selected by column index, like this:

apply(df[ , col_indices], 1, get.length)
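The same greedy idea (add the largest values first until 0.9 is reached) can also be sketched with `sort` and `cumsum`. This is an alternative formulation under stated assumptions, not the function from this answer: the name `n_needed`, its `threshold` argument, and the data frame below are hypothetical and chosen only for illustration. `sort` drops NA values by default, which plays the role of na.rm = T here.

```r
# Hypothetical helper: number of largest elements needed to reach `threshold`
n_needed <- function(x, threshold = 0.9) {
  totals <- cumsum(sort(x, decreasing = TRUE))  # sort() drops NAs by default
  which(totals >= threshold)[1]                 # NA if never reached
}

# Made-up data whose columns are deliberately not in descending order
df <- data.frame(PC1 = c(0.10, 0.60, 0.20),
                 PC2 = c(0.85, 0.25, 0.30),
                 PC3 = c(0.05, 0.15, NA))
col_indices <- 1:3

# One count per row; the third row never reaches 0.9 and gets NA
df$PC_N <- apply(df[, col_indices], 1, n_needed)
```

Because the values are sorted first, the count does not depend on column order, which differs from a left-to-right cumulative sum when the columns are not already descending.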

Answer 3

I suspect you may have a prcomp object rather than a data frame, but never mind:

exampldf <- data.frame(PC1 = c(0.97, 0.40, 0.85, 0.75),
                       PC2 = c(0.01, 0.20, 0.10, 0.10),
                       PC3 = c(0.01, 0.20, 0.03, 0.10),
                       PC4 = c(0.01, 0.20, 0.02, 0.05))
rownames(exampldf) <- c("WEEK1", "WEEK2", "WEEK3", "WEEK4")
library(matrixStats)
exampldf$PC_N <- 1 + rowSums(rowCumsums(as.matrix(exampldf)) < 0.9)

which produces

> exampldf
       PC1  PC2  PC3  PC4 PC_N
WEEK1 0.97 0.01 0.01 0.01    1
WEEK2 0.40 0.20 0.20 0.20    4
WEEK3 0.85 0.10 0.03 0.02    2
WEEK4 0.75 0.10 0.10 0.05    3
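One thing to be aware of with this counting approach: a row whose total never reaches 0.9 gets the number of columns plus one rather than a missing value. A minimal base-R sketch of the same counting idea, which marks such rows as NA and does not need matrixStats, might look like this. The matrix `m` and the name `pc_n` are made up for illustration.

```r
# Made-up proportions; the last row never reaches 0.9
m <- rbind(WEEK1 = c(0.97, 0.01, 0.01, 0.01),
           WEEK2 = c(0.40, 0.20, 0.20, 0.20),
           WEEK3 = c(0.30, 0.20, 0.10, 0.10))

cum  <- t(apply(m, 1, cumsum))   # row-wise cumulative sums in base R
pc_n <- 1 + rowSums(cum < 0.9)   # number of sums below 0.9, plus one
pc_n[pc_n > ncol(m)] <- NA       # threshold never reached in that row
```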

Number of columns needed to reach a minimum sum, by row
