Question

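Problem: reducing a regression dataset to several smaller sets whose variables are dependent within each matrix but independent across matrices. I have a large dataset with 1000 dummy variables, but each row has only a few "positives" (ones), and memory limits my ability to build different models. So I am trying to split the dataset into several sets such that the variables within a set are linearly dependent on one another, while having no dependence on the variables in any other set.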

Small example:

M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)

Here M1 can be split into two "independent matrices":

M2 <- c(1,0,1,1)
M3 <- c(1,1,1,0)
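A quick base-R check of that claim: M2 and M3 are exactly the two diagonal blocks of M1 (rows 1-2 with columns 1-2, rows 3-4 with columns 3-4), and the off-diagonal blocks are all zero, so the two parts share no variable. The variable names `block1_ok`, `block2_ok` and `offdiag_zero` are only illustrative.

```r
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)
M2 <- c(1,0,1,1)
M3 <- c(1,1,1,0)

# M2 and M3 are stored column-wise, like M1 itself
block1_ok <- all(M1[1:2, 1:2] == matrix(M2, 2))
block2_ok <- all(M1[3:4, 3:4] == matrix(M3, 2))

# Rows 1-2 never use columns 3-4 and rows 3-4 never use columns 1-2
offdiag_zero <- all(M1[1:2, 3:4] == 0) && all(M1[3:4, 1:2] == 0)

c(block1_ok, block2_ok, offdiag_zero)
# [1] TRUE TRUE TRUE
```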

But changing M1 with

M1[3,2] <- 1

links all rows together (row 3 now shares column 2 with rows 1 and 2), so the matrix can no longer be split.
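One way to see the effect of that single change is to look at which pairs of rows share at least one column, i.e. which entries of `M1 %*% t(M1)` (computed here with base R's `tcrossprod`) are positive. The names `shares_before` and `shares_after` are illustrative.

```r
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)

# shares[i, j] is TRUE when rows i and j have a 1 in at least one common column
shares_before <- tcrossprod(M1) > 0

M1[3, 2] <- 1L
shares_after <- tcrossprod(M1) > 0

shares_before[1:2, 3:4]   # all FALSE: no row of the first block touches the second
shares_after[3, ]         # row 3 now shares a column with rows 1, 2 and 4
```

Note that rows 1 and 4 still share no column directly (`shares_after[1, 4]` is FALSE); they end up in the same set only through row 3. The split therefore has to follow chains of shared columns (connected components), not just pairwise overlap.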

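Ideally, what I want is a vector of length nr (the number of rows) that says which subset each row belongs to, so that a regression can be run on each subset. For the original M1, the result would be the vector: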

R <- c(1,1,2,2)
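Given such a membership vector, each independent submatrix can be recovered by keeping the rows of one subset and only the columns that are nonzero in those rows. A short base-R sketch (the name `blocks` is illustrative); for the original M1 it gives back M2 and M3:

```r
M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)
R <- c(1,1,2,2)

# For each subset id, keep its rows and drop the columns that are all zero there
blocks <- lapply(split(seq_len(nrow(M1)), R), function(rows) {
  sub <- M1[rows, , drop = FALSE]
  sub[, colSums(sub) > 0, drop = FALSE]
})

as.vector(blocks[["1"]])   # same entries as M2
as.vector(blocks[["2"]])   # same entries as M3
```

Each element of `blocks`, together with the matching rows of the response, could then be passed to its own regression.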

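The problem is related to rank, but every answer I could find deals with reducing the dimension of a matrix, not with splitting a matrix into independent parts.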
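The split asked for here can also be framed as finding connected components of the bipartite graph that links each row to the columns where it has a 1, rather than as a rank computation. A minimal base-R sketch of that idea, using a union-find (disjoint-set) structure over the columns; it assumes a 0/1 matrix, and the function name `split_rows` is hypothetical:

```r
# Hypothetical helper (not from the question or answer): assign each row of a
# 0/1 matrix to a connected component, where rows are linked when they have a
# nonzero entry in a common column, directly or through other rows.
split_rows <- function(m) {
  parent <- seq_len(ncol(m))          # union-find forest over the columns

  # Follow parent pointers up to the root of column x's set
  find <- function(x) {
    while (parent[x] != x) x <- parent[x]
    x
  }

  # Every row merges all of its nonzero columns into one set
  for (r in seq_len(nrow(m))) {
    cols <- which(m[r, ] != 0)
    if (length(cols) > 1) {
      root <- find(cols[1])
      for (cc in cols[-1]) {
        other <- find(cc)
        if (other != root) parent[other] <- root
      }
    }
  }

  # A row's set is the root of its first nonzero column; all-zero rows get NA
  roots <- vapply(seq_len(nrow(m)), function(r) {
    cols <- which(m[r, ] != 0)
    if (length(cols) == 0) NA_integer_ else find(cols[1])
  }, integer(1))

  # Renumber the roots as 1, 2, ... in order of first appearance
  match(roots, unique(roots[!is.na(roots)]))
}

M1 <- c(1L,0L,0L,0L,1L,1L,0L,0L,0L,0L,1L,1L,0L,0L,1L,0L)
dim(M1) <- c(4,4)
split_rows(M1)
# [1] 1 1 2 2
```

Setting `M1[3, 2] <- 1L` first makes it return `1 1 1 1`. The extra state is one integer per column, and each row is visited once, which may matter given the memory limits mentioned above; with a sparse matrix class the same loop could run over the nonzero entries instead of dense rows.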

Answer 1

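Iterating through the matrix is one solution, implemented by the function below (2-D only). It is not pretty and does not exploit any matrix structure, but I am posting it as one way to solve the problem: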

`%ni%` <- Negate(`%in%`)
# data <- hjlpmidMatrix   # the asker's own 0/1 matrix (not shown)

# Rows that have a 1 in column `col`
getRow <- function(data, col) {
  as.vector(which(data[, col] == 1))
}

# Columns that have a 1 in row `row`
getCol <- function(data, row) {
  as.vector(which(data[row, ] == 1))
}

# Returns a vector of length nrow(data) giving the subset number of each row.
# Rows without any 1 are not linked to anything and keep the value 0.
splitmatrix <- function(data) {
  if (!is.matrix(data)) {
    stop("no matrix assigned to function")
  }
  if (ncol(data) < 1) {
    stop("no columns in data")
  }
  group <- integer(nrow(data))
  visitedCols <- integer(0)
  i <- 0L

  repeat {
    # Seed a new subset with the first unvisited column that contains a 1
    start <- which(colSums(data == 1) > 0 & seq_len(ncol(data)) %ni% visitedCols)
    if (length(start) == 0) {break}
    i <- i + 1L
    col <- start[1]
    rowIndex <- NULL
    colIndex <- NULL

    # Alternate: columns -> rows that use them -> columns those rows use -> ...
    repeat {
      col <- col[col %ni% colIndex]
      if (length(col) == 0) {break}
      colIndex <- c(colIndex, col)
      row <- unique(unlist(lapply(col, getRow, data = data)))

      row <- row[row %ni% rowIndex]
      if (length(row) == 0) {break}
      rowIndex <- c(rowIndex, row)
      col <- unique(unlist(lapply(row, getCol, data = data)))
    }

    # Label the rows of this subset and never start from its columns again
    group[rowIndex] <- i
    visitedCols <- c(visitedCols, colIndex)
  }
  return(group)
}

Splitting a sparse matrix into linearly independent submatrices for regression
