Displaying missing levels in a model matrix

Time: 2018-01-11 02:53:35

Tags: r

I'm wondering whether there is a way to insert a column into a matrix, as follows.

p1 <- c("a","b","c","e","d","a","c")
p2 <- c("a","b","c","e","e","a","c")

p1mat <- model.matrix(~p1 + 0)
p2mat <- model.matrix(~p2 + 0)
colnames(p1mat) <- gsub("p1","",colnames(p1mat))
colnames(p2mat) <- gsub("p2","",colnames(p2mat))

This gives me p1mat:

  a b c d e
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 1 0 0
4 0 0 0 0 1
5 0 0 0 1 0
6 1 0 0 0 0
7 0 0 1 0 0

and p2mat:

  a b c e
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 0 0 1
5 0 0 0 1
6 1 0 0 0
7 0 0 1 0

My question is: is there a way to sneak a column vector d, containing only zeros, into the matrix p2mat? Something like

   d
   0
   0
   0
   0
   0
   0
   0

and have it automatically sorted into place between columns c and e, producing the following matrix for p2mat:

  a b c d e
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 1 0 0
4 0 0 0 0 1
5 0 0 0 0 1
6 1 0 0 0 0
7 0 0 1 0 0

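Basically, I want p2mat to look at every column in p1mat so that it ends up as a matrix of the same size, which lets me keep track of the data through the dummy matrices.

Thanks.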

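2 Answers:

Answer 0: (score: 4)

You can convert both inputs with factor, making sure they share the same levels. model.matrix should then work as expected, because it creates a column for every level of a factor, including levels that never occur in the data.

Example: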

p1 <- c("a","b","c","e","d","a","c")
p2 <- c("a","b","c","e","e","a","c")

levs <- sort(unique(c(p1, p2)))
f1 <- factor(p1, levs)
f2 <- factor(p2, levs)

model.matrix(~f1 + 0)
#   f1a f1b f1c f1d f1e
# 1   1   0   0   0   0
# 2   0   1   0   0   0
# 3   0   0   1   0   0
# 4   0   0   0   0   1
# 5   0   0   0   1   0
# 6   1   0   0   0   0
# 7   0   0   1   0   0
# attr(,"assign")
# [1] 1 1 1 1 1
# attr(,"contrasts")
# attr(,"contrasts")$f1
# [1] "contr.treatment"

model.matrix(~f2 + 0)
#   f2a f2b f2c f2d f2e
# 1   1   0   0   0   0
# 2   0   1   0   0   0
# 3   0   0   1   0   0
# 4   0   0   0   0   1
# 5   0   0   0   0   1
# 6   1   0   0   0   0
# 7   0   0   1   0   0
# attr(,"assign")
# [1] 1 1 1 1 1
# attr(,"contrasts")
# attr(,"contrasts")$f2
# [1] "contr.treatment"
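To get the bare column names the question asks for (a to e rather than f2a to f2e), the factor approach can be combined with a renaming step. The following is a minimal sketch under the assumption that the model has a single factor and no intercept, so the columns come out in level order; the names levs and f2 simply follow the example above.

```r
p1 <- c("a","b","c","e","d","a","c")
p2 <- c("a","b","c","e","e","a","c")

# Shared levels across both inputs, so that "d" is a level of f2 even though
# it never occurs in p2
levs <- sort(unique(c(p1, p2)))
f2 <- factor(p2, levels = levs)

# One dummy column per level; the unused level "d" becomes an all-zero column
p2mat <- model.matrix(~ f2 + 0)

# With a single factor and no intercept the columns follow the level order,
# so the level names can be used directly as column names
colnames(p2mat) <- levels(f2)
```

Assigning levels(f2) avoids pattern matching on the variable name, which could also strip matching text from the level names themselves.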

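If you really want to write a function, you could look at something like the following:

myfun <- function(..., overwrite = FALSE) {
  # Name each input after the expression it was passed as (e.g. "p1mat")
  l <- setNames(list(...), sapply(substitute(list(...))[-1], deparse))
  # Sorted union of the column names across all inputs
  cols <- sort(unique(unlist(lapply(l, colnames), use.names = FALSE)))
  out <- lapply(l, function(x) {
    # Existing columns first, then the ones missing from x
    cols_x <- c(colnames(x), setdiff(cols, colnames(x)))
    # Index by cols_x so the data lines up with the names assigned below;
    # columns missing from x come back as NA and are zeroed afterwards
    temp <- `colnames<-`(x[, match(cols_x, colnames(x))], cols_x)[, cols]
    replace(temp, is.na(temp), 0)
  })
  if (isTRUE(overwrite)) list2env(out, envir = .GlobalEnv)
  out
}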

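This takes any number of matrices as input, compares the columns across all of them, and adds missing columns where necessary. The output is stored as a list, which is a convenient structure if you want to go on applying similar operations to all of the matrices. If you want to overwrite the original objects, set the overwrite argument to TRUE.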
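The core step, adding the missing columns as zeros, can also be sketched for a single matrix. The helper below, pad_columns, is a hypothetical name introduced only for illustration and is not part of the answer. It assumes every column of the input already appears in the target set of column names.

```r
# Hypothetical helper (not from the answer): pad `x` with all-zero columns so
# that its columns are exactly `cols`, in that order.
pad_columns <- function(x, cols) {
  # Assumption: no column of x falls outside the target set
  stopifnot(all(colnames(x) %in% cols))
  out <- matrix(0, nrow = nrow(x), ncol = length(cols),
                dimnames = list(rownames(x), cols))
  # Copy the existing columns into place by name; the rest stay zero
  out[, colnames(x)] <- x
  out
}

p1 <- c("a","b","c","e","d","a","c")
p2 <- c("a","b","c","e","e","a","c")
p2mat <- model.matrix(~p2 + 0)
colnames(p2mat) <- sub("^p2", "", colnames(p2mat))

padded <- pad_columns(p2mat, sort(unique(c(p1, p2))))
```

Starting from a zero matrix and filling in by name means no NA values are created along the way, at the cost of handling one matrix per call.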

Here is some sample data to work with.

set.seed(1)
p1 <- c("a","b","c","e","d","a","c"); p2 <-c("a","b","x","e","e","a","x")
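# `cols` is not defined at this point in the original; it is assumed here to be
# the sorted set of values seen in p1 and p2
cols <- sort(unique(c(p1, p2)))
p3 <- sample(c(cols, "z"), 7, TRUE)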

p1mat <- model.matrix(~p1 + 0)
p2mat <- model.matrix(~p2 + 0)
p3mat <- model.matrix(~p3 + 0)
colnames(p1mat) <- gsub("p1","",colnames(p1mat))
colnames(p2mat) <- gsub("p2","",colnames(p2mat))
colnames(p3mat) <- gsub("p3","",colnames(p3mat))

Try out the function:

myfun(p1mat, p2mat)
myfun(p2mat, p1mat)
myfun(p3mat, p1mat)
myfun(p3mat, p1mat, p2mat)

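Answer 1: (score: 0)

This function takes 2 matrices and compares their column counts. If they differ, it inserts a new column of zeros into the matrix with fewer columns (passed as the second argument), at the exact position of the missing column. The result is a new matrix with the same dimensions as the other one.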

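match_matrices <- function(matrix1, matrix2) {
    # Return matrix2 unchanged when the column counts already agree
    # (the original left `res` undefined in that case)
    res <- matrix2
    if (ncol(matrix1) != ncol(matrix2)) {
        # Split matrix2 into a list of numeric column vectors
        get_cols <- function(x) lapply(seq_len(ncol(x)), function(i) as.numeric(x[, i]))
        k <- get_cols(matrix2)
        # Column present in matrix1 but absent from matrix2 (assumes exactly one)
        odd_one_out <- setdiff(colnames(matrix1), colnames(matrix2))
        insert_at <- which(colnames(matrix1) == odd_one_out)
        # Insert a zero column at the missing position, then rebuild the matrix
        res <- t(do.call('rbind', append(k, list(rep(0, nrow(matrix2))), insert_at - 1)))
        colnames(res) <- colnames(matrix1)
    }
    return(res)
}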

Using your matrices:

match_matrices(p1mat, p2mat)
