Question

I want to create a matrix of indicator variables. My first thought was to use model.matrix, which is also suggested here: Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level

However, model.matrix does not seem to work when the factor has only one level.
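A minimal sketch of the failure, for reference. The data frame name one.region is made up for illustration. On the R versions I am aware of, the call stops with an error about contrasts requiring factors with two or more levels, so the sketch wraps it in tryCatch() rather than assuming the exact message:

```r
# Hypothetical single-region data: every observation has the same level.
one.region <- data.frame(region = factor(rep(1, 13)))

# tryCatch() captures the error object instead of stopping the script.
result <- tryCatch(
  model.matrix(~ -1 + region, data = one.region),
  error = function(e) e
)

if (inherits(result, "error")) {
  message("model.matrix failed: ", conditionMessage(result))
} else {
  print(dim(result))
}
```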

Here is an example data set for a factor 'region' with three levels. This is the indicator matrix I want to end up with:

dat = read.table(text = "
    reg1    reg2    reg3   
      1       0       0
      1       0       0
      1       0       0
      1       0       0
      1       0       0
      1       0       0
      0       1       0
      0       1       0
      0       1       0
      0       0       1
      0       0       1
      0       0       1
      0       0       1
", sep = "", header = TRUE)

# model.matrix works if there are multiple regions:

region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)

df.region <- as.data.frame(region)

df.region$region <- as.factor(df.region$region)

my.matrix <- as.data.frame(model.matrix(~ -1 + df.region$region, df.region))
my.matrix


# The following for-loop works even if there is only one level to the factor
# (one region):

# region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)

my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))

for(i in 1:length(region)) {my.matrix[i,region[i]]=1}
my.matrix

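The for-loop works and looks simple enough. However, I have been trying to come up with a solution that does not involve a loop. I can use the loop above, but I have been making an effort to wean myself off loops. Is there a better way?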

Answer 1

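I would use matrix indexing. From ?"[":

A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector.

Making use of that nice feature: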

my.matrix <- matrix(0, nrow=length(region), ncol=length(unique(region)))
my.matrix[cbind(seq_along(region), region)] <- 1

#       [,1] [,2] [,3]
#  [1,]    1    0    0
#  [2,]    1    0    0
#  [3,]    1    0    0
#  [4,]    1    0    0
#  [5,]    1    0    0
#  [6,]    1    0    0
#  [7,]    0    1    0
#  [8,]    0    1    0
#  [9,]    0    1    0
# [10,]    0    0    1
# [11,]    0    0    1
# [12,]    0    0    1
# [13,]    0    0    1
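Note that the snippet above relies on the region codes being the consecutive integers 1, 2, ..., k, because they are used directly as column numbers. If the codes could be something like 10 and 20, or character labels, one possible variation is to go through a factor first. This is only a sketch: make_indicators is a hypothetical helper name, not an existing function, and it assumes the input has no missing values.

```r
# Hypothetical helper: build a 0/1 indicator matrix by matrix indexing.
# Assumes x contains no NA values.
make_indicators <- function(x) {
  f <- factor(x)  # maps arbitrary codes to levels 1..k
  m <- matrix(0L, nrow = length(f), ncol = nlevels(f),
              dimnames = list(NULL, levels(f)))
  # Each row of the index matrix is (row number, level number).
  m[cbind(seq_along(f), as.integer(f))] <- 1L
  m
}

# Three regions, then a single region, then non-numeric labels.
make_indicators(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
make_indicators(rep(1, 13))
make_indicators(c("north", "south", "north"))
```

Because the column count comes from nlevels(f), the single-region case simply gives a one-column matrix of ones.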

Answer 2

I came up with this solution by modifying an answer to a similar question:

Reshaping a column from a data frame into several columns using R

region <- c(1,1,1,1,1,1,2,2,2,3,3,3,3)
site <- seq_along(region)
df <- cbind(site, region)
ind <- xtabs( ~ site + region, df)
ind

region <- c(1,1,1,1,1,1,1,1,1,1,1,1,1)
site <- seq_along(region)
df <- cbind(site, region)
ind <- xtabs( ~ site + region, df)
ind

Edit:

The following line extracts a data frame of indicator variables from ind:

ind.matrix <- as.data.frame.matrix(ind)
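As a quick sanity check, the xtabs result can be compared against the for-loop from the question. This is just a sketch for the three-region example; loop.matrix and same are names introduced here for illustration, and the data are passed as a data frame rather than via cbind.

```r
region <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
site <- seq_along(region)

# Cross-tabulate site by region, then convert the table to a data frame.
ind <- xtabs(~ site + region, data = data.frame(site, region))
ind.matrix <- as.data.frame.matrix(ind)

# Reference result built with the loop from the question.
loop.matrix <- matrix(0, nrow = length(region), ncol = length(unique(region)))
for (i in seq_along(region)) loop.matrix[i, region[i]] <- 1

# Element-wise comparison of the two results.
same <- all(as.matrix(ind.matrix) == loop.matrix)
print(same)
```

The columns of ind.matrix are named after the region levels, which the plain loop version does not provide.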
