Question

I have a data frame that I want to use to generate a design matrix.

> ct <- read.delim(filename, skip = 0, as.is = TRUE, sep = "\t", row.names = 1)
> ct
      s2 s6 S10 S14 S3 S7 S11 S15 S4 S8 S12 S16
group  1  1   1   1  2  2   2   2  3  3   3   3
donor  1  2   3   4  1  2   3   4  1  2   3   4
> factotum <- apply(ct, 1, as.factor) # to turn rows into factors.
> design <- model.matrix(~ 0 + factotum[,1] + factotum[,2])

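Eventually I will build the formula as a string and use as.formula() instead of hard-coding it. Anyway, this works and produces a design matrix. It leaves out one column, though.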

> design
   factotum[, 1]1 factotum[, 1]2 factotum[, 1]3 factotum[, 2]2 factotum[, 2]3 factotum[, 2]4
1               1              0              0              0              0              0
2               1              0              0              1              0              0
3               1              0              0              0              1              0
4               1              0              0              0              0              1
5               0              1              0              0              0              0
6               0              1              0              1              0              0
7               0              1              0              0              1              0
8               0              1              0              0              0              1
9               0              0              1              0              0              0
10              0              0              1              1              0              0
11              0              0              1              0              1              0
12              0              0              1              0              0              1

By my reasoning, the column names should be: factotum[, 1]1, factotum[, 1]2, factotum[, 1]3, factotum[, 2]1, factotum[, 2]2, factotum[, 2]3, factotum[, 2]4. These would then be renamed to group1, group2, group3, donor1, donor2, donor3, donor4.

That means factotum[, 2]1, or donor1, is missing. What am I doing that causes it to be dropped? Any help would be much appreciated.

Cheers, Ben.

Answer 1

There are a few things going on here.

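(1) apply(ct, 1, as.factor) does not actually turn the rows into factors. Try str(factotum) and you will see that it failed: the result is a character matrix, not a set of factors. I'm not sure what the fastest way is, but this should work: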

factotum <- data.frame(lapply(data.frame(t(ct)), as.factor))
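
To see the difference, here is a minimal sketch that rebuilds the data frame by hand (an assumption, since the original file read by read.delim is not available) and compares the two approaches. The names factotum_bad and factotum_ok are only illustrative.

```r
# Rebuild the example data by hand; the original came from a file.
m <- rbind(group = rep(1:3, each = 4), donor = rep(1:4, times = 3))
colnames(m) <- c("s2", "s6", "S10", "S14", "S3", "S7",
                 "S11", "S15", "S4", "S8", "S12", "S16")
ct <- as.data.frame(m)

# apply() collects its results into a matrix, and a matrix cannot hold
# factors, so the factor class is lost along the way.
factotum_bad <- apply(ct, 1, as.factor)
str(factotum_bad)

# Transposing first and converting column by column keeps real factors.
factotum_ok <- data.frame(lapply(data.frame(t(ct)), as.factor))
str(factotum_ok)
```

The original call still produced a design matrix because model.matrix converts character variables to factors on its own, so the missing column is not caused by this step.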

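(2) Because you are using factors, model.matrix creates dummy coding. In this case donor has four levels. If you are a 2, you get a 1 in the factotum[,2]2 column. If you are a 3 or a 4, you get a 1 in the respective column. So what if you are a 1? Well, that just means you are 0 in all three columns. This way, three columns are enough to encode four groups. The donor value 1 is called the reference group here, i.e. the group that the others are compared against.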
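
The dummy coding described in (2) can be checked on donor alone. This is a small sketch using a hand-built factor with the same values as in the question; the variable names are illustrative.

```r
# donor as a four-level factor, same values as in the question
donor <- factor(rep(1:4, times = 3))

# Default treatment coding: an intercept plus one column per non-reference level
X <- model.matrix(~ donor)
colnames(X)

# Rows belonging to the reference level (donor 1) are 0 in every dummy column
X[donor == "1", c("donor2", "donor3", "donor4")]
```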

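(3) Now the question is... why doesn't group (or factotum[,1]) have only two columns as well? We could just as easily encode three levels with two columns, right? Well... yes, and that is exactly what happens when you use: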

design <- model.matrix(~ factotum[,1] + factotum[,2])

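However, because you specified no intercept, you get an extra column for group. The three group columns together take over the role of the intercept, which is why donor still loses its first level.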
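
As a rough check of (3), the sketch below builds both versions of the design matrix from hand-made factors and then appends a donor1 column manually. The names with_int, no_int and with_donor1 are illustrative, not from the original post.

```r
group <- factor(rep(1:3, each = 4))
donor <- factor(rep(1:4, times = 3))

# With an intercept, both factors drop their first level
with_int <- model.matrix(~ group + donor)
colnames(with_int)

# Without an intercept, the first factor keeps all its levels instead
no_int <- model.matrix(~ 0 + group + donor)
colnames(no_int)

# Adding donor1 by hand gives 7 columns but no extra information:
# the group columns sum to 1 in every row, and so do the donor columns.
with_donor1 <- cbind(no_int, donor1 = as.numeric(donor == "1"))
ncol(with_donor1)
qr(no_int)$rank
qr(with_donor1)$rank
```

Both formulas yield six columns, and the rank does not increase when donor1 is appended, so the column is redundant rather than lost.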

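(4) Usually you do not have to create the design matrix yourself. I'm not sure which function you want to use next, but in most cases the modeling function will take care of it for you.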
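
As an illustration of (4), here is a sketch that assumes the next step is an ordinary lm() fit. The response y is random placeholder data invented for this example, and the data frame name dat is hypothetical. The formula is built from a string with as.formula(), as the question intends.

```r
set.seed(1)
dat <- data.frame(
  group = factor(rep(1:3, each = 4)),
  donor = factor(rep(1:4, times = 3)),
  y     = rnorm(12)  # placeholder response, not real data
)

# Build the formula from a string instead of hard-coding it
f <- as.formula("y ~ 0 + group + donor")

# lm() constructs the design matrix internally from the formula and data
fit <- lm(f, data = dat)

# The matrix it used can still be inspected afterwards
colnames(model.matrix(fit))
```

Because the factors live in a data frame with proper column names, the design matrix columns come out as group1, group2, and so on, with no renaming step.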

model.matrix drops a column
