How to get 0-1 dummies for ordered categorical variables in lm in R?

Time: 2017-10-16 09:10:30

Tags: r lm categorical-data

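When you fit a linear model in R with a categorical independent variable (predictor), that variable is recoded internally as dummy variables: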

unord <- data.frame(y = c(1, 2, 3, 12, 11, 13, 101, 103, 102, 1003, 1002, 1001),
             cat = factor(rep(LETTERS[1:4], each = 3), ordered = FALSE))
model.matrix(lm(y~cat, data = unord))

   (Intercept) catB catC catD
1            1    0    0    0
2            1    0    0    0
3            1    0    0    0
4            1    1    0    0
5            1    1    0    0
6            1    1    0    0
7            1    0    1    0
8            1    0    1    0
9            1    0    1    0
10           1    0    0    1
11           1    0    0    1
12           1    0    0    1
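A minimal sketch of why these 0/1 columns are convenient, rebuilding the unord data frame from above: with the default treatment coding, level A is the baseline, so the intercept is the mean of A and each of catB, catC and catD is the difference between that level's mean and A's mean. The names fit_unord and group_means are only illustrative.

```r
# Rebuild the unordered example data from the question
unord <- data.frame(y = c(1, 2, 3, 12, 11, 13, 101, 103, 102, 1003, 1002, 1001),
                    cat = factor(rep(LETTERS[1:4], each = 3), ordered = FALSE))

fit_unord <- lm(y ~ cat, data = unord)

# Group means: A = 2, B = 12, C = 102, D = 1002
group_means <- tapply(unord$y, unord$cat, mean)

# Intercept = mean of the baseline level A;
# catB, catC, catD = each level's mean minus the mean of A
coef(fit_unord)                      # 2, 10, 100, 1000 (up to rounding)
group_means[-1] - group_means[1]     # B = 10, C = 100, D = 1000
```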

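This is what I want. However, if the categorical independent variable is ordered, the dummy variables are, for some reason, much less intuitive: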

ord <- data.frame(y = c(1, 2, 3, 12, 11, 13, 101, 103, 102, 1003, 1002, 1001),
                cat = factor(rep(LETTERS[1:4], each = 3), ordered = TRUE))
model.matrix(lm(y~cat, data = ord))

   (Intercept)      cat.L cat.Q      cat.C
1            1 -0.6708204   0.5 -0.2236068
2            1 -0.6708204   0.5 -0.2236068
3            1 -0.6708204   0.5 -0.2236068
4            1 -0.2236068  -0.5  0.6708204
5            1 -0.2236068  -0.5  0.6708204
6            1 -0.2236068  -0.5  0.6708204
7            1  0.2236068  -0.5 -0.6708204
8            1  0.2236068  -0.5 -0.6708204
9            1  0.2236068  -0.5 -0.6708204
10           1  0.6708204   0.5  0.2236068
11           1  0.6708204   0.5  0.2236068
12           1  0.6708204   0.5  0.2236068
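The "some reason" is R's default contrasts option: unordered factors are coded with contr.treatment and ordered factors with contr.poly, i.e. orthogonal polynomial contrasts (linear .L, quadratic .Q, cubic .C). The sketch below checks this against the output above; it assumes a fresh R session in which options("contrasts") has not been changed. The variable names are only illustrative.

```r
ord <- data.frame(y = c(1, 2, 3, 12, 11, 13, 101, 103, 102, 1003, 1002, 1001),
                  cat = factor(rep(LETTERS[1:4], each = 3), ordered = TRUE))

# Default contrast functions: unordered -> contr.treatment, ordered -> contr.poly
getOption("contrasts")

# Orthogonal polynomial contrasts for 4 levels; the columns .L, .Q, .C
# are exactly the cat.L, cat.Q, cat.C columns of the model matrix above
poly4 <- contr.poly(4)
poly4

# One row per level (rows 1, 4, 7, 10), dropping the intercept column
X_ord <- model.matrix(lm(y ~ cat, data = ord))
all.equal(unname(X_ord[c(1, 4, 7, 10), -1]), unname(poly4))

# The coding changes what the coefficients mean, not the fit:
# both codings reproduce the four group means
ord_as_unord <- transform(ord, cat = factor(cat, ordered = FALSE))
all.equal(fitted(lm(y ~ cat, data = ord)),
          fitted(lm(y ~ cat, data = ord_as_unord)))
```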

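The question is: how do I get the "usual" 0/1 dummy variables for an ordered categorical variable? Note: the question is not about how to make proper use of the ordering information (https://stats.stackexchange.com/questions/33413/continuous-dependent-variable-with-ordinal-independent-variable).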

1 Answer

Answer (score: 2)

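You can force which contrasts are used with the contrasts argument of lm, or the contrasts.arg argument of model.matrix (I call model.matrix directly here, which removes the extra lm call):

model.matrix(y ~ cat, data = ord, contrasts.arg = list(cat = contr.treatment))

If you have multiple factor columns, you can do

nms <- names(ord)[sapply(ord, is.factor)] # get names of factor variables
model.matrix(y ~ cat, data = ord,
             contrasts.arg = sapply(nms, function(x) list(contr.treatment)))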
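For comparison, here is a sketch of two other standard ways to get the same 0/1 columns, next to the contrasts.arg approach. It assumes ord is the data frame from the question: (a) drop the ordering only for the model, or (b) temporarily change the session-wide contrasts option, which affects every model fitted while it is set, so it is restored immediately. X_a, X_b and X_c are illustrative names.

```r
ord <- data.frame(y = c(1, 2, 3, 12, 11, 13, 101, 103, 102, 1003, 1002, 1001),
                  cat = factor(rep(LETTERS[1:4], each = 3), ordered = TRUE))

# (a) Treat cat as unordered just for this model matrix
X_a <- model.matrix(y ~ cat,
                    data = transform(ord, cat = factor(cat, ordered = FALSE)))

# (b) Use treatment contrasts for ordered factors too, then restore the option
old <- options(contrasts = c("contr.treatment", "contr.treatment"))
X_b <- model.matrix(y ~ cat, data = ord)
options(old)

# (c) The per-variable contrasts.arg approach from the answer
X_c <- model.matrix(y ~ cat, data = ord,
                    contrasts.arg = list(cat = contr.treatment))

colnames(X_a)                       # "(Intercept)" "catB" "catC" "catD"
all(X_a == X_b) && all(X_a == X_c)  # TRUE: same 0/1 dummy columns
```

Option (c) is the most local choice, since it leaves both the data and the global options untouched.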
