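I am looking for a way to generate dummy variables that split a given set of categories into every possible combination of groupings. For example, with three categories (say A, B and C), there are five possible groupings: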
Three groups: A / B / C
Two groups: A&B / C
Two groups: A&C / B
Two groups: A / B&C
One group: A&B&C
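The dummy variables for each grouping would then be written to separate columns of a data frame. So the final output I want looks like the following table: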
sample_num  category  grouping1  grouping2  grouping3  grouping4  grouping5
                      A; B; C    A&B; C     A&C; B     A; B&C     A&B&C
-----------+---------+----------+----------+----------+----------+----------
1           A         1          1          1          1          1
2           A         1          1          1          1          1
3           A         1          1          1          1          1
4           A         1          1          1          1          1
5           B         2          1          2          2          1
6           B         2          1          2          2          1
7           B         2          1          2          2          1
8           C         3          2          1          2          1
9           C         3          2          1          2          1
10          C         3          2          1          2          1
11          C         3          2          1          2          1
12          C         3          2          1          2          1
Answer 0 (score: 2)
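The model.matrix function in the stats package (loaded by default) will construct "dummy variables", although not of the sort you describe. Its first argument is an R "formula":
> dat <- read.table(text="sample_num category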
+ 1 A
+ 2 A
+ 3 A
+ 4 A
+ 5 B
+ 6 B
+ 7 B
+ 8 C
+ 9 C
+ 10 C
+ 11 C
+ 12 C", header=TRUE)
> model.matrix( ~category, data=dat)
(Intercept) categoryB categoryC
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 1 1 0
6 1 1 0
7 1 1 0
8 1 0 1
9 1 0 1
10 1 0 1
11 1 0 1
12 1 0 1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$category
[1] "contr.treatment"
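I (strongly) suspect that your four-column dummies must be linearly dependent, and that one of them would be rejected by regression functions. Other contrast arguments are possible. You should study: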
?model.matrix
?contrasts
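The linear-dependence concern can be checked directly. The following is only a sketch under assumptions: it rebuilds the question's data, codes two of the requested groupings as factors (the column names `grouping1` and `grouping2` and the "A&B" label are illustrative choices), and compares the number of design-matrix columns with the numerical rank.

```r
# Rebuild the example data from the question (12 samples, 3 categories).
dat <- data.frame(
  sample_num = 1:12,
  category   = factor(rep(c("A", "B", "C"), times = c(4, 3, 5)))
)

# Two of the question's groupings, coded as factors (illustrative names):
# grouping1 keeps A, B and C apart; grouping2 merges A and B.
dat$grouping1 <- dat$category
dat$grouping2 <- factor(ifelse(dat$category == "C", "C", "A&B"))

# Design matrix with both groupings as predictors.
X <- model.matrix(~ grouping1 + grouping2, data = dat)

ncol(X)      # number of columns in the design matrix
qr(X)$rank   # numerical rank; a value below ncol(X) means linear dependence
```

Here the indicator for "C" under grouping2 duplicates the indicator for "C" under grouping1, so the rank falls short of the column count, and a function such as lm() would report a coefficient that cannot be estimated.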
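This requests sum contrasts without an intercept (with the intercept removed, model.matrix returns one indicator column per level, as the output shows):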
> model.matrix(~category+0, data=dat, contrasts = list(category = "contr.sum"))
categoryA categoryB categoryC
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
5 0 1 0
6 0 1 0
7 0 1 0
8 0 0 1
9 0 0 1
10 0 0 1
11 0 0 1
12 0 0 1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$category
[1] "contr.sum"
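To see what the sum contrasts themselves look like, the intercept can be kept. This is a minimal sketch that assumes the same 12-row data as above; `X_sum` is just an illustrative name, and `contrasts.arg` is the full name of the argument that `contrasts =` partially matches in the call above.

```r
# Same category column as in the question's data.
dat <- data.frame(category = factor(rep(c("A", "B", "C"), times = c(4, 3, 5))))

# Keep the intercept so that contr.sum is actually applied to 'category'.
X_sum <- model.matrix(~ category, data = dat,
                      contrasts.arg = list(category = "contr.sum"))

# One distinct row per level:
# A -> (1, 1, 0), B -> (1, 0, 1), C -> (1, -1, -1)
unique(X_sum)
```

With sum-to-zero coding the last level is not dropped to a row of zeros; it is represented by -1 in every contrast column.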
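If you want to see the automatic calculation of interactions at different levels, you would need three variables rather than one variable with three levels: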
> dat <- expand.grid(A=letters[1:3], B=letters[4:6], C=letters[7:9])
> str(model.matrix( ~ A*B*C))
Error in str(model.matrix(~A * B * C)) :
error in evaluating the argument 'object' in selecting a method for function 'str': Error in model.frame.default(object, data, xlev = xlev) :
invalid type (closure) for variable 'C'
> str(model.matrix( ~ A*B*C, data=dat))
num [1:27, 1:27] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:27] "1" "2" "3" "4" ...
..$ : chr [1:27] "(Intercept)" "Ab" "Ac" "Be" ...
- attr(*, "assign")= int [1:27] 0 1 1 2 2 3 3 4 4 4 ...
- attr(*, "contrasts")=List of 3
..$ A: chr "contr.treatment"
..$ B: chr "contr.treatment"
..$ C: chr "contr.treatment"
> model.matrix( ~ A*B*C, data=dat)
# output omitted
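None of the model.matrix calls above produce the grouping columns the question asks for. Those can be built by enumerating the set partitions of the category levels. The sketch below is one possible base R approach, not a built-in facility: `all_partitions` is a hypothetical helper written for this illustration, and the `groupingN` columns come out in a different order from the question's table.

```r
# Enumerate all set partitions of n items as integer label vectors
# (restricted growth strings: the first item gets label 1, and each later
# item gets a label at most one larger than the largest label used so far).
all_partitions <- function(n) {
  parts <- list(1L)
  if (n > 1) {
    for (i in 2:n) {
      nxt <- list()
      for (p in parts) {
        for (lab in seq_len(max(p) + 1L)) {
          nxt[[length(nxt) + 1L]] <- c(p, lab)
        }
      }
      parts <- nxt
    }
  }
  parts
}

# Example data from the question.
dat <- data.frame(
  sample_num = 1:12,
  category   = rep(c("A", "B", "C"), times = c(4, 3, 5))
)

lev   <- sort(unique(dat$category))     # "A" "B" "C"
parts <- all_partitions(length(lev))    # one label vector per partition

# Map each sample's category to its group label under every partition.
for (k in seq_along(parts)) {
  dat[[paste0("grouping", k)]] <- parts[[k]][match(dat$category, lev)]
}

head(dat)
```

The number of partitions grows quickly with the number of levels (5 for three levels, 15 for four), so this is only practical for a handful of categories.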