Converting a column of factor lists into a model matrix in R

Asked: 2018-01-09 18:00:16

Tags: r matrix model.matrix

I have a data.frame that I want to turn into a model.matrix. This is all in R.

Here is my example:

   col1
1 "factor1","factor2"
2 "factor1"
3 "factor3"
4 "factor1","factor2"

I would like to create the following output:

   factor1 factor2 factor3
1    1       1       0
2    1       0       0
3    0       0       1
4    1       1       0

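I would appreciate any suggestions! I have been trying sparse.model.matrix to no avail: it creates a separate column for each distinct list (treating the whole list as one unique factor level) instead of recognizing that the lists share the same underlying factors.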
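One possible way to get the output above is to split each cell into its individual labels and then mark which labels appear in each row. The following is only a minimal base-R sketch: it assumes each cell of col1 is a single comma-separated character string (the quoting in the example leaves this open), and the names df, labels_per_row, all_labels and indicator are made up for illustration.

```r
# Toy data mirroring the example above. ASSUMPTION: each cell is one
# comma-separated character string rather than an actual list column.
df <- data.frame(
  col1 = c("factor1,factor2", "factor1", "factor3", "factor1,factor2"),
  stringsAsFactors = FALSE
)

# Split every cell into its individual labels (one character vector per row).
labels_per_row <- strsplit(df$col1, ",", fixed = TRUE)

# All distinct labels, sorted so that the column order is predictable.
all_labels <- sort(unique(unlist(labels_per_row)))

# For each row, flag which of the known labels it contains, then stack the
# per-row 0/1 vectors into a matrix with one row per observation.
indicator <- do.call(rbind, lapply(
  labels_per_row,
  function(labels) as.integer(all_labels %in% labels)
))
dimnames(indicator) <- list(rownames(df), all_labels)

indicator
#   factor1 factor2 factor3
# 1       1       1       0
# 2       1       0       0
# 3       0       0       1
# 4       1       1       0
```

If col1 is instead a true list column, the strsplit() step can be skipped and the list used directly as labels_per_row.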

Here is the dput() of the start of the data (the full data set is much larger):

dd = structure(list(id = c("rs62224609", "", "", "", "rs62224609", "", "", "", "",
"", "", "", "", "", "", "", "rs587626763", "", "", "", "rs62224609", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "rs62224609,rs587626763", "", "", "", "", "", "", 
"", "", "", "", "rs587626763", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "rs587626763", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", ""), library = structure(c(4L, 
4L, 4L, 5L, 5L, 5L, 4L, 4L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 3L, 5L, 2L, 4L, 4L, 2L, 2L, 3L, 1L, 4L, 2L, 2L, 5L, 2L, 2L, 
2L, 4L, 3L, 3L, 4L, 5L, 3L, 4L, 3L, 4L, 5L, 4L, 5L, 2L, 5L, 5L, 
2L, 2L, 4L, 3L, 5L, 3L, 5L, 5L, 4L, 1L, 5L, 2L, 3L, 5L, 5L, 1L, 
4L, 1L, 2L, 4L, 5L, 1L, 3L, 4L, 4L, 2L, 1L, 4L, 2L, 5L, 5L, 1L, 
5L, 2L, 3L, 3L, 1L, 1L, 3L, 5L, 4L, 5L, 5L, 5L, 4L, 2L, 1L, 3L, 
3L, 2L, 1L), .Label = c("42", "43", "44", "45_1", "45_2"), class = "factor")), .Names = c("id", 
"library"), row.names = c(NA, 100L), class = "data.frame")

head(dd, 8)
#           id library
# 1 rs62224609    45_1
# 2               45_1
# 3               45_1
# 4               45_2
# 5 rs62224609    45_2
# 6               45_2
# 7               45_1
# 8               45_1

Desired output:

    rs62224609 
1            1
2            0
3            0
4            0
5            1
6            0
7            0
8            0
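Since the real data is much larger and most rows carry no id, a sparse result may be preferable. The sketch below builds the indicator matrix directly with sparseMatrix() from the Matrix package instead of going through sparse.model.matrix. It is an illustration under assumptions: the short id vector is a made-up stand-in for dd$id, empty strings are taken to mean "no id", and the names pieces, row_idx, labels, levels_seen and m are hypothetical.

```r
library(Matrix)  # recommended package, normally installed alongside R

# ASSUMPTION: a short stand-in for dd$id; "" marks rows without any id and
# multiple ids in one row are separated by commas.
id <- c("rs62224609", "", "", "", "rs62224609", "", "rs587626763",
        "rs62224609,rs587626763")

# Split on commas; an empty string yields character(0), i.e. no labels.
pieces <- strsplit(id, ",", fixed = TRUE)

# Long format: one (row index, label) pair per id occurrence.
row_idx <- rep(seq_along(pieces), lengths(pieces))
labels <- unlist(pieces)
levels_seen <- unique(labels)

# One column per distinct id; rows without an id stay all-zero.
m <- sparseMatrix(
  i = row_idx,
  j = match(labels, levels_seen),
  x = 1,
  dims = c(length(id), length(levels_seen)),
  dimnames = list(NULL, levels_seen)
)

m
```

Note that sparseMatrix() sums repeated (i, j) pairs, so a row that lists the same id twice would get a 2 rather than a 1. Deduplicating each element of pieces with unique() first would avoid that.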

0 answers:

No answers yet