I have a data.frame that I want to turn into a model.matrix. This is all in R.
Here is my example:
  col1
1 "factor1","factor2"
2 "factor1"
3 "factor3"
4 "factor1","factor2"
I want to create the following output:
  factor1 factor2 factor3
1       1       1       0
2       1       0       0
3       0       0       1
4       1       1       0
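I'd appreciate any suggestions! I've been trying sparse.model.matrix to no avail: it creates a separate factor column for each distinct list instead of recognizing that the lists are made up of shared factors.
Here is the dput() of the beginning of the data (the full data set is much larger):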
dd = structure(list(id = c("rs62224609", "", "", "", "rs62224609", "", "", "", "",
"", "", "", "", "", "", "", "rs587626763", "", "", "", "rs62224609", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "rs62224609,rs587626763", "", "", "", "", "", "",
"", "", "", "", "rs587626763", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "rs587626763", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", ""), library = structure(c(4L,
4L, 4L, 5L, 5L, 5L, 4L, 4L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 3L, 5L, 2L, 4L, 4L, 2L, 2L, 3L, 1L, 4L, 2L, 2L, 5L, 2L, 2L,
2L, 4L, 3L, 3L, 4L, 5L, 3L, 4L, 3L, 4L, 5L, 4L, 5L, 2L, 5L, 5L,
2L, 2L, 4L, 3L, 5L, 3L, 5L, 5L, 4L, 1L, 5L, 2L, 3L, 5L, 5L, 1L,
4L, 1L, 2L, 4L, 5L, 1L, 3L, 4L, 4L, 2L, 1L, 4L, 2L, 5L, 5L, 1L,
5L, 2L, 3L, 3L, 1L, 1L, 3L, 5L, 4L, 5L, 5L, 5L, 4L, 2L, 1L, 3L,
3L, 2L, 1L), .Label = c("42", "43", "44", "45_1", "45_2"), class = "factor")), .Names = c("id",
"library"), row.names = c(NA, 100L), class = "data.frame")
head(dd, 8)
#           id library
# 1 rs62224609    45_1
# 2               45_1
# 3               45_1
# 4               45_2
# 5 rs62224609    45_2
# 6               45_2
# 7               45_1
# 8               45_1
Desired output:
  rs62224609
1          1
2          0
3          0
4          0
5          1
6          0
7          0
8          0
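
For reference, one possible way to get this kind of indicator matrix is sketched below in base R, using the toy example from the top of the question. It is only a sketch under assumptions: each cell is taken to be a single comma-separated string (as in the `id` column of `dd`), and the names `df`, `labels`, `lev` and `mm` are illustrative, not part of the original data.

```r
# Toy data mirroring the first example; assumes each cell holds
# comma-separated labels in one character string.
df <- data.frame(
  col1 = c("factor1,factor2", "factor1", "factor3", "factor1,factor2"),
  stringsAsFactors = FALSE
)

# Split every cell on literal commas into a list of label vectors.
labels <- strsplit(df$col1, ",", fixed = TRUE)

# Distinct non-empty labels that occur anywhere in the column.
lev <- sort(unique(unlist(labels)))
lev <- lev[nzchar(lev)]

# One row per observation, one 0/1 column per label.
mm <- do.call(rbind, lapply(labels, function(x) as.integer(lev %in% x)))
dimnames(mm) <- list(rownames(df), lev)
mm
```

The same steps should carry over to `dd` by replacing `df$col1` with `dd$id`; rows with an empty `id` would then contain only zeros.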