Question

I have a variable named sid_set in a data.table called toytable:

library(data.table)

toytable <- data.table(id = c(1, 2, 3, 4),
                       sid_set = c("a, b, c",
                                   "c, b",
                                   "a",
                                   "d, b")
                       )
> toytable
   id sid_set
1:  1 a, b, c
2:  2    c, b
3:  3       a
4:  4    d, b

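So sid_set is a variable-length string, and each string is made up of a set of distinct values. About 1,500 distinct values can occur in sid_set.

I'm trying to get a dummy variable for each distinct possible value, like this: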

dummy variables:    a     b     c     d
row1:               1     1     1     0
row2:               0     1     1     0
row3:               1     0     0     0
row4:               0     1     0     1
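
To make the target concrete, here is a minimal sketch of one way that table could be produced on the toy data. It assumes the values are always separated by ", " and never repeat within a row; the names `long` and `dummies` are illustrative only, and the split-then-`dcast()` approach is an assumption, not something taken from this thread.

```r
library(data.table)

toytable <- data.table(id = c(1, 2, 3, 4),
                       sid_set = c("a, b, c", "c, b", "a", "d, b"))

# Split each string on ", " and stack the pieces: one row per (id, sid) pair.
long <- toytable[, .(sid = unlist(strsplit(sid_set, ", ", fixed = TRUE))), by = id]

# Cast back to wide: one column per distinct value, counting occurrences.
# Because values are assumed distinct within a row, the counts are 0 or 1.
dummies <- dcast(long, id ~ sid, fun.aggregate = length, value.var = "sid")

print(dummies)
```

With about 1,500 distinct values this yields about 1,500 columns, so a dense table like this may only be practical for a moderate number of rows.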

Given how my data is formatted and what I'd like to accomplish, can anyone share some leads on packages that might help?

I've tried or looked into sparse.model.matrix() from Matrix and createDataPartition from caret, and I've tried to think of a dplyr solution, without much progress.
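
Since roughly 1,500 distinct values would make a dense result very wide, a sparse indicator matrix may be a better fit. The sketch below builds one directly with `sparseMatrix()` from the Matrix package instead of going through `sparse.model.matrix()`. It assumes a ", " separator, and the names `pieces`, `levels_all`, and `sparse_dummies` are hypothetical.

```r
library(Matrix)

sid_set <- c("a, b, c", "c, b", "a", "d, b")

# One character vector of values per input row.
pieces <- strsplit(sid_set, ", ", fixed = TRUE)
levels_all <- sort(unique(unlist(pieces)))

# Row index: repeat each row number once per value found in that row.
i <- rep(seq_along(pieces), lengths(pieces))
# Column index: position of each value among the sorted distinct values.
j <- match(unlist(pieces), levels_all)

# Store a 1 at every (row, value) pair; all other cells stay implicit zeros.
sparse_dummies <- sparseMatrix(i = i, j = j, x = 1,
                               dims = c(length(pieces), length(levels_all)),
                               dimnames = list(NULL, levels_all))

print(sparse_dummies)
```

Only the nonzero cells are stored, so memory use grows with the number of (row, value) pairs rather than with rows times 1,500.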

I tried a combination of "Split column at delimiter in data frame" and "Convert a dataframe to presence absence matrix":

> foo <- data.frame(do.call('rbind', strsplit(as.character(toytable$sid_set),', ',fixed=TRUE)))
Warning message:
In rbind(c("a", "b", "c"), c("c", "b"), "a", c("d", "b")) :
  number of columns of result is not a multiple of vector length (arg 2)
> head(foo)
  X1 X2 X3
1  a  b  c
2  c  b  c
3  a  a  a
4  d  b  d
> df2 <- melt(foo, id.var = "X1")
Warning message:
attributes are not identical across measure variables; they will be dropped
> with(df2, table(V1, value))
Error in table(V1, value) : object 'V1' not found

EDIT:

Thanks to the commenters below, I came up with something that works pretty well. But is there a way to avoid the 1 in value > …?

R: a string of variable-length values

0 answers: