Convert a string column to dummy variables when the key is repeated

Posted: 2017-09-20 17:14:03

Tags: r sparse-matrix dummy-variable

I want to convert this:

> df.orig <- data.frame(id = c('foo', 'bar', 'foo'), action = c('abc','def','ghi'))
> df.orig
   id action
1 foo    abc
2 bar    def
3 foo    ghi

into this, with one row per id:

> df.new <- data.frame(id = c('foo', 'bar'), action_abc = c(1,0), action_def = c(0,1), action_ghi = c(1,0))
> df.new
   id action_abc action_def action_ghi
1 foo          1          0          1
2 bar          0          1          0

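Neither sparse.model.matrix nor dcast seems to handle the repeated key ('foo') well. sparse.model.matrix, for example, returns one row per original observation instead of one row per id: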

> sparse.model.matrix(id ~ action - 1, df.orig)
3 x 3 sparse Matrix of class "dgCMatrix"
  actionabc actiondef actionghi
1         1         .         .
2         .         1         .
3         .         .         1
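One way to collapse the per-observation rows by id while staying sparse is to build two indicator matrices (observations x id levels, observations x action levels) and take their cross-product, which sums the action indicators within each id. This is a minimal sketch, not something from the original thread; it assumes the Matrix package (which provides sparse.model.matrix and ships with R as a recommended package) is available, and the variable names id_mat, action_mat and collapsed are hypothetical.

```r
library(Matrix)  # provides sparse.model.matrix and sparse crossprod

# stringsAsFactors = TRUE so both columns are factors for the model formulas
df.orig <- data.frame(id = c('foo', 'bar', 'foo'),
                      action = c('abc', 'def', 'ghi'),
                      stringsAsFactors = TRUE)

# One row per observation, one 0/1 column per id level (bar, foo)
id_mat <- sparse.model.matrix(~ id - 1, df.orig)

# One row per observation, one 0/1 column per action level (abc, def, ghi)
action_mat <- sparse.model.matrix(~ action - 1, df.orig)

# crossprod(A, B) is t(A) %*% B: row i sums the action indicators of id level i
collapsed <- crossprod(id_mat, action_mat)

# Cap counts at 1 so a repeated (id, action) pair still yields a 0/1 dummy
collapsed <- 1 * (collapsed > 0)

collapsed
```

The result keeps the factor-level order of id (bar before foo), so reorder the rows afterwards if the original order of first appearance matters.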

1 answer:

Answer 0 (score: 2)

Use table():

  df <- data.frame(id = c('foo', 'bar', 'foo'), action = c('abc', 'def', 'ghi'), stringsAsFactors = FALSE)

  # cross-tabulate: one row per id, one column per action, cells are counts
  table(df$id, df$action)

      abc def ghi
  bar   0   1   0
  foo   1   0   1
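table() returns a contingency table of counts rather than the data.frame shape asked for (an id column plus action_* columns), and a cell would exceed 1 if the same id/action pair appeared twice. A minimal base-R sketch of turning the table into that shape, assuming 0/1 indicators are wanted even for repeated pairs; the names tab and wide are hypothetical:

```r
df <- data.frame(id = c('foo', 'bar', 'foo'),
                 action = c('abc', 'def', 'ghi'),
                 stringsAsFactors = FALSE)

tab <- table(df$id, df$action)

# Wide data.frame: row names are ids, columns are actions (abc, def, ghi)
wide <- as.data.frame.matrix(tab)

# Turn counts into 0/1 dummies in case an (id, action) pair repeats
wide[] <- lapply(wide, function(col) as.numeric(col > 0))

# Prefix the columns to match the requested action_* names
names(wide) <- paste0("action_", names(wide))

# Move the ids from row names into a regular column
df.new <- data.frame(id = rownames(wide), wide, row.names = NULL)

df.new
```

Rows come out in sorted id order (bar, foo) because table() sorts its levels; use df.new[match(unique(df$id), df.new$id), ] if the order of first appearance is needed.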