Convert a categorical variable into multiple binary variables

Time: 2017-10-06 05:58:59

Tags: r

I want to convert part of my data into a binary wide format.

Here is my input:

mydf <- data.frame(transaction = c(1,0,1,1,1,0,0), quality = c("NEW","OLD","OLD","OLD","OLD","NEW","NEW"), brand = c(1,2,3,1,2,2,1))

     transaction quality brand
1           1     NEW     1
2           0     OLD     2
3           1     OLD     3
4           1     OLD     1
5           1     OLD     2
6           0     NEW     2
7           0     NEW     1

I want to convert the brand column into wide format so that I get the following output:

     transaction quality brand_1 brand_2 brand_3
1           1     NEW     1        0      0
2           0     OLD     0        1      0
3           1     OLD     0        0      1
4           1     OLD     1        0      0
5           1     OLD     0        1      0
6           0     NEW     0        1      0
7           0     NEW     1        0      0

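I tried different approaches, such as the model.matrix function, but could not get the output I want.

2 answers:

Answer 0: (score: 2)

For every row, we select the corresponding column whose value needs to be changed to 1. We generate the row/column combinations using seq (to select rows) and paste0 (to select columns). For each of these row/column combinations, we use mapply together with the not-so-well-known global assignment operator (<<-) to change the corresponding value to 1.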

#Generate new columns to be added
cols <- paste0("brand-", 1:3)
#Initialise the columns to 0
mydf[cols] <- 0

mapply(function(x, y) mydf[x, y] <<- 1, seq(nrow(mydf)), 
                                        paste0("brand-", mydf$brand))

mydf

#  transaction quality brand brand-1 brand-2 brand-3
#1           1     NEW     1       1       0       0
#2           0     OLD     2       0       1       0
#3           1     OLD     3       0       0       1
#4           1     OLD     1       1       0       0
#5           1     OLD     2       0       1       0
#6           0     NEW     2       0       1       0
#7           0     NEW     1       1       0       0

If we no longer need it, we can remove the original brand column using

mydf$brand <- NULL
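
The same row/column idea can also be sketched without mapply and <<-, using a two-column index matrix. This is only an illustrative variant, not part of the original answer; the names ind and result are made up for the example, and it assumes every value in brand is one of the three levels listed in cols.

```r
# Same sample data as in the question
mydf <- data.frame(
  transaction = c(1, 0, 1, 1, 1, 0, 0),
  quality = c("NEW", "OLD", "OLD", "OLD", "OLD", "NEW", "NEW"),
  brand = c(1, 2, 3, 1, 2, 2, 1)
)

cols <- paste0("brand-", 1:3)

# Start from an all-zero matrix with one named column per brand
ind <- matrix(0, nrow = nrow(mydf), ncol = length(cols),
              dimnames = list(NULL, cols))

# Each row of the index matrix is a (row, column) pair; set those cells to 1
ind[cbind(seq_len(nrow(mydf)), match(paste0("brand-", mydf$brand), cols))] <- 1

# cbind on a data frame does not mangle the hyphenated column names
result <- cbind(mydf, ind)
```

Because the assignment happens on a local matrix, nothing outside the current scope is modified, which is the main reason to prefer this over <<-.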

Answer 1: (score: 1)

An approach using dplyr, tidyr, and tibble:

library(dplyr)
library(tidyr)
library(tibble)

mydf %>%
  rownames_to_column() %>%
  group_by(rowname, transaction, quality, brand) %>%
  summarise(count = n()) %>%
  spread(brand, count, sep = "-", fill = 0) %>%
  ungroup() %>%
  select(-rowname)

# # A tibble: 7 x 5
#   transaction quality `brand-1` `brand-2` `brand-3`
# *       <dbl>  <fctr>     <dbl>     <dbl>     <dbl>
# 1           1     NEW         1         0         0
# 2           0     OLD         0         1         0
# 3           1     OLD         0         0         1
# 4           1     OLD         1         0         0
# 5           1     OLD         0         1         0
# 6           0     NEW         0         1         0
# 7           0     NEW         1         0         0
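
For completeness, the model.matrix route mentioned in the question can probably be made to work as well. The sketch below is not taken from either answer: it assumes brand should be treated as a factor, and it drops the intercept with - 1 so that every level gets its own column. The names dummies and result are illustrative.

```r
# Same sample data as in the question
mydf <- data.frame(
  transaction = c(1, 0, 1, 1, 1, 0, 0),
  quality = c("NEW", "OLD", "OLD", "OLD", "OLD", "NEW", "NEW"),
  brand = c(1, 2, 3, 1, 2, 2, 1)
)

# Without an intercept, model.matrix emits one indicator column per factor level
dummies <- model.matrix(~ factor(brand) - 1, data = mydf)

# Rename the generated columns to the brand_<level> names the question asks for
colnames(dummies) <- paste0("brand_", levels(factor(mydf$brand)))

# Keep the other columns and append the indicators
result <- cbind(mydf[c("transaction", "quality")], dummies)
```

With the default intercept left in, model.matrix would drop the first level as the reference category, which may be why the attempt described in the question did not give all three columns.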