I want to convert part of my data to a binary wide format.
Here is my input:
mydf <- data.frame(transaction = c(1, 0, 1, 1, 1, 0, 0), quality = c("NEW", "OLD", "OLD", "OLD", "OLD", "NEW", "NEW"), brand = c(1, 2, 3, 1, 2, 2, 1))
transaction quality brand
1 1 NEW 1
2 0 OLD 2
3 1 OLD 3
4 1 OLD 1
5 1 OLD 2
6 0 NEW 2
7 0 NEW 1
I want to convert the brand column to wide format so that I get the following output:
transaction quality brand_1 brand_2 brand_3
1 1 NEW 1 0 0
2 0 OLD 0 1 0
3 1 OLD 0 0 1
4 1 OLD 1 0 0
5 1 OLD 0 1 0
6 0 NEW 0 1 0
7 0 NEW 1 0 0
I tried different approaches, such as the model.matrix function, but could not get the output I want.
Answer 0 (score: 2)
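For each row, we pick the column whose value needs to be changed to 1. We generate the row/column combinations with seq (to select the rows) and paste0 (to select the columns). For all of these row/column combinations, we use mapply together with the lesser-known global assignment operator (<<-) to change the corresponding values to 1.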
#Generate new columns to be added
cols <- paste0("brand-", 1:3)
#Initialise the columns to 0
mydf[cols] <- 0
mapply(function(x, y) mydf[x, y] <<- 1, seq(nrow(mydf)),
paste0("brand-", mydf$brand))
mydf
# transaction quality brand brand-1 brand-2 brand-3
#1 1 NEW 1 1 0 0
#2 0 OLD 2 0 1 0
#3 1 OLD 3 0 0 1
#4 1 OLD 1 1 0 0
#5 1 OLD 2 0 1 0
#6 0 NEW 2 0 1 0
#7 0 NEW 1 1 0 0
If we no longer need it, we can remove the original brand column:
mydf$brand <- NULL
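The same row/column pairing can also be written without mapply and <<-. The following is only a sketch of one alternative, not the answer's method: it assumes base R matrix indexing with a two-column (row, column) index matrix, and the names brands, indicator and result are made up for illustration.

```r
mydf <- data.frame(
  transaction = c(1, 0, 1, 1, 1, 0, 0),
  quality = c("NEW", "OLD", "OLD", "OLD", "OLD", "NEW", "NEW"),
  brand = c(1, 2, 3, 1, 2, 2, 1)
)

# Distinct brand values, in ascending order (hypothetical helper name)
brands <- sort(unique(mydf$brand))

# Start from an all-zero matrix with one column per brand
indicator <- matrix(
  0,
  nrow = nrow(mydf),
  ncol = length(brands),
  dimnames = list(NULL, paste0("brand-", brands))
)

# A two-column index matrix addresses one (row, column) cell per data row
indicator[cbind(seq_len(nrow(mydf)), match(mydf$brand, brands))] <- 1

# Keep the other columns and attach the indicator columns
result <- cbind(mydf[c("transaction", "quality")], indicator)
result
```

Because the assignment happens in a single vectorised step, nothing is modified through the global environment, and the original mydf stays untouched.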
Answer 1 (score: 1)
An approach using dplyr, tidyr and tibble:
library(dplyr)
library(tidyr)
library(tibble)
mydf %>%
rownames_to_column() %>%
group_by(rowname, transaction, quality, brand) %>%
summarise(count = n()) %>%
spread(brand, count, sep = "-", fill = 0) %>%
ungroup() %>%
select(-rowname)
# # A tibble: 7 x 5
# transaction quality `brand-1` `brand-2` `brand-3`
# * <dbl> <fctr> <dbl> <dbl> <dbl>
# 1 1 NEW 1 0 0
# 2 0 OLD 0 1 0
# 3 1 OLD 0 0 1
# 4 1 OLD 1 0 0
# 5 1 OLD 0 1 0
# 6 0 NEW 0 1 0
# 7 0 NEW 1 0 0
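For readers on a recent tidyr, the same reshaping might be expressed with pivot_wider instead of spread. This is a sketch under the assumption that tidyr 1.1.0 or later is installed; the helper columns id and value are hypothetical names introduced here, and the brand_ prefix is chosen to match the column names requested in the question.

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(
  transaction = c(1, 0, 1, 1, 1, 0, 0),
  quality = c("NEW", "OLD", "OLD", "OLD", "OLD", "NEW", "NEW"),
  brand = c(1, 2, 3, 1, 2, 2, 1)
)

result <- mydf %>%
  # id keeps every original row distinct; value is the 1 to spread out
  mutate(id = row_number(), value = 1) %>%
  pivot_wider(
    names_from = brand,
    values_from = value,
    names_prefix = "brand_",
    values_fill = 0   # cells with no matching brand become 0
  ) %>%
  select(-id)

result
```

The id column plays the role that rownames_to_column() plays in the answer: without a per-row key, rows that share transaction and quality would be collapsed together.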