How do I convert one-hot encoded variables back into a single column?

Asked: 2016-10-18 10:03:23

Tags: r dataframe

I have a dataset:

data$a <- c(1,0,0,1,0)
data$b <- c(0,1,1,0,0)
data$c <- c(0,0,0,0,1)

How can I convert it into a single categorical column like this:

data$tranformed <- c("A", "B", "B", "A", "C")

2 answers:

Answer 0 (score: 2)

You can do it like this:

w <- which(data == 1, arr.ind = TRUE)
data$tranformed <- toupper(names(data)[w[order(w[,1]),2]])

#  a b c tranformed
#1 1 0 0          A
#2 0 1 0          B
#3 0 1 0          B
#4 1 0 0          A
#5 0 0 1          C
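To make the two steps above easier to follow, here is a minimal sketch that rebuilds the question's data and keeps the intermediate values. The names `col_idx` and `result` are introduced only for illustration and are not part of the answer.

```r
# Rebuild the example data from the question.
data <- data.frame(a = c(1, 0, 0, 1, 0),
                   b = c(0, 1, 1, 0, 0),
                   c = c(0, 0, 0, 0, 1))

# which(..., arr.ind = TRUE) returns one (row, column) pair per cell equal to 1.
# The pairs come out column by column, so they are not in row order yet.
w <- which(data == 1, arr.ind = TRUE)

# Sort the pairs by row index (first column of w) and keep the column
# positions (second column of w), one per row of the data.
col_idx <- w[order(w[, 1]), 2]

# Look the positions up in the column names and upper-case them.
result <- toupper(names(data)[col_idx])
print(result)
```

This relies on every row containing exactly one 1; a row with no 1 contributes no pair to `w`, so the result would come out shorter than the data.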

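This approach is preferable because it works from the column names instead of hard-coded letters. If you rename the columns, the result changes accordingly.

Even better, you can do this: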

data$tranformed <- toupper(names(data)[max.col(data)])
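As a rough illustration of why this works: `max.col()` returns, for each row, the position of the largest value, which for a one-hot row is the position of the 1. The sketch below sets `ties.method = "first"`, which is an addition not found in the answer; the default (`"random"`) only comes into play when a row has ties for the maximum.

```r
data <- data.frame(a = c(1, 0, 0, 1, 0),
                   b = c(0, 1, 1, 0, 0),
                   c = c(0, 0, 0, 0, 1))

# Position of the maximum in each row. With exactly one 1 per row there are
# no ties, so ties.method does not change the result for this data.
pos <- max.col(as.matrix(data), ties.method = "first")

# Map positions to upper-cased column names.
lab <- toupper(names(data)[pos])
print(lab)
```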

If data is allowed to contain rows with no 1 at all, for example:

#  a b c
#1 1 0 0
#2 0 1 0
#3 0 0 0
#4 1 0 0
#5 0 0 1

data <- structure(list(a = c(1, 0, 0, 1, 0), b = c(0, 1, 0, 0, 0), c = c(0, 
0, 0, 0, 1)), .Names = c("a", "b", "c"), row.names = c(NA, -5L
), class = "data.frame")

you can do this:

inds <- which(rowSums(data)==0)
data$tranformed <- toupper(names(data)[max.col(data)])
data$tranformed[inds] <- NA

which gives you:

#  a b c tranformed
#1 1 0 0          A
#2 0 1 0          B
#3 0 0 0       <NA>
#4 1 0 0          A
#5 0 0 1          C
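The same logic can be wrapped in a small function. This is only a sketch: `onehot_to_label` and its `cols` argument are hypothetical names, and the function assumes each selected row holds at most one 1. Restricting the work to the named columns means the call still behaves the same once a text column such as `tranformed` has been added to the data frame.

```r
# Hypothetical helper (not from the answer): decode the one-hot columns named
# in `cols`, returning NA for rows whose selected columns contain no 1.
onehot_to_label <- function(df, cols = names(df)) {
  m <- as.matrix(df[cols])
  out <- toupper(cols[max.col(m, ties.method = "first")])
  out[rowSums(m) == 0] <- NA  # all-zero rows have no category
  out
}

data <- data.frame(a = c(1, 0, 0, 1, 0),
                   b = c(0, 1, 0, 0, 0),
                   c = c(0, 0, 0, 0, 1))

data$tranformed <- onehot_to_label(data, c("a", "b", "c"))
print(data)
```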

Answer 1 (score: 1)

data$transformed<-factor(apply(data, 1, function(x) which(x == 1)),labels = colnames(data)) 

Or (uppercase letters):

factor(LETTERS[apply(data, 1, function(x) which(x == 1))])
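One possible refinement of the `factor()` variant, offered as a sketch and not as part of the answer: passing `levels` explicitly keeps one level per column. With `labels` alone, a column that never contains a 1 would leave fewer distinct values than labels, and `factor()` would stop with an error. The names `pos` and `f` are illustrative.

```r
data <- data.frame(a = c(1, 0, 0, 1, 0),
                   b = c(0, 1, 1, 0, 0),
                   c = c(0, 0, 0, 0, 1))

# Position of the 1 in each row (assumes exactly one 1 per row).
pos <- apply(data, 1, function(x) which(x == 1))

# One level per column position, labelled with the column names.
f <- factor(pos, levels = seq_along(data), labels = colnames(data))
print(f)
```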

Edit: if a row contains only 0s, like row 3 in the example below:

df=data.frame(a =c(1,0,0,1,0),
               b=c(0,1,0,0,0),
               c =c(0,0,0,0,1)
)
  a b c
1 1 0 0
2 0 1 0
3 0 0 0
4 1 0 0
5 0 0 1

you cannot use the solutions above, because apply then returns a list in which that row's element has length 0. A workaround:

idx <- apply(df, 1, function(x) which(x == 1))
LETTERS[unlist(ifelse(sapply(idx, length) == 1, idx, NA))]
[1] "A" "B" NA  "A" "C"
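A possibly easier-to-read alternative to the workaround above, as a sketch under the same assumption that a valid row holds exactly one 1: `vapply()` always returns an integer vector here, so there is no list to unpack. The names `m`, `idx`, and `res` are illustrative, and this is a swapped-in technique, not the answer's own code.

```r
df <- data.frame(a = c(1, 0, 0, 1, 0),
                 b = c(0, 1, 0, 0, 0),
                 c = c(0, 0, 0, 0, 1))

m <- as.matrix(df)

# For each row, take the position of the single 1, or NA when the row does
# not contain exactly one 1.
idx <- vapply(seq_len(nrow(m)), function(i) {
  hit <- unname(which(m[i, ] == 1))
  if (length(hit) == 1L) hit else NA_integer_
}, integer(1))

res <- LETTERS[idx]
print(res)
```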