Split a column into multiple columns in R

Date: 2017-01-25 15:29:44

Tags: r encoding binary analytics xgboost

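I want to encode a set of categorical values as binary. First, I converted them to binary strings with intToBin(x$y).

Now I want to split each binary string into separate columns, one digit per column, so that the first block below becomes the second: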

0101
0100
0110
0101
0101
0100

0 1 0 1
0 1 0 0
0 1 1 0
0 1 0 1

and so on, with each digit also converted to a numeric value. The approach should scale to longer strings.

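I tried separate(x$y, sep = l) for the conversion, but I got an error. Please help me correct the code or suggest an alternative. The purpose of converting the values to binary is to build a model with XGBoost.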
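A likely cause of the error is that tidyr's separate() expects a data frame as its first argument and the column as its second, whereas x$y passes a bare vector. The sketch below shows one call that should work, assuming the tidyr package is installed. The data frame here is a hypothetical stand-in for the asker's x, and the column names bit1, bit2, ... are made up for illustration.

```r
library(tidyr)

# Hypothetical stand-in for the asker's data frame x with a column y
x <- data.frame(y = c("0101", "0100", "0110"), stringsAsFactors = FALSE)

width <- max(nchar(x$y))                  # number of digits per string

x_split <- separate(
  x, y,                                   # data frame first, then the column
  into = paste0("bit", seq_len(width)),   # illustrative column names
  sep = seq_len(width - 1),               # numeric sep: split after positions 1, 2, 3
  convert = TRUE                          # turn "0"/"1" into integers
)

str(x_split)
```

A numeric sep is interpreted as character positions, so no delimiter is needed between the digits.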

An example of my requirement

2 answers:

Answer 0 (score: 1)

Here is one way:

d=c("0101","0111","0011","1101")  
# Split into columns
d2=do.call(rbind, strsplit(as.character(d), split="")) #see elmo's comments
# Make numeric and transform to dataframe (instead of matrix)
d2=as.data.frame(apply(d2,2, function(x) as.numeric(as.character(x))))
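The same idea can be wrapped in a small reusable function that works for any string width. This is only a sketch: split_bits and the bit1, bit2, ... column names are hypothetical, and it assumes every string has the same number of digits.

```r
# Hypothetical helper: split equal-width digit strings into numeric columns
split_bits <- function(bits) {
  bits <- as.character(bits)
  width <- unique(nchar(bits))
  stopifnot(length(width) == 1)           # all strings must share one width
  m <- do.call(rbind, strsplit(bits, split = ""))
  storage.mode(m) <- "double"             # characters "0"/"1" become numbers
  colnames(m) <- paste0("bit", seq_len(width))
  as.data.frame(m)
}

d <- c("0101", "0111", "0011", "1101")
d2 <- split_bits(d)
str(d2)
```

Changing the storage mode of the matrix converts all digits in one step and keeps the matrix shape, so no apply() call is needed. If a modelling function wants a numeric matrix rather than a data frame, as.matrix(d2) converts it back.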

Answer 1 (score: 1)

a = c("0101","0100","0110","0101","0101","0100")
data.frame(t(matrix(unlist(strsplit(a,"")),nrow = 4)))

OR

data.frame(t(sapply(a, function(x) unlist(strsplit(x,"")))))
#You may get a warning about identical row names
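The row-name warning presumably arises because sapply() copies the elements of a into the names of its result, and a contains duplicates. A minimal variant, assuming all strings have equal length, drops those names with USE.NAMES = FALSE and also converts the digits to numbers:

```r
a <- c("0101", "0100", "0110", "0101", "0101", "0100")

# USE.NAMES = FALSE keeps the duplicated strings out of the dimnames
chars <- sapply(a, function(x) unlist(strsplit(x, "")), USE.NAMES = FALSE)

res <- data.frame(t(chars))               # one row per string, one column per digit

# Convert every column to numeric (as.character guards against factor columns)
res[] <- lapply(res, function(col) as.numeric(as.character(col)))

str(res)
```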

Or, if you want something that also works when the elements of a do not all have the same number of digits:

a = c("01101","0100","0110","0101","0101","0100") #Note 1st element has 5 digits
b = sapply(a, function(x) unlist(strsplit(x,"")))
data.frame(t(sapply(b, '[', seq(max(sapply(b,length))))))
#You may get a warning about identical row names
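Note that the approach above fills the missing positions of shorter strings with NA on the right. If the strings are binary numbers whose lowest-order bit is the rightmost digit (which the question suggests but does not state), it may be preferable to left-pad with zeros so that equal bit positions line up in the same column. A sketch under that assumption:

```r
a <- c("01101", "0100", "0110", "0101", "0101", "0100")

w <- max(nchar(a))                               # target width
padded <- paste0(strrep("0", w - nchar(a)), a)   # left-pad shorter strings with zeros

m <- do.call(rbind, strsplit(padded, split = ""))
storage.mode(m) <- "double"                      # characters "0"/"1" become numbers

bits <- as.data.frame(m)
bits
```

Because every padded string has the same width, the result contains no NA values and can be converted to a numeric matrix directly.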