I have four categories of products:
Category A: {X800818, X822707, X822708, X870082, X800810, X800323, X800835, X893890, X822541, X800831}
Category B: {X830742, X841223, X841449, X870138, X810352, X870146, X800850, X841236, X811712, X893314}
Category C: {X893609, X890188, X893313, X841271, X891250, X811820, X728538, X727220, X960804, X728904}
Category D: {X727345, X800875, X727302, X870426, X729002, X727300, X759042, X728495, X897198, X790190}
I have a binary data frame in which each row contains one product, or a group of products belonging to different categories. Here is an example:
X800818 X822707 X822708 X870082 X800810 X800323 X800835 X893890 X822541 X800831 X830742 X841223 X841449
1 0 0 0 1 0 0 0 0 0 1 1 0 0
2 1 0 0 0 0 0 0 0 0 0 1 0 0
3 1 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 1
I would like to get a result like this:
2 A,1 B
1 A,1 B
1 A
1 B
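How can I get R to derive this result from the binary data frame? Any suggestions?
Answer 0: (score: 1)
I tried to put together some reproducible code, although my solution is not as clean as it could be: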
#Data Creation
A <- c("X800818","X822707","X822708","X870082","X800810","X800323","X800835","X893890","X822541","X800831")
B <- c("X830742","X841223","X841449","X870138","X810352","X870146","X800850","X841236","X811712","X893314")
C <- c("X893609","X890188","X893313","X841271","X891250","X811820","X728538","X727220","X960804","X728904")
D <- c("X727345","X800875","X727302","X870426","X729002","X727300","X759042","X728495","X897198","X790190")
df <- data.frame(c(1,0,0,1),c(0,0,0,0),c(1,1,1,1),c(1,0,1,1),c(1,0,0,1),c(0,1,0,0),c(1,0,1,1))
names(df) <- c("X800818","X800323","X841223","X811820","X960804","X727300","X728495")
#Transforming binaries to letters
for (col in names(df)) {
  for (L in LETTERS[1:4]) {
    if (col %in% get(L)) df[df[,col] == 1, col] <- L
  }
}
#Transpose
tdf <- data.frame(t(df))
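#Get Results df
res <- NULL
for (col in names(tdf)) {
  # Set the full level set explicitly. Assigning levels(...) <- only renames
  # the levels already present, which mislabels rows that lack a category.
  f <- factor(tdf[, col], levels = c("0", LETTERS[1:4]))
  res <- rbind(res, table(f))
}
res
     0 A B C D
[1,] 2 1 1 2 1
[2,] 5 0 1 0 1
[3,] 4 0 1 1 1
[4,] 2 1 1 2 1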
Hope this helps.
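If the goal is the exact label format from the question (for example "2 A,1 B"), one possible loop-free variant is sketched below. It assumes the category vectors from the question and the small example data frame from this answer; the names cats, lookup, col_cat, counts and labels are illustrative only and not part of the original code.

```r
# Category membership, copied from the question
cats <- list(
  A = c("X800818","X822707","X822708","X870082","X800810","X800323","X800835","X893890","X822541","X800831"),
  B = c("X830742","X841223","X841449","X870138","X810352","X870146","X800850","X841236","X811712","X893314"),
  C = c("X893609","X890188","X893313","X841271","X891250","X811820","X728538","X727220","X960804","X728904"),
  D = c("X727345","X800875","X727302","X870426","X729002","X727300","X759042","X728495","X897198","X790190")
)

# Same example data as in the answer (an assumption, not the asker's real data)
df <- data.frame(
  X800818 = c(1,0,0,1), X800323 = c(0,0,0,0), X841223 = c(1,1,1,1),
  X811820 = c(1,0,1,1), X960804 = c(1,0,0,1), X727300 = c(0,1,0,0),
  X728495 = c(1,0,1,1)
)

# Lookup vector: product code -> category letter
lookup <- setNames(rep(names(cats), lengths(cats)),
                   unlist(cats, use.names = FALSE))

# Category of each column of df (NA for a column in no category)
col_cat <- lookup[names(df)]

# One count column per category: sum the 0/1 columns that belong to it
counts <- sapply(names(cats), function(k) {
  rowSums(df[, which(col_cat == k), drop = FALSE])
})

# Collapse the non-zero counts of each row into a label such as "2 A,1 B"
labels <- apply(counts, 1, function(r) {
  keep <- r > 0
  paste(r[keep], names(r)[keep], collapse = ",")
})
```

With this example data, labels should contain "1 A,1 B,2 C,1 D", "1 B,1 D", "1 B,1 C,1 D" and "1 A,1 B,2 C,1 D", one entry per row of df. Using which() keeps columns that match no category from breaking the column selection.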