How can I deduce per-category counts from a binary data frame?

Date: 2018-04-09 13:45:02

Tags: r dataframe categories dummy-variable

I have four categories of products:

  • Category A: {X800818, X822707, X822708, X870082, X800810, X800323, X800835, X893890, X822541, X800831}

  • Category B: {X830742, X841223, X841449, X870138, X810352, X870146, X800850, X841236, X811712, X893314}

  • Category C: {X893609, X890188, X893313, X841271, X891250, X811820, X728538, X727220, X960804, X728904}

  • Category D: {X727345, X800875, X727302, X870426, X729002, X727300, X759042, X728495, X897198, X790190}
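To work with these sets in R, one option (a sketch; the names `cats` and `lookup` are illustrative, not from the post) is to store them in a named list and flatten it into a lookup vector that maps each product code to its category letter:

```r
# Illustrative container: one character vector of product codes per category
cats <- list(
  A = c("X800818", "X822707", "X822708", "X870082", "X800810",
        "X800323", "X800835", "X893890", "X822541", "X800831"),
  B = c("X830742", "X841223", "X841449", "X870138", "X810352",
        "X870146", "X800850", "X841236", "X811712", "X893314"),
  C = c("X893609", "X890188", "X893313", "X841271", "X891250",
        "X811820", "X728538", "X727220", "X960804", "X728904"),
  D = c("X727345", "X800875", "X727302", "X870426", "X729002",
        "X727300", "X759042", "X728495", "X897198", "X790190")
)

# Flatten into a named vector: names are product codes, values are category letters
lookup <- setNames(rep(names(cats), lengths(cats)), unlist(cats, use.names = FALSE))

# Category of a few product codes (here "A", "A", "B")
lookup[c("X800818", "X870082", "X830742")]
```

Indexing `lookup` with `names(df)` would then give the category of every column of a binary data frame.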

I have a binary data frame in which each row contains one product or a group of products from different categories. Here is an example:

      X800818 X822707 X822708 X870082 X800810 X800323 X800835 X893890 X822541 X800831 X830742 X841223 X841449
    1       0       0       0       1       0       0       0       0       0       1       1       0       0
    2       1       0       0       0       0       0       0       0       0       0       1       0       0
    3       1       0       0       0       0       0       0       0       0       0       0       0       0
    4       0       0       0       0       0       0       0       0       0       0       0       0       1

For each row, I want the number of products per category, like this:

  1. 2 A, 1 B

  2. 1 A, 1 B

  3. 1 A

  4. 1 B

How can I get R to deduce this result from the binary data frame? Any suggestions?
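A minimal sketch of one way to produce this output, assuming the categories are kept in a named list and the example data frame above is rebuilt by hand (`cats`, `m`, `counts` and `result` are illustrative names): for each category, sum the 1s in that category's columns row by row, then paste the non-zero counts together.

```r
# Category definitions (illustrative name `cats`; codes copied from the question)
cats <- list(
  A = c("X800818", "X822707", "X822708", "X870082", "X800810",
        "X800323", "X800835", "X893890", "X822541", "X800831"),
  B = c("X830742", "X841223", "X841449", "X870138", "X810352",
        "X870146", "X800850", "X841236", "X811712", "X893314"),
  C = c("X893609", "X890188", "X893313", "X841271", "X891250",
        "X811820", "X728538", "X727220", "X960804", "X728904"),
  D = c("X727345", "X800875", "X727302", "X870426", "X729002",
        "X727300", "X759042", "X728495", "X897198", "X790190")
)

# The example binary data frame from the question, rebuilt by hand
df <- data.frame(
  X800818 = c(0, 1, 1, 0), X822707 = c(0, 0, 0, 0), X822708 = c(0, 0, 0, 0),
  X870082 = c(1, 0, 0, 0), X800810 = c(0, 0, 0, 0), X800323 = c(0, 0, 0, 0),
  X800835 = c(0, 0, 0, 0), X893890 = c(0, 0, 0, 0), X822541 = c(0, 0, 0, 0),
  X800831 = c(1, 0, 0, 0), X830742 = c(1, 1, 0, 0), X841223 = c(0, 0, 0, 0),
  X841449 = c(0, 0, 0, 1)
)
m <- as.matrix(df)

# One column per category: number of 1s in that category's columns, per row.
# drop = FALSE keeps a matrix even when a category has zero or one matching column.
counts <- sapply(cats, function(codes) rowSums(m[, colnames(m) %in% codes, drop = FALSE]))

# Paste the non-zero counts of each row as "n letter" pairs separated by ", "
result <- apply(counts, 1, function(n) {
  keep <- n > 0
  paste(n[keep], names(n)[keep], collapse = ", ")
})

# result is c("2 A, 1 B", "1 A, 1 B", "1 A", "1 B")
result
```

Categories with no 1s in a row (here C and D everywhere) are simply left out of that row's string.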

1 Answer:

Answer 0 (score: 1)

I tried to write some reproducible code, but my solution isn't as clean as it could be:

    # Data creation

    A <- c("X800818","X822707","X822708","X870082","X800810","X800323","X800835","X893890","X822541","X800831")
    B <- c("X830742","X841223","X841449","X870138","X810352","X870146","X800850","X841236","X811712","X893314")
    C <- c("X893609","X890188","X893313","X841271","X891250","X811820","X728538","X727220","X960804","X728904")
    D <- c("X727345","X800875","X727302","X870426","X729002","X727300","X759042","X728495","X897198","X790190")
    df <- data.frame(c(1,0,0,1),c(0,0,0,0),c(1,1,1,1),c(1,0,1,1),c(1,0,0,1),c(0,1,0,0),c(1,0,1,1))
    names(df) <- c("X800818","X800323","X841223","X811820","X960804","X727300","X728495")

    # Transforming binaries to letters: each 1 becomes the letter of its column's category

    for (col in names(df)) {
      for (L in LETTERS[1:4]) {
        # get(L) looks up the vector named A, B, C or D defined above
        if (col %in% get(L)) df[df[, col] == 1, col] <- L
      }
    }

    # Transpose, so that each original row becomes a column of tdf

    tdf <- data.frame(t(df))

    # Get results df: tabulate the letters in each column of tdf.
    # factor(..., levels = ...) gives every table() the same five levels in the same order.
    # (Assigning levels(x) <- c("0", LETTERS[1:4]) instead renames the existing levels by
    # position, so rows missing a category get mislabeled counts.)
    res <- NULL
    for (col in names(tdf)) {
      res <- rbind(res, table(factor(tdf[, col], levels = c("0", LETTERS[1:4]))))
    }
    res

         0 A B C D
    [1,] 2 1 1 2 1
    [2,] 5 0 1 0 1
    [3,] 4 0 1 1 1
    [4,] 2 1 1 2 1

Hope this helps.
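For comparison, here is a loop-free sketch on the data frame used in this answer (an alternative technique, not the answerer's code; `cats`, `M` and `counts` are illustrative names). It multiplies the 0/1 data by a product-by-category indicator matrix, which sums each row's 1s within each category.

```r
# The answer's example data: 7 product columns, 4 rows
df <- data.frame(
  X800818 = c(1, 0, 0, 1), X800323 = c(0, 0, 0, 0), X841223 = c(1, 1, 1, 1),
  X811820 = c(1, 0, 1, 1), X960804 = c(1, 0, 0, 1), X727300 = c(0, 1, 0, 0),
  X728495 = c(1, 0, 1, 1)
)

cats <- list(
  A = c("X800818", "X822707", "X822708", "X870082", "X800810",
        "X800323", "X800835", "X893890", "X822541", "X800831"),
  B = c("X830742", "X841223", "X841449", "X870138", "X810352",
        "X870146", "X800850", "X841236", "X811712", "X893314"),
  C = c("X893609", "X890188", "X893313", "X841271", "X891250",
        "X811820", "X728538", "X727220", "X960804", "X728904"),
  D = c("X727345", "X800875", "X727302", "X870426", "X729002",
        "X727300", "X759042", "X728495", "X897198", "X790190")
)

# Indicator matrix: one row per column of df, one column per category (1 = belongs)
M <- sapply(cats, function(codes) as.numeric(names(df) %in% codes))
rownames(M) <- names(df)

# (rows x products) %*% (products x categories) = count of each category per row
counts <- as.matrix(df) %*% M

# Rows of counts (A, B, C, D): (1,1,2,1), (0,1,0,1), (0,1,1,1), (1,1,2,1)
counts
```

This returns only the A to D columns (no `0` column), and a product column that belongs to no category contributes nothing to any count.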