Question

I am looking for a function in R that creates different groups as binary (0/1) variables. My data looks like this:

data <- data.frame(Var1 = c(rep("A",5),rep("B",2),rep("C",3)))
data

and the result table should look like this:

result <- data.frame(Var1 = c(rep("A",5),rep("B",2),rep("C",3)),
                     Group1 = rep(1,10),
                     Group2 = c(rep(0,5),rep(1,5)),
                     Group3 = c(rep(1,5),rep(0,2),rep(1,3)),
                     Group4 = c(rep(1,7),rep(0,3)))
result

That is, one column for each possible group, where the reference for building the groups is the first column (Var1).

Thanks for your help!

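One more thing: so far we only exclude one group at a time. How can we create groups that really cover all possible combinations (e.g., excluding 2, excluding 3, ...)?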

result <- data.frame(Var1 = c(letters[1:5]),
                     Group1 = rep(1,5),
                     Group2 = c(0,rep(1,4)),
                     Group3 = c(1,0,rep(1,3)),
                     Group4 = c(rep(1,2),0,rep(1,2)),
                     Group5 = c(rep(1,3),0,1),
                     Group6 = c(rep(1,4),0),
                     Group7 = c(rep(0,2),rep(1,3)),
                     Group8 = c(rep(0,3),rep(1,2)))
result

These are not all possible combinations, just an example of how it would continue...

Answer 1

Here is a solution that basically creates all combinations of your Var1 values and matches them against the original Var1, i.e.

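# Get the unique values of Var1
i1 <- unique(as.character(data$Var1))
# Get all combinations of 2 or more values, each collapsed to a string such as "A, B"
l1 <- sapply(2:length(i1), function(i) combn(i1, i, FUN = toString))
# Match to see which Var1 value appears in each combination (1 = yes, 0 = no)
df2 <- sapply(unlist(l1), function(i) sapply(i1, function(j) grepl(j, i) * 1))
# Merge the indicator matrix back onto the data by Var1
merge(data, df2, by.x = 'Var1', by.y = 'row.names')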

which gives,

   Var1 A, B A, C B, C A, B, C
1     A    1    1    0       1
2     A    1    1    0       1
3     A    1    1    0       1
4     A    1    1    0       1
5     A    1    1    0       1
6     B    1    0    1       1
7     B    1    0    1       1
8     C    0    1    1       1
9     C    0    1    1       1
10    C    0    1    1       1
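
Note that grepl() treats each value as a pattern, so it can report false hits when one value is contained in another (for example "A" and "AB"). The following is only a minimal sketch, not part of the original answer, of how the Group1, Group2, ... layout from the question could be built with exact matching instead. The helper name make_exclusion_groups and its max_excluded argument are made up for illustration; each column marks the rows that remain after excluding one set of values.

```r
# Hypothetical helper (not from the answers): one 0/1 column per set of
# excluded values, ordered by how many values are excluded (0, 1, 2, ...).
make_exclusion_groups <- function(x, max_excluded = 1) {
  x <- as.character(x)
  vals <- unique(x)
  stopifnot(max_excluded >= 0, max_excluded < length(vals))

  # Start with the empty set (nothing excluded), then add every set of
  # 1, 2, ..., max_excluded values to exclude.
  exclude_sets <- c(
    list(character(0)),
    unlist(lapply(seq_len(max_excluded),
                  function(k) combn(vals, k, simplify = FALSE)),
           recursive = FALSE)
  )

  # A row belongs to a group unless its value is in the excluded set.
  # %in% compares whole values, so no pattern matching is involved.
  m <- do.call(cbind, lapply(exclude_sets,
                             function(s) as.integer(!(x %in% s))))
  colnames(m) <- paste0("Group", seq_along(exclude_sets))
  data.frame(Var1 = x, m)
}

data <- data.frame(Var1 = c(rep("A", 5), rep("B", 2), rep("C", 3)))
one_out <- make_exclusion_groups(data$Var1, max_excluded = 1)
two_out <- make_exclusion_groups(letters[1:5], max_excluded = 2)
```

With max_excluded = 1 this should reproduce the first result table in the question (Group1 keeps everything, Group2 drops A, Group3 drops B, Group4 drops C). Raising max_excluded adds the columns for every pair, triple, and so on, so the number of columns grows quickly with the number of unique values.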

Answer 2

Here is a base R solution using ifelse:

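# unique() is used instead of levels(): since R 4.0.0, data.frame() keeps
# strings as character by default, so levels(data$Var1) would be NULL
vals <- unique(as.character(data$Var1))
df <- cbind(data,
            Group1 = 1,
            `colnames<-`(sapply(vals, function(v) ifelse(data$Var1 == v, 1, 0)),
                         paste0("Group", 1 + seq_along(vals))))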

such that

> df
   Var1 Group1 Group2 Group3 Group4 Group5
1     A      1      1      0      0      0
2     A      1      1      0      0      0
3     A      1      1      0      0      0
4     A      1      1      0      0      0
5     A      1      1      0      0      0
6     B      1      0      1      0      0
7     B      1      0      1      0      0
8     C      1      0      0      1      0
9     C      1      0      0      1      0
10    C      1      0      0      1      0
11    D      1      0      0      0      1
12    D      1      0      0      0      1

Data

data <- data.frame(Var1 = c(rep("A",5),rep("B",2),rep("C",3),rep("D",2)))

> data
   Var1
1     A
2     A
3     A
4     A
5     A
6     B
7     B
8     C
9     C
10    C
11    D
12    D
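
As a hedged alternative to the ifelse() loop, the same indicator columns can be sketched with a single vectorised comparison via outer(). The object names vals, ind, and df2 are chosen here for illustration only; the data is the four-value example shown above.

```r
data <- data.frame(Var1 = c(rep("A", 5), rep("B", 2), rep("C", 3), rep("D", 2)))

# Unique values in order of appearance (works for character or factor input)
vals <- unique(as.character(data$Var1))

# Compare every row with every unique value at once;
# entry [i, j] is 1 when row i holds vals[j], otherwise 0
ind <- outer(as.character(data$Var1), vals, "==") * 1
colnames(ind) <- paste0("Group", 1 + seq_along(vals))

df2 <- cbind(data, Group1 = 1, ind)
```

Swapping "==" for "!=" in the outer() call would flip the columns, so that each one marks the rows left after excluding a single value, which is the layout of the first result table in the question.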

Different combinations of groups
