Question

Let's say I have a binary dataset and I want to find the combinations that have not occurred yet. For example:

X1 X2 X3
1  0  1
0  1  1

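As can be seen, the combination X1=1, X2=1 and X3=0 has not occurred. The order does not matter. Is there any package that can do this, or is there some other solution?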

Answer 1

Use setdiff as shown. No packages are used.

DF <- data.frame(X1 = 1:0, X2 = 0:1, X3 = c(1L, 1L)) # test input

# all 2^ncol(DF) binary rows (one list(0:1) per column)
g <- do.call("expand.grid", rep(list(0:1), ncol(DF)))
names(g) <- names(DF)

setdiff(g, DF)

giving:

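If the intention is that every row of DF has the same number of 1s, then we should only include candidate rows with that number of 1s, which we can generate with combn like this. Again, no packages are used.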

nc <- ncol(DF)
k <- sum(DF[1, ])  # no of 1's in each row of DF

g <- t(combn(nc, k, function(x) +(seq(nc) %in% x)))
g <- as.data.frame(g)

# now repeat the last two lines of the prior approach like this:
names(g) <- names(DF)
setdiff(g, DF)

giving:

X1 X2 X3 
 1  1  0
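The two setdiff() variants above can be folded into one base-R sketch. The helper name `absent_rows()` and its `same_k` argument are hypothetical, introduced only for illustration. Instead of calling setdiff() on data frames, it collapses each row into a string key and compares keys with %in%, so the row-wise comparison does not depend on which setdiff() method (base R's, or a data-frame version such as dplyr's) happens to be in effect. It assumes every column holds 0/1 values.

```r
# Hypothetical helper (not part of the answer): list the 0/1 rows absent from DF.
# same_k = FALSE: candidates are all 2^ncol(DF) binary rows (expand.grid).
# same_k = TRUE:  candidates are only rows with the same number of 1s as the
#                 rows of DF (combn); every row of DF must share that count.
absent_rows <- function(DF, same_k = FALSE) {
  nc <- ncol(DF)
  if (same_k) {
    k <- unique(rowSums(DF))
    if (length(k) != 1L) stop("rows of DF do not all contain the same number of 1s")
    # one 0/1 row per choice of k positions for the 1s
    g <- as.data.frame(t(combn(nc, k, function(x) +(seq_len(nc) %in% x))))
  } else {
    g <- do.call(expand.grid, rep(list(0:1), nc))
  }
  names(g) <- names(DF)
  # Collapse each row into one string key such as "1_0_1" and compare keys
  key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "_"))
  out <- g[!key(g) %in% key(DF), , drop = FALSE]
  rownames(out) <- NULL
  out
}

DF <- data.frame(X1 = 1:0, X2 = 0:1, X3 = c(1L, 1L))
absent_rows(DF)                 # 6 rows: every binary row except 1 0 1 and 0 1 1
absent_rows(DF, same_k = TRUE)  # 1 row: X1 = 1, X2 = 1, X3 = 0
```

The expand.grid() branch still grows as 2^ncol(DF); the same_k branch only builds choose(ncol(DF), k) candidates, which is the saving the combn approach above is after.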

Answer 2

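Generating all possible binary permutations (with repetition) and then anti-joining them against the data seems to be the simplest approach.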

library(gtools)
library(dplyr)

test <- data.frame(V1 = c(1,0), V2 = c(0,1), V3 = c(1,1))

all_perm <- data.frame(permutations(n = 2, r = 3, v = c(0,1), repeats.allowed = TRUE))
colnames(all_perm) <- colnames(test)

# keep the rows of all_perm that have no match in test
anti_join(all_perm, test, by = colnames(test))
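For comparison, here is a minimal base-R sketch of the same anti-join, without gtools or dplyr. expand.grid() stands in for gtools::permutations(), and a left merge() with a marker column plays the role of anti_join(). The marker column `.seen` is a hypothetical name introduced here; the sketch assumes no real column uses it and that all columns are 0/1.

```r
test <- data.frame(V1 = c(1, 0), V2 = c(0, 1), V3 = c(1, 1))

# All 2^3 binary rows; expand.grid() replaces gtools::permutations() here
all_perm <- do.call(expand.grid, rep(list(c(0, 1)), ncol(test)))
names(all_perm) <- names(test)

# Mark the observed rows with the hypothetical helper column `.seen`
seen <- unique(test)
seen$.seen <- TRUE

# Left join: rows of all_perm without a partner in test keep .seen = NA
marked <- merge(all_perm, seen, by = names(test), all.x = TRUE)
anti <- marked[is.na(marked$.seen), names(test), drop = FALSE]
rownames(anti) <- NULL
anti
```

Note that merge() sorts its result by the key columns, so the row order can differ from anti_join().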

Answer 3

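An efficient solution that scales well (at least better than approaches that create all permutations) is to work with the positions of the 1 values.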

#the data
m <- matrix(c(1, 0, 0, 1, 1, 1), 2)
#     [,1] [,2] [,3]
#[1,]    1    0    1
#[2,]    0    1    1

# number of 1s per row (assumed to be the same for every row)
n <- 2

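# find positions of 1s: M@i holds the 0-based row indices of t(m), stored
# column by column, i.e. the column positions of the 1s in each row of m
library(Matrix)
M <- Matrix(t(m), sparse = TRUE)
inds <- matrix(M@i + 1L, ncol = n, byrow = TRUE)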
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    3


#all possible positions
combs <- combn(seq_len(ncol(m)), n, simplify = FALSE)
#[[1]]
#[1] 1 2
#
#[[2]]
#[1] 1 3
#
#[[3]]
#[1] 2 3

# missing combinations
mis <- setdiff(combs, asplit(inds, 1))
mis
#[[1]]
#[1] 1 2

# turn the missing position sets back into rows of a pattern sparse matrix
sparseMatrix(j = unlist(mis), 
             i = rep(seq_along(mis), each = n), 
             dims = c(length(mis), ncol(m)))
#1 x 3 sparse Matrix of class "ngCMatrix"
#
#[1,] | | .
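If the data are already a dense base-R matrix, the same position-based idea can be sketched without the Matrix package. In this sketch, which(arr.ind = TRUE) gives the row and column of every 1, and string keys such as "1,3" replace setdiff() on lists. It assumes, as above, that every row has exactly n ones; for large, very sparse data the sparse representation used in the answer is likely the more memory-friendly choice.

```r
m <- matrix(c(1, 0, 0, 1, 1, 1), 2)
n <- 2  # number of 1s per row, assumed identical for every row

# Row/column index of every 1; which() scans column-major, so within each
# row the column indices come out in increasing order
pos <- which(m == 1, arr.ind = TRUE)
seen_keys <- vapply(split(pos[, "col"], pos[, "row"]), paste, character(1), collapse = ",")

# Every possible set of n positions, keyed the same way ("1,2", "1,3", ...)
combs <- combn(ncol(m), n, simplify = FALSE)
cand_keys <- vapply(combs, paste, character(1), collapse = ",")

# Position sets never observed, turned back into 0/1 rows
mis <- combs[!cand_keys %in% seen_keys]
out <- t(vapply(mis, function(p) +(seq_len(ncol(m)) %in% p), integer(ncol(m))))
out
```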

Find combinations that do not exist in a given dataset
