Question

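I'm trying to remove "singletons" from a binary matrix. Here, a singleton is a "1" value that is the only one in both the row and the column in which it appears. For example, given the following matrix: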

> matrix(c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1), nrow=6)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    0    0    0    0    0
[2,]    1    0    1    0    0    0    0
[3,]    0    0    0    1    0    0    0
[4,]    1    1    0    0    0    0    0
[5,]    0    0    0    0    1    1    1
[6,]    0    0    0    0    1    0    1

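...I would like to remove row 3 (and, if possible, all of column 4), because the 1 in [3,4] is the only 1 in that row/column combination. [1,2] is fine because there are other 1s in column [,2]; similarly, [2,3] is fine because there are other 1s in row [2,]. Any help would be appreciated. Thanks!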

Answer 1

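You first want to find which rows and columns are singletons, and then check whether there are pairs of singleton rows and columns that share an index. Here is a short bit of code to accomplish this: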

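foo <- matrix(c(0,1,0,...))  # the matrix from the question
# rows and columns that contain exactly one 1
singRows <- which(rowSums(foo) == 1)
singCols <- which(colSums(foo) == 1)
# every (singleton row, singleton column) combination
singCombinations <- expand.grid(singRows, singCols)
# keep the combinations where the row's only 1 sits in that column
singPairs <- singCombinations[apply(singCombinations, 1,
    function(x) which(foo[x[1],] == 1) == x[2]),]
# drop those rows and columns (assumes at least one pair was found:
# an empty negative index selects nothing rather than everything)
noSingFoo <- foo[-unique(singPairs[,1]), -unique(singPairs[,2])]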

With many singleton rows or columns you may need to make this more efficient, but it gets the job done.
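One caveat with negative indexing in R is that an empty index selects nothing rather than everything, so the snippet above needs at least one singleton pair to exist. Below is a minimal sketch of the same idea using logical masks, which should also cope with the no-singleton case. `dropSingletons` is a hypothetical helper name used for illustration, not something from the original answer.

```r
# Matrix from the question
foo <- matrix(c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,
                0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1), nrow = 6)

# Hypothetical helper: drop every row/column pair whose only 1 is shared
dropSingletons <- function(m) {
  rowOnes <- rowSums(m)
  colOnes <- colSums(m)
  # TRUE where a cell holds a 1 that is alone in both its row and its column
  isSingleton <- (m == 1) & outer(rowOnes == 1, colOnes == 1, "&")
  keepRows <- rowSums(isSingleton) == 0
  keepCols <- colSums(isSingleton) == 0
  # logical masks keep everything when no singleton is found
  m[keepRows, keepCols, drop = FALSE]
}

noSingFoo <- dropSingletons(foo)
dim(noSingFoo)
```

For the matrix in the question this removes row 3 and column 4 and leaves the other rows and columns untouched.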

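Update: Here is the more efficient version that I knew could be done. This way you only loop over the rows (or columns, if desired) rather than over all combinations, so it is much more efficient for matrices with many singleton rows/columns.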

## starting with foo and singRows as before
singPairRows <- singRows[sapply(singRows, function(singRow)
    sum(foo[,foo[singRow,] == 1]) == 1)]
singPairs <- sapply(singPairRows, function(singRow)
    c(singRow, which(foo[singRow,] == 1)))
noSingFoo <- foo[-singPairs[1,], -singPairs[2,]]

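Update 2: I compared the two approaches (mine = nonsparse and @Chris's = sparse) using the rbenchmark package. I used a range of matrix sizes (from 10 to 1000 rows/columns; square matrices only) and sparsity levels (from 0.1 to 5 non-zero entries per row/column). The relative performance is shown in the heat map below. Equivalent performance (log2 ratio of run times) is shown in white, with red where the sparse method is faster and blue where the nonsparse method is faster. Note that I did not include the conversion to a sparse matrix in the performance calculation, so that would add some time to the sparse method. I just thought it was worth a little effort to see where this boundary lies.

[Figure: Relative Performance]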

Answer 2

cr1msonB1ade's approach is a great answer. For more computationally intensive matrices (millions x millions), you can use this method:

Encode the matrix in sparse notation:

DT <- structure(list(i = c(1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6),
                     j = c(2, 1, 3, 4, 1, 2, 5, 6, 7, 5, 7),
                     val = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)),
                .Names = c("i", "j", "val"),
                row.names = c(NA, -11L),
                class = "data.frame")

This gives a sparse representation in which the zeros are implicit.

We can then filter using:

library(data.table)
DT <- data.table(DT)

DT[, rowcount := .N, by = i]
DT[, colcount := .N, by = j]

which gives:

>DT[!(rowcount*colcount == 1)]
    i j val rowcount colcount
 1: 1 2   1        1        2
 2: 2 1   1        2        2
 3: 2 3   1        2        1
 4: 4 1   1        2        2
 5: 4 2   1        2        2
 6: 5 5   1        3        2
 7: 5 6   1        3        1
 8: 5 7   1        3        2
 9: 6 5   1        2        2
10: 6 7   1        2        2

(Note that the (3,4) row is now missing.)
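If you start from a dense matrix, the sparse (i, j, val) table does not have to be typed by hand. The following is a rough sketch, assuming the matrix from the question is stored in `foo`. It uses base R's `which(..., arr.ind = TRUE)` and `ave()` as a stand-in for the data.table grouping above, so it is an illustration of the same filter rather than the data.table code itself.

```r
# Matrix from the question
foo <- matrix(c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,
                0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1), nrow = 6)

# Positions of the non-zero entries; columns are named "row" and "col"
idx <- which(foo == 1, arr.ind = TRUE)
sparse <- data.frame(i = idx[, "row"], j = idx[, "col"], val = 1)
sparse <- sparse[order(sparse$i, sparse$j), ]

# Number of non-zero entries in each entry's row and column
rowcount <- ave(sparse$val, sparse$i, FUN = length)
colcount <- ave(sparse$val, sparse$j, FUN = length)

# Drop entries that are alone in both their row and their column
kept <- sparse[!(rowcount * colcount == 1), ]
nrow(kept)
```

As in the data.table version, the result keeps the original row and column indices, so the (3,4) entry is simply absent from `kept`.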

Remove rows/columns with only one element from a binary matrix
