Question

假设一个大的稀疏邻接矩阵。我想要两个矩阵：

第一个值矩阵，其中第一列为最大元素，第二列为第二大元素，第三列为第三大元素
第二索引矩阵，其中第一列为最大元素的索引，第二列为第二大元素的索引，第三列为第三大元素的索引

其中每个向量的部分排序可能最有效地使用sort的partial指令完成，例如https://stackoverflow.com/a/2453619/164148。

是否存在一些现成的解决方案，用于通过其最大元素到相应的值/索引矩阵来对部分矩阵进行部分排序？

Answer 1

max.col 适用于最大头寸，而下面的订单方法适用于任意数量的最大头寸：我们订购指数矩阵以减少订单并选择想要的最大的职位数。与max.col类似，我们可以使用 which.max 功能

> which.max(c(100,1:10))
[1] 1
> max.col(c(100,1:10))
 [1] 1 1 1 1 1 1 1 1 1 1 1

并反复找到最大位置直到不需要。接下来我们将专注于订购方法。备选方案A示出了值矩阵和索引矩阵相等的情况。备选方案B显示了值矩阵和索引矩阵不相等的情况。

备选方案A 显示索引矩阵等于值矩阵的情况。

> set.seed(123); r<-10; c<-10; m2 <- matrix(rbinom(r*c,70,0.5),r,c)
> m2
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   35   37   41   40   37   34   36   31   35    33
 [2,]   31   36   38   31   33   30   42   38   38    35
 [3,]   39   34   31   43   28   32   37   28   35    32
 [4,]   34   33   37   29   38   37   44   33   42    39
 [5,]   39   41   40   45   34   39   34   33   38    30
 [6,]   32   34   35   30   38   29   33   33   32    31
 [7,]   40   32   34   39   34   28   39   37   35    41
 [8,]   39   37   33   36   27   38   37   33   32    38
 [9,]   35   40   36   39   40   25   31   32   40    38
[10,]   29   41   39   35   34   38   34   34   29    38

> t(apply(m2, 1, function(x) order(x,decreasing=T)[1:3]))
      [,1] [,2] [,3]
 [1,]    3    4    2
 [2,]    7    3    8
 [3,]    4    1    7
 [4,]    7    9   10
 [5,]    4    2    3
 [6,]    5    3    2
 [7,]   10    1    4
 [8,]    1    6   10
 [9,]    2    5    9
[10,]    2    3    6

这是索引矩阵，它的值矩阵在这种情况下是相同的。

备选B 显示索引矩阵不等于值矩阵的位置。

max.col 返回矩阵每行的最大位置，随机断开关系，但它没有部分指令

require(quanteda) 
mytext <- c("Let the big dogs hunt honor", "No holds barred, it is a honor child child child.", "My child is an honor student")     
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) 
> myMatrix %*% t(myMatrix)     #Tells how sentences are related to each other
# 3 x 3 sparse Matrix of class "dgCMatrix"
# text1 text2 text3
# text1     5     1     1
# text2     1    12     4
# text3     1     4     3
> t(myMatrix) %*% myMatrix     #Tells how words are related to each other
# 9 x 9 sparse Matrix of class "dgCMatrix"
# let big dog hunt honor hold bar child student
# let       1   1   1    1     1    .   .     .       .
# big       1   1   1    1     1    .   .     .       .
# dog       1   1   1    1     1    .   .     .       .
# hunt      1   1   1    1     1    .   .     .       .
# honor     1   1   1    1     3    1   1     4       1
# hold      .   .   .    .     1    1   1     3       .
# bar       .   .   .    .     1    1   1     3       .
# child     .   .   .    .     4    3   3    10       1
# student   .   .   .    .     1    .   .     1       1
#
> max.col(t(myMatrix) %*% myMatrix)
[1] 5 2 2 3 8 8 8 8 9
> 
> max.col(myMatrix %*% t(myMatrix))
[1] 1 2 2

和

> n<-length(myMatrix);max.col(t(myMatrix) %*% myMatrix,partial=n-1)
Error in max.col(t(myMatrix) %*% myMatrix, partial = n - 1) : 
   unused argument (partial = n - 1)

不幸的是，max.col命令中没有定义部分指令。所以我们测试方法A：

> mytext <- c("Let the big dogs hunt honor", "No holds barred, it is a honor child child child.", "My child is an honor student")     
> myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) 
> aa<- myMatrix %*% t(myMatrix)
> aa
3 x 3 sparse Matrix of class "dgCMatrix"
      text1 text2 text3
text1     6     1     1
text2     1    18     5
text3     1     5     6
> t(apply(aa, 1, function(x) order(x,decreasing=T)[1:3]))
      [,1] [,2] [,3]
text1    1    2    3
text2    2    3    1
text3    3    2    1

这是正确的，我们得到其他方式

> aaa<- t(myMatrix) %*% myMatrix
> aaa
18 x 18 sparse Matrix of class "dgCMatrix"
   [[ suppressing 18 column names ‘let’, ‘the’, ‘big’ ... ]]

let     1 1 1 1 1 1 . . . . . . .  . . . . .
the     1 1 1 1 1 1 . . . . . . .  . . . . .
big     1 1 1 1 1 1 . . . . . . .  . . . . .
dog     1 1 1 1 1 1 . . . . . . .  . . . . .
hunt    1 1 1 1 1 1 . . . . . . .  . . . . .
honor   1 1 1 1 1 3 1 1 1 1 1 2 1  4 1 1 1 1
no      . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
hold    . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
bar     . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
,       . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
it      . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
is      . . . . . 2 1 1 1 1 1 2 1  4 1 1 1 1
a       . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
child   . . . . . 4 3 3 3 3 3 4 3 10 3 1 1 1
.       . . . . . 1 1 1 1 1 1 1 1  3 1 . . .
my      . . . . . 1 . . . . . 1 .  1 . 1 1 1
an      . . . . . 1 . . . . . 1 .  1 . 1 1 1
student . . . . . 1 . . . . . 1 .  1 . 1 1 1
> mmm <- t(apply(aaa, 1, function(x) order(x,decreasing=T)[1:3]))
        [,1] [,2] [,3]
let        1    2    3
the        1    2    3
big        1    2    3
dog        1    2    3
hunt       1    2    3
honor     14    6   12
no        14    6    7
hold      14    6    7
bar       14    6    7
,         14    6    7
it        14    6    7
is        14    6   12
a         14    6    7
child     14    6   12
.         14    6    7
my         6   12   14
an         6   12   14
student    6   12   14

和相应的值矩阵

> t(apply(mmm, 1, function(x) colnames(aaa)[x]))
        [,1]    [,2]    [,3]   
let     "let"   "the"   "big"  
the     "let"   "the"   "big"  
big     "let"   "the"   "big"  
dog     "let"   "the"   "big"  
hunt    "let"   "the"   "big"  
honor   "child" "honor" "is"   
no      "child" "honor" "no"   
hold    "child" "honor" "no"   
bar     "child" "honor" "no"   
,       "child" "honor" "no"   
it      "child" "honor" "no"   
is      "child" "honor" "is"   
a       "child" "honor" "no"   
child   "child" "honor" "is"   
.       "child" "honor" "no"   
my      "honor" "is"    "child"
an      "honor" "is"    "child"
student "honor" "is"    "child"

其中值矩阵不等于早期的索引矩阵。

R：最大元素的索引矩阵和值矩阵的部分矩阵？

1 个答案: