Question

I have a function that takes a list (the **t**-approximate nearest neighbors) as an argument and computes the similarity between two vectors:

For example:

For the list f2, the similarity between f2[[6]] and f2[[7]] is 3/4 = 0.75.
Rows/vectors 6 and 7 share length(intersect(f2[[6]], f2[[7]])) = 3 common elements, namely (3, 4, 5), and **t** = 4.
n = 7.

I wrote the similarity function as follows:

similarity<-function(p,q,mat,t){
  if(is.list(mat)){
    mat=list.as.matrix(mat, byrow=TRUE )
    p=mat[p,]
    q=mat[q,]
    p=p[!is.na(p)]
    q=q[!is.na(q)]
    return(length(intersect(p,q))/t)
  }

  if (p==q) return(0) 
  }

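where p and q are row indices (replaced inside the function by the corresponding vectors of length t), and mat is the list representing the t-approximate nearest neighbors matrix.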

I know that the similarity matrix is symmetric:

similarity(p, q, mat, t) = similarity(q, p, mat, t)

So the code for the similarity matrix is as follows:

similarity_matrix<-function(tann_matr,n,t){

  similarity_matr=matrix(data=NA,nrow=n,ncol=n)

  for(i in 1:n){
    for(j in 1:n){
     similarity_matr[i,j]=similarity(i,j,tann_matr,t)
    }   
  }
  diag(similarity_matr)=0
  return(similarity_matr)
}

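Problem:

Because of the time complexity, I am trying to change this function so that only the upper triangle of the matrix has to be filled. I thought the outer function might be a good solution, and I tried: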

similarity_matrix<-function(tann_matr,n,t){

  n1=n
  row=1:n1
  col=1:n1
  similarity_matr=matrix(data=NA,nrow=n1,ncol=n1)
  fun <- function(i,j,arg_1=tann_matr,arg_2=t) similarity(i,j,arg_1,arg_2)
  return(outer(col,row,FUN=fun))
}

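This solution does not produce the expected result: the output of this alternative differs from that of the loop version.

I hope this is clear. Thank you for your help!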

Answer 1

1) If you vectorize it, it may be fast enough even without doing anything else.

denom <-  lengths(f2)[1]  # 4
f2na <- lapply(f2, na.omit)
len <- function(x, y) length(intersect(x, y))
m <- outer(f2na, f2na, Vectorize(len)) / denom
diag(m) <- 0

giving:

> m
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0 0.00 0.00 0.00 0.00 0.00
[2,]    0    0 0.00 0.00 0.00 0.00 0.00
[3,]    0    0 0.00 0.75 0.75 0.75 0.75
[4,]    0    0 0.75 0.00 0.75 0.75 0.75
[5,]    0    0 0.75 0.75 0.00 0.75 0.75
[6,]    0    0 0.75 0.75 0.75 0.00 0.75
[7,]    0    0 0.75 0.75 0.75 0.75 0.00
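A likely reason the attempt in the question gave a different output is that outer calls FUN once with whole index vectors, not once per (i, j) pair, so a function written for single indices has to be wrapped with Vectorize. The following is only a minimal sketch of that idea on row indices; `pair_sim` and `t_len` are hypothetical names, and f2 is assumed to be the input from the Note, written here as plain vectors.

```r
# Assumed input: the f2 list from the Note, as plain vectors.
f2 <- list(c(2, NA, NA, NA), c(1, NA, NA, NA), 4:7, c(3, 5, 6, 7),
           c(3, 4, 6, 7), c(3, 4, 5, 7), 3:6)
t_len <- 4  # t, the number of neighbors per row

# Hypothetical scalar helper: similarity of rows i and j (single indices).
pair_sim <- function(i, j) {
  if (i == j) return(0)
  p <- f2[[i]]
  q <- f2[[j]]
  length(intersect(p[!is.na(p)], q[!is.na(q)])) / t_len
}

n <- length(f2)
# outer() passes whole vectors to FUN, so the scalar helper is wrapped
# with Vectorize() to make it run once per (i, j) pair.
S <- outer(seq_len(n), seq_len(n), Vectorize(pair_sim))
```

Note that this still evaluates all n * n pairs; it only removes the explicit loops.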

2) Another possibility is to encode each component of f2 as a 0/1 vector and then take the cross product with crossprod:

mx <- max(unlist(f2), na.rm = TRUE) # 7
M <- crossprod(sapply(f2, tabulate, mx)) / denom
diag(M) <- 0

identical(m, M)
## [1] TRUE
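If filling only the upper triangle is still the goal, as the question asks, the symmetry can be used directly: compute the pairs with i < j and mirror them. This is a hedged sketch rather than part of the answer above; `sets`, `idx` and `denom` are illustrative names, and f2 is again assumed to be the input from the Note, written as plain vectors.

```r
# Assumed input: the f2 list from the Note, as plain vectors.
f2 <- list(c(2, NA, NA, NA), c(1, NA, NA, NA), 4:7, c(3, 5, 6, 7),
           c(3, 4, 6, 7), c(3, 4, 5, 7), 3:6)
denom <- 4  # t, the number of neighbors per row
n <- length(f2)

# Drop the NA padding once, up front.
sets <- lapply(f2, function(x) x[!is.na(x)])

S <- matrix(0, n, n)
# Row/column indices of every cell strictly above the diagonal (i < j).
idx <- which(upper.tri(S), arr.ind = TRUE)

# One intersect() per unordered pair, i.e. n * (n - 1) / 2 calls.
S[idx] <- mapply(function(i, j) length(intersect(sets[[i]], sets[[j]])),
                 idx[, 1], idx[, 2]) / denom

# Mirror the upper triangle into the lower one; the diagonal stays 0.
S <- S + t(S)
```

This roughly halves the number of intersect calls compared with filling the full matrix, at the cost of a little index bookkeeping.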

Note

The input, in reproducible form, is assumed to be:

f2 <- list(structure(c(2, NA, NA, NA), .Dim = c(1L, 4L)), structure(c(1, 
NA, NA, NA), .Dim = c(1L, 4L)), structure(4:7, .Dim = c(1L, 4L
)), structure(c(3, 5, 6, 7), .Dim = c(1L, 4L)), structure(c(3L, 
4L, 6L, 7L), .Dim = c(1L, 4L)), structure(c(3, 4, 5, 7), .Dim = c(1L, 
4L)), structure(3:6, .Dim = c(1L, 4L)))

How can I use the outer function?
