Question

我在R中有以下问题（对于马尔可夫链）。假设存在具有唯一整数向量（状态）行的状态空间矩阵S.我从这个矩阵给出了一个向量s，并想确定与该向量对应的行的索引。有几种解决方案：

使用all.equal的解决方案，如：

which(apply(S,1,function(x){ isTRUE(all.equal(s,x)) }) )

将向量映射到唯一字符串并使用此字符串标识它们：

statecodes <- apply(S,1,function(x) paste(x,collapse=" ") ) 
check.equal <- function(s) {
    z <- which(statecodes == paste(s, collapse=" "))
    return(z)
}
check.equal(s)

第一个（通常建议的）解决方案是彻头彻尾的糟糕;对于长度为4的16,000个向量的状态空间，它已经需要2.16秒。第二个解决方案要快得多，对于相同的状态空间需要0-0.01秒。然而，当向量的长度增加时，它变得越来越慢。我觉得我的字符串方法是合理的，但必须有更好的东西。什么是更快的方式进行这样的比较？

为了完整起见，我的问题的状态空间可以如下生成。如果向量具有N个元素，并且I表示向量的每个元素可以达到的最大值（例如，10），则由下式给出：

I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )

为了尽可能快地进行比较，如何利用各州的完整性？

Answer 1

执行此操作的一种方法是which(colSums(abs(t(S)-V))==0)，其中V是您要查找的矢量。

Answer 2

获取每个状态的整数值的一种简单方法是将值转换为整数，然后将每列乘以右基。

我的版本是makecheck2;使用paste的版本为makecheck2。我还修改了paste版本以使用match，因此它可以同时检查多个值。两个版本现在都返回一个用于获得匹配的函数。

我的版本设置更快; 0.065秒vs 1.552秒

N <- 5
I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )
system.time(f1 <- makecheck1(S))
#   user  system elapsed 
#  1.547   0.000   1.552 
system.time(f2 <- makecheck2(S))
#   user  system elapsed 
#  0.063   0.000   0.065

这里我测试1到10000个值来检查。对于小值，paste版本更快;对于大值，我的版本更快。

> set.seed(5)
> k <- lapply(0:4, function(idx) sample(1:nrow(S), 10^idx))
> s <- lapply(k, function(idx) S[idx,])
> t1 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f1(x))[1]))
> t2 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f2(x))[1]))
> data.frame(n=10^(0:4), time1=t1, time2=t2)
      n time1 time2
1     1 0.761 1.512
2    10 0.772 1.523
3   100 0.857 1.552
4  1000 1.592 1.547
5 10000 9.651 1.848

两个版本的代码如下：

makecheck2 <- function(m) {
  codes <- vector("list", length=ncol(m))
  top <- vector("integer", length=ncol(m)+1)
  top[1L] <- 1L
  for(idx in 1:ncol(m)) {
    codes[[idx]] <- unique(m[,idx])
    top[idx+1L] <- top[idx]*length(codes[[idx]])
  }
  getcode <- function(x) {
    out <- 0L
    for(idx in 1:length(codes)) {
      out <- out + top[idx]*match(x[,idx], codes[[idx]])
    }
    out
  }
  key <- getcode(m)
  f <- function(x) {
    if(!is.matrix(x)) {
      x <- matrix(x, ncol=length(codes))
    }
    match(getcode(x), key)
  }
  rm(m) # perhaps there's a better way to remove these from the closure???
  rm(idx)
  f
}

makecheck1 <- function(m) {
  n <- ncol(m)
  statecodes <- apply(m,1,function(x) paste(x,collapse=" ") )
  rm(m)
  function(x) {
    if(!is.matrix(x)) {
      x <- matrix(x, ncol=n)
    }
    x <- apply(x, 1, paste, collapse=" ")
    match(x, statecodes)
  }
}

在R中的矩阵中找到整数向量的索引的最快方法

2 个答案: