Question

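I am trying to fit a Conway-Maxwell-Poisson regression using the COMPoissonReg package in R.

However, it is very slow on large datasets, so I profiled it and looked at the source code.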

Most of the time (> 95%) is spent in the function COMPoissonReg:::computez, which is equivalent to:

test <- function (lambda, nu, max=100) 
{
    forans <- matrix(0, ncol = max + 1, nrow = length(lambda))
    for (j in 1:max) {
        temp <- matrix(0, ncol = j, nrow = length(lambda))
        for (i in 1:j) {
            temp[, i] <- lambda/(i^nu)
        }
        for (k in 1:length(lambda)) {
            forans[k, j + 1] <- prod(temp[k, ])
        }
    }
    forans[, 1] <- rep(1, length(lambda))
    ans <- rowSums(forans)
    return(ans)
}

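Here nu is ν, lambda is a vector, and max is the upper limit of s (set to 100 here as an approximation of infinity). In other words, for each element of lambda the function computes the sum over s = 0, ..., max of lambda^s / (s!)^nu.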

This question does not require any special statistical background, but link or link2 are there just in case.

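Below is a simple script to test the performance. It takes 8 seconds, or 4 seconds if I bother to compile the function with cmpfun. I believe it can be improved further. (Without rewriting it in C, my target is about 0.05 seconds, so that I do not have to refactor the code in the package that calls this function iteratively.)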

lambda <- rnorm(10000, 1.5, 0.3)
Rprof(tmp <- tempfile())
sum(log(test(lambda, 1.2)))
Rprof()
summaryRprof(tmp)
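
For readers who have not used cmpfun, here is a minimal sketch of byte-compiling a function with the compiler package that ships with R. The function slow_sum is a hypothetical stand-in, not the function from the question. Note that recent R versions (3.4.0 and later) byte-compile functions automatically by default, so the gain from calling cmpfun by hand may be small there.

```r
library(compiler)

# Hypothetical loop-heavy function, used only to illustrate cmpfun
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# cmpfun returns a byte-compiled copy; the original is left untouched
slow_sum_c <- cmpfun(slow_sum)

# Both versions return the same value
identical(slow_sum(1000), slow_sum_c(1000))
```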

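Update

I realized there is another issue: the limits of floating-point arithmetic. Evaluating a power series directly is dangerous because it can overflow very quickly, especially if we have to vectorize. For example, lambda^100 certainly overflows if lambda > 10000. If I were programming in another language I might use reduce, but I worry that Reduce is slow in R.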
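
To make the overflow concern concrete, here is a small illustration. It is my own sketch rather than code from the package, and the value 20000 is just an assumed example. The power alone overflows to Inf, even though the term the series actually needs is still representable when it is computed on the log scale.

```r
lambda <- 20000   # assumed example value, larger than 10000
nu <- 1.2
s <- 100

# Numerator alone: 20000^100 exceeds the largest double, so the term is Inf
naive_term <- lambda^s / factorial(s)^nu

# Same term via logs: s*log(lambda) - nu*log(s!) stays well within range
log_term <- s * log(lambda) - nu * lfactorial(s)
stable_term <- exp(log_term)

is.infinite(naive_term)
is.finite(stable_term)
```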

Answer 1

You can make this much faster than the function you are using by avoiding loops. For example:

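test2<-function(lambda,nu,max=100){
  len<-length(lambda)
  # one column per lambda value, one row per power s = 0..max
  mm<-matrix(rep(lambda,each=max+1),max+1,len)
  mm<-mm^(0:max)               # exponents recycle down each column
  mm<-mm/factorial(0:max)^nu   # divide row s by (s!)^nu
  colSums(mm)                  # sum over s for each lambda
}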

With a lambda of length 100, this runs about 50 times faster:

> require(microbenchmark)
> lam<-rnorm(100)
> max(abs(test(lam,1.2)-test2(lam,1.2)))
[1] 4.510281e-16
> microbenchmark(test(lam,1.2),test2(lam,1.2),times=10)
Unit: milliseconds
            expr       min        lq    median        uq       max neval
  test(lam, 1.2) 77.124705 77.422619 78.241945 79.635746 81.260280    10
 test2(lam, 1.2)  1.335716  1.373116  1.401411  1.507765  1.562447    10

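You could optimize it further, but this should capture most of the gains, unless you can take advantage of some built-in function instead of doing the summation explicitly.

With an input of length 10000, it takes 0.148 seconds on my machine, versus 6.850 seconds for test: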

> lam<-rnorm(10000)
> max(abs(test(lam,1.2)-test2(lam,1.2)))
[1] 3.552714e-15
> system.time(test2(lam,1.2))
   user  system elapsed 
  0.132   0.016   0.148 
> system.time(test(lam,1.2))
   user  system elapsed 
  6.780   0.056   6.850
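
The vectorized version relies on R recycling a vector down the columns of a matrix, which is easy to miss. Here is a tiny sketch of that step; the values of lambda and smax are made up for illustration and small enough to check by hand.

```r
lambda <- c(2, 3)   # made-up example values
smax <- 3           # stands in for max, kept small for readability
nu <- 1.2

# Each column holds one lambda value repeated smax + 1 times
mm <- matrix(rep(lambda, each = smax + 1), smax + 1, length(lambda))

# 0:smax has as many elements as the matrix has rows, so it is recycled
# column by column: row s + 1 of every column is raised to the power s
powers <- mm^(0:smax)
# column 1 is 1, 2, 4, 8 and column 2 is 1, 3, 9, 27

# The same recycling divides row s + 1 by (s!)^nu before summing each column
z <- colSums(powers / factorial(0:smax)^nu)
```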

Answer 2

OK, here is an Rcpp answer. As expected, it is much faster than any of the others.

require(Rcpp)
rcppfun<-"
Rcpp::NumericVector myfun(Rcpp::NumericVector lambda,
Rcpp::NumericVector weights)
{
  int num = lambda.size();
  int max = weights.size();
  std::vector<double> r(num);
  for(int i=0; i<num; i++){
    double total = 0;
    double prod = 1;
    for(int j=0; j<max; j++){
      total += prod/weights[j];
      prod *= lambda[i];
    }
    r[i]=total;
  }
  return Rcpp::wrap(r);
}
"
testRcpp<-cppFunction(rcppfun)
test5<-function(lambda,nu,max=100){
    wts<-factorial(0:max)^nu
    testRcpp(lambda,wts)    
}

This is about 40 times faster than my original test2, and about 12 times faster than @NealFultz's improved logarithmic implementation.

> lam<-abs(rnorm(10000))
> max(abs(test5(lam,1.2)-test2(lam,1.2)))
[1] 7.105427e-15
> microbenchmark(test2(lam,1.2),test3(lam,1.2),test4(lam,1.2),test5(lam,1.2))
Unit: milliseconds
            expr        min         lq     median         uq        max neval
 test2(lam, 1.2) 125.601616 126.790516 127.700099 135.182263 222.340179   100
 test3(lam, 1.2) 125.523424 126.666410 126.921035 131.316254 178.633839   100
 test4(lam, 1.2)  41.734015  42.640340  43.190553  50.932952  97.765219   100
 test5(lam, 1.2)   3.432029   3.501046   3.519007   3.532603   3.754232   100

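Edit: here is another Rcpp version, which should at least partly address the overflow problem by computing each term incrementally rather than computing the numerator and denominator separately.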

rcppfun2<-"
Rcpp::NumericVector myfun2(Rcpp::NumericVector lambda, Rcpp::NumericVector nu){
int num = lambda.size();
int max = nu.size();
std::vector<double> r(num);
for(int i=0; i<num; i++){
  double term = 1;
  double total = 1;
  for(int j=0; j< max; j++){
    term *= (lambda[i]/nu[j]);
    total += term;
  }
  r[i]=total;
}
 return Rcpp::wrap(r);
}
"

testRcpp2<-cppFunction(rcppfun2)
test6<-function(lambda,nu,max=100){
    testRcpp2(lambda,(1:max)^nu)
}

> lam<-abs(rnorm(10000))
> max(abs(test2(lam,1.2)-test6(lam,1.2)))
[1] 1.065814e-14
> microbenchmark(test5(lam,1.2),test6(lam,1.2))
Unit: milliseconds
            expr      min       lq   median       uq      max neval
 test5(lam, 1.2) 3.416786 3.426013 3.435492 3.444196 3.604486   100
 test6(lam, 1.2) 3.554147 3.572285 3.580865 3.588030 3.840713   100
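
If compiling C++ is not an option, the same incremental idea can be sketched in plain R. Each term is obtained from the previous one by multiplying by lambda / s^nu, and the loop runs over s only, so every step is vectorized over lambda. The name cmp_z_incremental and the argument smax are my own labels. I have not benchmarked this against the Rcpp versions.

```r
# Incremental evaluation of sum_{s=0}^{smax} lambda^s / (s!)^nu.
# term_s = term_{s-1} * lambda / s^nu, so no large power or factorial
# is ever formed on its own.
cmp_z_incremental <- function(lambda, nu, smax = 100) {
  term <- rep(1, length(lambda))   # s = 0 term
  total <- term
  for (s in seq_len(smax)) {
    term <- term * lambda / s^nu   # vectorized over all lambda values
    total <- total + term
  }
  total
}

cmp_z_incremental(c(0.5, 1.5, 3), nu = 1.2)
```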

Answer 3

Following up on @mrips, working with logs can sometimes be a bit faster, because you can multiply instead of exponentiating:

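test4 <- function(lambda,nu,max=100){
  # every row holds log(lambda); rows correspond to s = 1..max
  mm<-matrix(log(lambda),max,length(lambda), byrow=TRUE) 
  # log of each term: s*log(lambda) - nu*log(s!)
  mm<-mm * 1:max  -  nu*lfactorial(1:max)
  # the s = 0 term is always 1, so add it separately
  1 + colSums(exp(mm))
}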

I also handled the special case s = 0 separately. Here are my timings:

R>microbenchmark(test2(1:50,5), test4(1:50,5))
Unit: microseconds
           expr     min        lq    median        uq      max neval
 test2(1:50, 5) 952.360 1432.6600 1436.4525 1440.1860 3467.981   100
 test4(1:50, 5) 695.189 1041.4785 1042.8315 1045.6525 2970.441   100
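
Since the log formulation is already in place, one possible extension is to return the log of the sum directly and guard the exponentiation with the log-sum-exp trick. This is a technique I am swapping in and not part of the answers here. The sketch below assumes every lambda is strictly positive, and cmp_logz and smax are hypothetical names.

```r
# Returns log(sum_{s=0}^{smax} lambda^s / (s!)^nu) for each lambda > 0.
cmp_logz <- function(lambda, nu, smax = 100) {
  s <- 0:smax
  # (smax + 1) x length(lambda) matrix of log terms
  log_terms <- outer(s, log(lambda)) - nu * lfactorial(s)
  # subtract each column's maximum before exp() so nothing overflows
  m <- apply(log_terms, 2, max)
  m + log(colSums(exp(sweep(log_terms, 2, m))))
}

# Agrees with the direct sum for moderate inputs
exp(cmp_logz(c(0.5, 1.5), nu = 1.2))
# Stays finite where lambda^100 alone would overflow
cmp_logz(20000, nu = 1.2)
```

If the caller only needs sum(log(...)), as in the question's test script, working with cmp_logz avoids ever forming the huge sum itself. The truncation at smax is unchanged, so for very large lambda the result is still the log of the truncated sum and not of the infinite series.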

Answer 4

I slept on it and came up with another big improvement, provided you can use the gsl package. All you are doing is evaluating a polynomial:

require(gsl)
test5 <- function(lambda, nu, max=100){
gsl_poly(factorial(0:max)^-nu, lambda)
}

R>microbenchmark(test2(1:50,5.1), test4(1:50,5.1), test5(1:50,5.1))
Unit: microseconds
             expr      min        lq    median        uq       max neval
 test2(1:50, 5.1) 4518.957 4838.5185 5318.5040 5617.6330 19978.039   100
 test4(1:50, 5.1) 2043.422 2268.3490 2472.0430 2727.1045 10328.376   100
 test5(1:50, 5.1)  311.144  407.2465  476.0755  540.6095  1138.766   100
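
For anyone who cannot install gsl, the polynomial view can also be sketched in base R with Horner's scheme, which as far as I know is also what GSL uses for polynomial evaluation. The helper names horner and cmp_z_horner are hypothetical, and I have not timed this against gsl_poly.

```r
# Evaluate coef[1] + coef[2]*x + ... + coef[n]*x^(n-1) by Horner's scheme,
# vectorized over x.
horner <- function(coef, x) {
  n <- length(coef)
  result <- rep(coef[n], length(x))
  for (k in rev(seq_len(n - 1))) {
    result <- result * x + coef[k]
  }
  result
}

# The coefficients are 1 / (s!)^nu for s = 0..smax
cmp_z_horner <- function(lambda, nu, smax = 100) {
  horner(factorial(0:smax)^(-nu), lambda)
}

cmp_z_horner(c(0.5, 1.5, 3), nu = 1.2)
```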

How to speed up this simple function in R
