Why does RcppArmadillo's fastLmPure produce NAs in its output while fastLm does not?

Time: 2015-11-02 19:59:40

Tags: c++ r regression rcpp armadillo

I use rolling regressions in R a lot, and my initial setup looked something like this:

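library(zoo)  # provides rollapply()

# No-intercept regression of column 1 on column 2 within one window
dolm <- function(x) coef(lm(x[,1] ~ x[,2] + 0, data = as.data.frame(x)))
# by.column = FALSE passes the whole window (all columns) to dolm
rollingCoef = rollapply(someData, 100, dolm, by.column = FALSE)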

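The example above works well, but it is slow when there are many iterations.

To speed it up, I decided to try the Rcpp package.

First I replaced lm with fastLm. That was somewhat faster, but still slow. This prompted me to write the whole rolling-regression coefficient function as a loop in C++ and bring it into R with the help of Rcpp.

So I modified RcppArmadillo's original fastLm function into: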

// [[Rcpp::depends(RcppArmadillo)]]

#include <RcppArmadillo.h>

using namespace Rcpp;

// [[Rcpp::export]]
List rollCoef(const arma::mat& X, const arma::colvec& y, int window) {

    // Guard against windows that are empty or longer than the data.
    if (window < 1 || static_cast<arma::uword>(window) > X.n_rows)
        stop("window must be between 1 and nrow(X)");
    if (y.n_elem != X.n_rows)
        stop("X and y must have the same number of rows");

    const arma::uword w = window;                   // rows per window
    const arma::uword nWindows = X.n_rows - w + 1;  // number of rolling windows

    arma::mat coef(nWindows, X.n_cols);   // one row of estimated coefficients per window

    // Rolling regression: solve the least-squares problem on each window of rows.
    for (arma::uword i = 0; i < nWindows; ++i) {
        coef.row(i) = arma::trans(arma::solve(X.rows(i, i + w - 1), y.rows(i, i + w - 1)));
    }

    return List::create(_["coefficients"] = coef);
}

and then used sourceCpp(file=".../rollCoef.cpp")

to load it into R.

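It is much faster than rollapply and works fine on small examples, but when I applied it to about 200,000 observations, roughly half of the output was NA, while the rollapply / fastLm combination produced no NAs at all.

So I need some help. What is wrong with my function? Why are there NAs in my function's output but none from rollapply / fastLm when, if I understand correctly, both are based on arma::solve? Any help is much appreciated.

Update
Here is reproducible code: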

require(Rcpp)
require(RcppArmadillo)
require(zoo)
require(repmis)
myData <- source_DropboxData(file = "example.csv", 
                              key = "cbrmkkbssu5bn96", sep = ",", header = TRUE)

## To use my custom function "rollCoef" you need to load it into R.
## The C++ code is given above in the main question.
## Save it wherever you like as "rollCoef.cpp" and then load it into R with:

sourceCpp(file=".../rollCoef.cpp") # replace with your actual path

myCoef = rollCoef(as.matrix(myData[,2]),myData[,1],260)

summary(unlist(myCoef)) # 80923 NA's

dolm = function(x) coef(fastLmPure(as.matrix(x[,2]), x[,1]))

myCoef2 = rollapply(myData, 260, dolm, by.column = FALSE)

summary(myCoef2) # 80923 NA's

dolm2 = function(x) coef(fastLm(x[,1] ~ x[,2] + 0, data = as.data.frame(x)))

myCoef3 = rollapply(myData, 260, dolm2, by.column = FALSE)

summary(myCoef3) # !!! No NA's !!!

head(unlist(myCoef)) ; head(unlist(myCoef2)) ; head(myCoef3)

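So the output of my function matches the output of rollapply with RcppArmadillo's fastLmPure, and both produce NAs, whereas rollapply with fastLm does not. As far as I understand, for example from here and here, fastLm basically calls fastLmPure, so why are there no NAs with the third approach? Is there some extra functionality in fastLm that prevents NAs and that I have missed?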
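One hypothesis worth checking, which is an assumption and not something established in this thread: the formula interface of fastLm builds its inputs through R's model frame, which by default drops rows containing NA, while fastLmPure and arma::solve receive the raw matrix. If myData contains missing values, every window that covers one of them would then return NaN from the raw solvers. The sketch below illustrates that effect in plain C++ with no Armadillo dependency. It swaps arma::solve for the closed-form slope sum(x*y) / sum(x*x), which is valid only for a single regressor without intercept, and the names rolling_slope and skip_missing are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rolling no-intercept regression of y on a single regressor x.
// With one regressor the least-squares slope is sum(x*y) / sum(x*x).
// skip_missing is a hypothetical flag: when true, rows where x or y is
// not finite are left out of each window, roughly like R's na.omit.
std::vector<double> rolling_slope(const std::vector<double>& x,
                                  const std::vector<double>& y,
                                  std::size_t window,
                                  bool skip_missing) {
    assert(x.size() == y.size());  // precondition: paired observations
    std::vector<double> out;
    if (window == 0 || x.size() < window) return out;
    out.reserve(x.size() - window + 1);

    for (std::size_t start = 0; start + window <= x.size(); ++start) {
        double sxy = 0.0, sxx = 0.0;
        for (std::size_t j = start; j < start + window; ++j) {
            const bool usable = std::isfinite(x[j]) && std::isfinite(y[j]);
            if (skip_missing && !usable) continue;
            sxy += x[j] * y[j];
            sxx += x[j] * x[j];
        }
        // NaN when a NaN entered the sums or when sxx is zero
        // (for example, no usable rows remained in the window).
        out.push_back(sxy / sxx);
    }
    return out;
}
```

With skip_missing set to false, a single NaN in y contaminates every window that contains it, so one missing observation produces up to window-many NaN coefficients. If this is what happens with the real data, checking anyNA(myData) in R would confirm or rule out the hypothesis.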

1 Answer:

Answer 0: (score: 1)

There is an entire package, RcppRoll, that does nothing but custom rolling computations. You should be able to extend it and its rollit() function to do the rolling here.