Question

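I am developing a custom bootstrap algorithm for a specific problem, and because I want a large number of replicates, I do care about performance. In that regard, I have some questions about how to use runif properly. I know I could run benchmarks myself, but C++ optimization tends to be hard, and I would also like to understand the reasons for any differences.

First question:

Is the first code block faster than the second?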

for (int i = 0; i < n_boot; i++) {
  new_random = runif(n);  //new_random is pre-allocated in class
  // do something with the random numbers
}

for (int i = 0; i < n_boot; i++) {
  NumericVector new_random = runif(n);
  // do something with the random numbers
}

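This probably comes down to whether runif fills the left-hand side or allocates and returns a new NumericVector.

Second question:

If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?

In case you are wondering, memory allocation accounts for a sizable part of my processing time. By optimizing away other unnecessary memory allocations, I reduced the runtime by 30%, so it matters.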

Answer 1

I set up the following struct to try to represent your scenario accurately and to make benchmarking easier:

#include <Rcpp.h>
#include <algorithm>  // std::generate_n
// [[Rcpp::plugins(cpp11)]]

struct runif_test {

  size_t runs;
  size_t each;

  runif_test(size_t runs, size_t each)
  : runs(runs), each(each)
  {}
  // Your first code block
  void pre_init() {
    Rcpp::NumericVector v = no_init();
    for (size_t i = 0; i < runs; i++) {
      v = Rcpp::runif(each);
    }
  }
  // Your second code block
  void post_init() {
    for (size_t i = 0; i < runs; i++) {
      Rcpp::NumericVector v = Rcpp::runif(each);
    }
  }
  // Generate 1 draw at a time  
  void gen_runif() {
    Rcpp::NumericVector v = no_init();
    for (size_t i = 0; i < runs; i++) {
      std::generate_n(v.begin(), each, []() -> double {
        return Rcpp::as<double>(Rcpp::runif(1));
      });
    }
  }
  // Reduce overhead of pre-allocated vector
  inline Rcpp::NumericVector no_init() {
    return Rcpp::NumericVector(Rcpp::no_init_vector(each));
  } 
};

I benchmarked the following exported functions:

// [[Rcpp::export]]
void do_pre(size_t runs, size_t each) {
  runif_test obj(runs, each);
  obj.pre_init();
}

// [[Rcpp::export]]
void do_post(size_t runs, size_t each) {
  runif_test obj(runs, each);
  obj.post_init();
}

// [[Rcpp::export]]
void do_gen(size_t runs, size_t each) {
  runif_test obj(runs, each);
  obj.gen_runif();
}

Here are the results I got:

R>  microbenchmark::microbenchmark(
    do_pre(100, 10e4)
    ,do_post(100, 10e4)
    ,do_gen(100, 10e4)
    ,times=100L)
Unit: milliseconds
                 expr      min       lq      mean   median        uq       max neval
  do_pre(100, 100000) 109.9187 125.0477  145.9918 136.3749  152.9609  337.6143   100
 do_post(100, 100000) 103.1705 117.1109  132.9389 130.4482  142.7319  204.0951   100
  do_gen(100, 100000) 810.5234 911.3586 1005.9438 986.8348 1062.7715 1501.2933   100

R>  microbenchmark::microbenchmark(
    do_pre(100, 10e5)
    ,do_post(100, 10e5)
    ,times=100L)
Unit: seconds
                  expr      min       lq     mean   median       uq      max neval
  do_pre(100, 1000000) 1.355160 1.614972 1.740807 1.723704 1.815953 2.408465   100
 do_post(100, 1000000) 1.198667 1.342794 1.443391 1.429150 1.519976 2.042511   100

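So, assuming I have interpreted and represented your second question accurately,

"If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?"

with my gen_runif() member function, I think we can confidently say this is not the best approach: it is about 7.5 times slower than the other two functions.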
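One plausible reason for the gap: Rcpp::runif returns a NumericVector, so every Rcpp::runif(1) call in gen_runif() builds a whole length-1 vector just to deliver a single number. The sketch below mimics that pattern in plain C++ so it compiles without R. The names draw_vector, fill_one_vector_per_draw, fill_in_place and the g_allocations counter are hypothetical, and std::mt19937 is only a stand-in for R's generator.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Counts how many vectors draw_vector() has created (illustration only).
static std::size_t g_allocations = 0;

// Hypothetical stand-in for a vector-returning RNG call such as
// Rcpp::runif(n): every call creates and returns a fresh vector.
std::vector<double> draw_vector(std::size_t n, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> out(n);
    ++g_allocations;
    for (double& x : out) x = unif(rng);
    return out;
}

// Analogue of gen_runif(): one length-1 vector is created per draw.
void fill_one_vector_per_draw(std::vector<double>& buf, std::mt19937& rng) {
    for (double& x : buf) x = draw_vector(1, rng)[0];
}

// Alternative: scalar draws written straight into the existing buffer,
// so no temporary vectors are created at all.
void fill_in_place(std::vector<double>& buf, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (double& x : buf) x = unif(rng);
}
```

Filling a buffer of 1000 elements with fill_one_vector_per_draw bumps g_allocations by 1000, while fill_in_place leaves it untouched. In Rcpp itself, the in-place variant would presumably use a scalar draw such as R::runif(0, 1) or unif_rand() written into an existing vector; I have not benchmarked that here.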

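More importantly, to address your first question, it seems to be just slightly faster to initialize and assign a new NumericVector to the output of Rcpp::runif(n). I'm certainly no C++ expert, but I believe the second method (assigning to a new local object) was faster than the first because of copy elision. In the second case, it looks as though two objects are being created: the object on the left of the =, v, and the (temporary? rvalue?) object on the right of the =, which is the result of Rcpp::runif(). In reality, though, the compiler will most likely optimize this unnecessary step away, which I think is explained in this passage from the article I linked:

"When a nameless temporary, not bound to any references, would be moved or copied into an object of the same type ... the copy/move is omitted. When that temporary is constructed, it is constructed directly in the storage where it would otherwise be moved or copied to."

At least, that is how I interpreted the results. Hopefully someone more well-versed in the language can confirm, deny, or correct this conclusion.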
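To make the distinction concrete, here is a small plain C++ sketch that counts which special member functions run in each pattern. Tracked, make, post_init_style and pre_init_style are hypothetical names for illustration. Tracked is not Rcpp::NumericVector, and whether NumericVector behaves the same way depends on its own copy, move and assignment operators, which I have not inspected.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical vector-like type that counts special-member calls.
struct Tracked {
    static int copy_ctor, move_ctor, copy_assign, move_assign;
    std::vector<double> data;

    explicit Tracked(std::size_t n) : data(n) {}
    Tracked(const Tracked& o) : data(o.data) { ++copy_ctor; }
    Tracked(Tracked&& o) noexcept : data(std::move(o.data)) { ++move_ctor; }
    Tracked& operator=(const Tracked& o) {
        data = o.data; ++copy_assign; return *this;
    }
    Tracked& operator=(Tracked&& o) noexcept {
        data = std::move(o.data); ++move_assign; return *this;
    }
};
int Tracked::copy_ctor = 0;
int Tracked::move_ctor = 0;
int Tracked::copy_assign = 0;
int Tracked::move_assign = 0;

// Returns a nameless temporary, like a function returning a vector by value.
Tracked make(std::size_t n) { return Tracked(n); }

// Second code block: initialise a new local from the temporary.
// The copy/move is elided, so no copy or move constructor runs.
void post_init_style(std::size_t n) { Tracked v = make(n); (void)v; }

// First code block: assign to an object that already exists.
// Assignment cannot be elided, so an assignment operator must run.
void pre_init_style(Tracked& v, std::size_t n) { v = make(n); }
```

Calling post_init_style(10) leaves all four counters at zero, while pre_init_style(v, 10) increments move_assign once. In both cases make() still constructs one new object per call, so neither pattern avoids the allocation itself.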

Answer 2

Adding to @nrussell's answer with some implementation details...

"Use the source, Luke!" definitely applies here, so let's look at the implementation of Rcpp::runif:

inline NumericVector runif( int n, double min, double max ){
    if (!R_FINITE(min) || !R_FINITE(max) || max < min) return NumericVector( n, R_NaN ) ;
    if( min == max ) return NumericVector( n, min ) ;
    return NumericVector( n, stats::UnifGenerator( min, max ) ) ;
}

We see an interesting NumericVector constructor being called with a stats::UnifGenerator object. The definition of that class is:

    class UnifGenerator__0__1 : public ::Rcpp::Generator<double> {
    public:

        UnifGenerator__0__1() {}

        inline double operator()() const {
            double u;
            do {u = unif_rand();} while (u <= 0 || u >= 1);
            return u;
        }
    } ;

So that class is just a functor: it implements operator(), so objects of that class can be "called".
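For anyone unfamiliar with functors, here is a minimal standalone sketch of the same idea. OpenUnitGenerator is a hypothetical name, and std::mt19937 with std::uniform_real_distribution stands in for R's unif_rand(), so this only illustrates the shape of the class and is not Rcpp's code. Unlike the Rcpp class, its operator() is not const, because the stand-in generator keeps its state inside the object.

```cpp
#include <random>

// Hypothetical stand-in for the generator class shown above.
class OpenUnitGenerator {
public:
    explicit OpenUnitGenerator(unsigned seed) : rng_(seed), unif_(0.0, 1.0) {}

    // Calling the object like a function draws one value strictly
    // inside (0, 1), rejecting the endpoints as the Rcpp loop does.
    double operator()() {
        double u;
        do { u = unif_(rng_); } while (u <= 0.0 || u >= 1.0);
        return u;
    }

private:
    std::mt19937 rng_;
    std::uniform_real_distribution<double> unif_;
};
```

With OpenUnitGenerator gen(1); in scope, the expression gen() looks like a function call but actually invokes operator() on the object.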

Finally, the associated NumericVector constructor is:

template <typename U>
Vector( const int& size, const U& u) {
    RCPP_DEBUG_2( "Vector<%d>( const int& size, const U& u )", RTYPE, size )
    Storage::set__( Rf_allocVector( RTYPE, size) ) ;
    fill_or_generate( u ) ;
}

The fill_or_generate function ultimately dispatches down to:

template <typename T>
inline void fill_or_generate__impl( const T& gen, traits::true_type) {
    iterator first = begin() ;
    iterator last = end() ;
    while( first != last ) *first++ = gen() ;
}

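So we can see that a (templated) generator function is provided to fill the vector, and the operator() of the corresponding gen object is used to fill it; in this case, that object is the stats::UnifGenerator.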
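The allocate-then-generate pattern can be sketched without Rcpp as follows. FilledVector and Counter are hypothetical names, and std::vector replaces the R allocation done by Rf_allocVector, so treat this as a rough analogue and not as the real constructor.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical miniature of a Vector(size, generator) constructor.
class FilledVector {
public:
    template <typename Gen>
    FilledVector(std::size_t size, Gen gen) : data_(size) {
        // Same loop shape as fill_or_generate__impl: call gen() once
        // per element and store the result.
        auto first = data_.begin();
        auto last = data_.end();
        while (first != last) *first++ = gen();
    }

    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<double> data_;
};

// Trivial deterministic generator, used only to show the call order.
struct Counter {
    double next = 0.0;
    double operator()() { return next++; }
};
```

Constructing FilledVector v(3, Counter{}) yields the elements 0, 1 and 2 in that order, which shows that the generator is called exactly once per slot. Any callable with a compatible operator() could be passed in its place.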

So the question is: how does this call all come together?

NumericVector x = runif(10);

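I always forget this for some reason, but I believe this is basically copy construction of x from the result of the runif(10) call, which @nrussell also elaborates on. My understanding is:

runif generates a NumericVector of length 10 filled with runif elements; call this temporary right-hand-side object tmp,
x is copy-constructed to be the same as the aforementioned tmp.

I believe the compiler will be able to elide that copy construction, so that x is actually constructed directly from the result of runif(10), and this should therefore be efficient (at least at any reasonable optimization level), but I could be wrong...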

Performance of runif
