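The problem consists of sampling n days within a year of 365 days in such a way that any two sampled days are at least min_dist days apart.
Using n = 12 and min_dist = 20, a correct result could be the vector
[1] 4 43 69 97 129 161 192 215 243 285 309 343
The diff of this vector is
[1] 39 26 28 32 32 31 23 28 42 24 34
and all of its values are greater than or equal to min_dist = 20.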
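The condition on the differences can be checked mechanically. Here is a minimal sketch in plain C++; `has_min_dist` is a hypothetical helper name and is not part of the code in this question:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns true if every pair of days is at least min_dist apart.
// The vector is taken by value and sorted, so only neighbouring
// days need to be compared afterwards.
bool has_min_dist(std::vector<int> days, int min_dist) {
    std::sort(days.begin(), days.end());
    for (std::size_t i = 1; i < days.size(); ++i) {
        if (days[i] - days[i - 1] < min_dist) return false;
    }
    return true;
}
```

Applied to the example vector, the check passes for min_dist = 20 because the smallest gap is 23.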
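I solved this problem with sample_r(), a pure R function, and with sample_cpp(), a c++ function that runs through the interface package Rcpp.
The Rcpp solution is much slower (by a factor of about 60 on my Mac). I am new to c++, so my ability to research this on my own is limited; please forgive me.
How can I refactor the c++ code so that it is faster than the R code?
Original .cpp file: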
#include <Rcpp.h>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
IntegerVector sample_cpp(int n, int min_dist= 5L, int seed= 42L) {
  IntegerVector res_empty= Rcpp::rep(NA_INTEGER, n);
  IntegerVector res;
  IntegerVector available_days_full= Rcpp::seq(1, 365);
  IntegerVector available_days;
  IntegerVector forbidden_days;
  IntegerVector forbidden_space = Rcpp::seq(-(min_dist-1), (min_dist-1));
  bool fail;
  Environment base("package:base");
  Function set_seed = base["set.seed"];
  set_seed(seed);
  do {
    res= res_empty;
    available_days = available_days_full;
    fail= FALSE;
    for(int i= 0; i < n; ++i) {
      res[i]= sample(available_days, 1, FALSE)[0];
      forbidden_days= res[i]+forbidden_space;
      available_days= setdiff(available_days, forbidden_days);
      if(available_days.size() <= 1){
        fail= TRUE;
        break;
      }
    }
  }
  while(fail== TRUE);
  std::sort(res.begin(), res.end());
  return res;
}
/*** R
# c++ function
(r= sample_cpp(n= 12, min_dist= 20, seed=1))
diff(r)
# R function
sample_r= function(n= 12, min_dist=5, seed= 42){
  if(n*min_dist>= 365) stop("Infeasible.")
  set.seed(seed)
  repeat{
    res= numeric(n)
    fail= FALSE
    available_days= seq(365)
    for(i in seq(n)){
      if(length(available_days) <= 1){
        fail= TRUE
        break()
      }
      res[i]= sample(available_days, 1)
      forbidden_days= res[i]+(-(min_dist-1):(min_dist-1))
      available_days= setdiff(available_days, forbidden_days)
    }
    if(fail== FALSE) return(sort(res))
  }
}
(r= sample_r(n= 12, min_dist= 20, seed= 40))
diff(r)
# Benchmark
library(rbenchmark)
benchmark(cpp= sample_cpp(n= 12, min_dist = 28),
          r= sample_r(n= 12, min_dist = 28),
          replications = 50)[,1:4]
*/
Benchmark:
  test replications elapsed relative
1  cpp           50  28.005   63.217
2    r           50   0.443    1.000
OK, I tried to optimize the c++ code (as far as my knowledge of c++ allows):
#include <Rcpp.h>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
IntegerVector sample_cpp(int n, int min_dist= 5L, int seed= 42L) {
  IntegerVector res;
  IntegerVector available_days;
  IntegerVector forbidden_days;
  IntegerVector forbidden_space = Rcpp::seq(-(min_dist-1), (min_dist-1));
  bool fail;
  Environment base("package:base");
  Function set_seed = base["set.seed"];
  set_seed(seed);
  do {
    res= Rcpp::rep(NA_INTEGER, n);
    available_days = Rcpp::seq(1, 365);
    fail= FALSE;
    for(int i= 0; i < n; ++i) {
      if(available_days.size() < n-i){
        fail= TRUE;
        break;
      }
      int temp= sample(available_days, 1, FALSE)[0];
      res[i]= temp;
      forbidden_days= unique(pmax(0, temp + forbidden_space));
      available_days= setdiff(available_days, forbidden_days);
    }
  }
  while(fail== TRUE);
  std::sort(res.begin(), res.end());
  return res;
}
/*** R
# R function
sample_r= function(n= 12, min_dist=5, seed= 42){
  if(n*min_dist>= 365) stop("Infeasible.")
  set.seed(seed)
  repeat{
    res= numeric(n)
    fail= FALSE
    available_days= seq(365)
    for(i in seq(n)){
      if(length(available_days) <= n-i){
        fail= TRUE
        break()
      }
      res[i]= sample(available_days, 1)
      forbidden_days= res[i]+(-(min_dist-1):(min_dist-1))
      available_days= setdiff(available_days, forbidden_days)
    }
    if(fail== FALSE) return(sort(res))
  }
}
# Benchmark
library(rbenchmark)
benchmark(cpp= sample_cpp(n= 12, min_dist = 28),
          r= sample_r(n= 12, min_dist = 28),
          replications = 50)[,1:4]
*/
The c++ implementation still lags behind, but now only slightly.
Benchmark:
  test replications elapsed relative
1  cpp           50   0.643    1.475
2    r           50   0.436    1.000
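For comparison, here is a minimal sketch of the same rejection loop in plain C++, using only the standard library and no Rcpp sugar. The function name `sample_days_mask`, the mask `free_day`, and the use of `std::mt19937` in place of R's `set.seed` are assumptions made for illustration, so the output will not match sample_r() for the same seed. The sketch marks forbidden days in a mask instead of calling setdiff, which avoids building a new vector after every draw. Whether that actually beats the R version would have to be benchmarked.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical plain-C++ variant of the rejection loop (not the Rcpp code above).
// Days are numbered 1..365; free_day[d] is nonzero while day d may still be drawn.
// Returns an empty vector when the request fails the same guard as sample_r().
std::vector<int> sample_days_mask(int n, int min_dist, unsigned seed) {
    const int n_days = 365;
    if (n < 1 || min_dist < 1 ||
        static_cast<long long>(n) * min_dist >= n_days) return {};
    std::mt19937 rng(seed);
    std::vector<int> res(n);
    std::vector<char> free_day(n_days + 1);
    while (true) {
        std::fill(free_day.begin(), free_day.end(), 1);
        int n_free = n_days;
        bool fail = false;
        for (int i = 0; i < n; ++i) {
            if (n_free == 0) { fail = true; break; }
            // Pick the k-th free day, with k uniform in 0..n_free-1.
            std::uniform_int_distribution<int> pick(0, n_free - 1);
            int k = pick(rng);
            int day = 0;
            for (int d = 1; d <= n_days; ++d) {
                if (free_day[d] && k-- == 0) { day = d; break; }
            }
            res[i] = day;
            // Block every day closer than min_dist to the chosen one.
            const int lo = std::max(1, day - (min_dist - 1));
            const int hi = std::min(n_days, day + (min_dist - 1));
            for (int d = lo; d <= hi; ++d) {
                if (free_day[d]) { free_day[d] = 0; --n_free; }
            }
        }
        if (!fail) break;  // all n days were placed
    }
    std::sort(res.begin(), res.end());
    return res;
}
```

A call such as `sample_days_mask(12, 20, 1u)` returns 12 sorted days whose neighbouring differences are all at least 20.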
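Answer 0 (score: 0)
You can optimize your R version by sampling the maximum possible number of days at once.
The following code is faster than yours. Statistically speaking, most of the days are sampled before the loop. The remaining days are sampled inside the loop, but the loop will probably run only once, maybe twice.
It is also easy to rewrite with Rcpp.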
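One way to read this idea is sketched below in plain C++. It is an interpretation, not the answer's original code: the name `sample_days_batch`, the greedy filtering step, and `std::mt19937` as the random source are all assumptions. The resulting distribution of days may also differ from the one produced by the sequential loop in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical sketch of "sample many days at once, then fill in the rest".
// Returns an empty vector when n * min_dist >= 365 (same guard as sample_r()).
std::vector<int> sample_days_batch(int n, int min_dist, unsigned seed) {
    const int n_days = 365;
    if (n < 1 || min_dist < 1 ||
        static_cast<long long>(n) * min_dist >= n_days) return {};
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> any_day(1, n_days);
    while (true) {
        // Step 1: draw n candidate days in one go and sort them.
        std::vector<int> cand(n);
        for (int& d : cand) d = any_day(rng);
        std::sort(cand.begin(), cand.end());
        // Step 2: keep each candidate that is far enough from the last kept day.
        std::vector<int> res;
        for (int d : cand) {
            if (res.empty() || d - res.back() >= min_dist) res.push_back(d);
        }
        // Step 3: fill the remaining slots one day at a time.
        bool fail = false;
        while (static_cast<int>(res.size()) < n) {
            // Collect the days still at least min_dist away from every kept day.
            std::vector<int> free_days;
            for (int d = 1; d <= n_days; ++d) {
                bool ok = true;
                for (int r : res) {
                    if (std::abs(d - r) < min_dist) { ok = false; break; }
                }
                if (ok) free_days.push_back(d);
            }
            if (free_days.empty()) { fail = true; break; }  // stuck: start over
            std::uniform_int_distribution<std::size_t> pick(0, free_days.size() - 1);
            res.push_back(free_days[pick(rng)]);
        }
        if (!fail) {
            std::sort(res.begin(), res.end());
            return res;
        }
    }
}
```

The fill-in loop only does work when the batch draw produced days that were too close together, which is why it is expected to run few times.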
By the way, there may be some problems in my code, but I think the idea is right.