我使用 base :: order()命令有大量代码,我真的懒得在rcpp中编写代码。由于Rcpp只支持排序,而不支持订单,我花了2分钟创建了这个功能:
// [[Rcpp::export]]
Rcpp::NumericVector order_cpp(Rcpp::NumericVector invec){
int leng = invec.size();
NumericVector y = clone(invec);
for(int i=0; i<leng; ++i){
y[sum(invec<invec[i])] = i+1;
}
return(y);
}
它以某种方式起作用。如果向量包含唯一数字,我得到与order()相同的结果。如果它们不是唯一的,结果是不同的,但没有错(实际上没有唯一的解决方案)。
使用它:
c=sample(1:1000,500)
all.equal(order(c),order_cpp(c))
microbenchmark(order(c),order_cpp(c))
Unit: microseconds
expr min lq median uq max neval
order(c) 33.507 36.223 38.035 41.356 78.785 100
order_cpp(c) 2372.889 2427.071 2466.312 2501.932 2746.586 100
哎哟! 我需要一个有效的算法。 好的,所以我dug up一个bubbleort实现并对其进行了调整:
// [[Rcpp::export]]
Rcpp::NumericVector bubble_order_cpp2(Rcpp::NumericVector vec){
double tmp = 0;
int n = vec.size();
Rcpp::NumericVector outvec = clone(vec);
for (int i = 0; i <n; ++i){
outvec[i]=static_cast<double>(i)+1.0;
}
int no_swaps;
int passes;
passes = 0;
while(true) {
no_swaps = 0;
for (int i = 0; i < n - 1 - passes; ++i) {
if(vec[i] > vec[i+1]) {
no_swaps++;
tmp = vec[i];
vec[i] = vec[i+1];
vec[i+1] = tmp;
tmp = outvec[i];
outvec[i] = outvec[i+1];
outvec[i+1] = tmp;
};
};
if(no_swaps == 0) break;
passes++;
};
return(outvec);
}
嗯,这样更好 - 但不是很好:
microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
expr min lq median uq max neval
order(c) 33.809 38.034 40.1475 43.3170 72.144 100
order_cpp(c) 2339.080 2435.675 2478.5385 2526.8350 3535.637 100
bubble_order_cpp2(c) 219.752 231.977 234.5430 241.1840 322.383 100
sort(c) 59.467 64.749 68.2205 75.4645 148.815 100
c[order(c)] 38.336 41.204 44.3735 48.1460 93.878 100
另一个发现:订购比订购更快。
那么,对于更短的向量:
c=sample(1:100)
microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
expr min lq median uq max neval
order(c) 10.566 11.4710 12.8300 14.1880 63.089 100
order_cpp(c) 95.689 100.8200 102.7825 107.3105 198.018 100
bubble_order_cpp2(c) 9.962 11.1700 12.0750 13.2830 64.598 100
sort(c) 39.242 41.5065 42.5620 46.3355 155.758 100
c[order(c)] 11.773 12.6790 13.5840 15.9990 82.710 100
哦,我忽略了一个RcppArmadillo功能:
// [[Rcpp::export]]
Rcpp::NumericVector ordera(arma::vec x) {
return(Rcpp::as<Rcpp::NumericVector>(Rcpp::wrap(arma::sort_index( x )+1)) );
}
microbenchmark(order(c),order_(c),ordera(c))
Unit: microseconds
expr min lq median uq max neval
order(c) 9.660 11.169 11.773 12.377 46.185 100
order_(c) 4.529 5.133 5.736 6.038 34.413 100
ordera(c) 4.227 4.830 5.434 6.038 60.976 100
答案 0 :(得分:9)
这是一个利用Rcpp sugar实现order
函数的简单版本。我们检查了重复项,以便我们保证事情按预期工作。 (当有sort
时,Rcpp的NA
方法也存在一个错误,因此可能也需要检查 - 这将在下一个版本中修复。)
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector order_(NumericVector x) {
if (is_true(any(duplicated(x)))) {
Rf_warning("There are duplicates in 'x'; order not guaranteed to match that of R's base::order");
}
NumericVector sorted = clone(x).sort();
return match(sorted, x);
}
/*** R
library(microbenchmark)
x <- runif(1E5)
identical( order(x), order_(x) )
microbenchmark(
order(x),
order_(x)
)
*/
给了我
> Rcpp::sourceCpp('~/test-order.cpp')
> set.seed(456)
> library(microbenchmark)
> x <- runif(1E5)
> identical( order(x), order_(x) )
[1] TRUE
> microbenchmark(
+ order(x),
+ order_(x)
+ )
Unit: milliseconds
expr min lq median uq max neval
order(x) 15.48007 15.69709 15.86823 16.21142 17.22293 100
order_(x) 10.81169 11.07167 11.40678 11.87135 48.66372 100
>
当然,如果您对输出不匹配R感到满意,可以删除重复检查 - x[order_(x)]
仍然会正确排序;更具体地说,all(x[order(x)] == x[order_(x)])
应该返回TRUE
。
答案 1 :(得分:5)
另一种基于C ++ 11的解决方案:
// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
using namespace Rcpp;
template <int RTYPE>
IntegerVector order_impl(const Vector<RTYPE>& x, bool desc) {
auto n = x.size();
IntegerVector idx = no_init(n);
std::iota(idx.begin(), idx.end(), static_cast<size_t>(1));
if (desc) {
auto comparator = [&x](size_t a, size_t b){ return x[a - 1] > x[b - 1]; };
std::stable_sort(idx.begin(), idx.end(), comparator);
} else {
auto comparator = [&x](size_t a, size_t b){ return x[a - 1] < x[b - 1]; };
std::stable_sort(idx.begin(), idx.end(), comparator);
// simulate na.last
size_t nas = 0;
for (size_t i = 0; i < n; ++i, ++nas)
if (!Vector<RTYPE>::is_na(x[idx[i] - 1])) break;
std::rotate(idx.begin(), idx.begin() + nas, idx.end());
}
return idx;
}
// [[Rcpp::export]]
IntegerVector order2(SEXP x, bool desc = false) {
switch(TYPEOF(x)) {
case INTSXP: return order_impl<INTSXP>(x, desc);
case REALSXP: return order_impl<REALSXP>(x, desc);
case STRSXP: return order_impl<STRSXP>(x, desc);
default: stop("Unsupported type.");
}
}
/***R
int <- sample.int(1000, 1E5, replace = TRUE)
dbl <- runif(1E5)
chr <- sample(letters, 1E5, replace = TRUE)
library(benchr)
benchmark(order(int), order2(int))
benchmark(order(dbl), order2(dbl))
benchmark(order(chr), order2(chr))
*/
比较效果:
R> int <- sample.int(1000, 1E5, replace = TRUE)
R> dbl <- runif(1E5)
R> chr <- sample(letters, 1E5, replace = TRUE)
R> library(benchr)
R> benchmark(order(int), order2(int))
Benchmark summary:
Time units : microseconds
expr n.eval min lw.qu median mean up.qu max total relative
order(int) 100 442 452 464 482 486 1530 48200 1.0
order2(int) 100 5150 5170 5220 5260 5270 6490 526000 11.2
R> benchmark(order(dbl), order2(dbl))
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
order(dbl) 100 13.90 14.00 14.20 14.80 15.8 17.4 1480 1.98
order2(dbl) 100 7.11 7.13 7.15 7.26 7.3 8.8 726 1.00
R> benchmark(order(chr), order2(chr))
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
order(chr) 100 128.0 131.0 133.0 133.0 134.0 148.0 13300 7.34
order2(chr) 100 17.7 17.9 18.1 18.2 18.3 22.2 1820 1.00
请注意,基础radix
中的order
方法要快得多。
答案 2 :(得分:4)
以下是使用std::sort
的另一种方法。
typedef std::pair<int, double> paired;
bool cmp_second(const paired & left, const paired & right) {
return left.second < right.second;
}
Rcpp::IntegerVector order(const Rcpp::NumericVector & x) {
const size_t n = x.size();
std::vector<paired> pairs;
pairs.reserve(n);
for(size_t i = 0; i < n; i++)
pairs.push_back(std::make_pair(i, x(i)));
std::sort(pairs.begin(), pairs.end(), cmp_second<paired>);
Rcpp::IntegerVector result = Rcpp::no_init(n);
for(size_t i = 0; i < n; i++)
result(i) = pairs[i].first;
return result;
}