Question

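Why do I get different results from these two functions? Is there something specific about math.h::pow() that I am doing wrong? Or do I need to use a cmath::abs() function that differs from abs()?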

library(Rcpp)
sourceCpp(...) # below C++ function

#include <Rcpp.h>
#include <cmath>
#include <math.h>
#include <iostream>
using namespace Rcpp;

// [[Rcpp::export]]
double dist_q (NumericVector& x, NumericVector& y, int& q) {
  int nx= x.size(), ny = y.size();

  if (nx != ny) {
    std::cout << "ERROR: Length of x and y differ." << std::endl;
    return -1;
  }

  double temp = 0.0;
  int m = 0;
  for (int i = 0; i < nx; i++) {
    if (!NumericVector::is_na(x[i]) && !NumericVector::is_na(y[i])) {
      ++m;
      temp += pow(abs(x[i] - y[i]), (double) q);
    }
  }
  temp = (1 / (double) m * temp);
  return pow(temp, (1 / (double) q));
}

#--------------------------------------------------
# R function
dist_qR <- function(x, y, q= 2) {
  if (!is.numeric(x) | !is.numeric(y)) stop("Both x and y must be numeric.")
  if (q < 1 | q %% 1 != 0) stop("q must be an integer >= 1")

  m <- sum(!is.na(x) & !is.na(y))

  return((1 / m * sum(abs(x - y)^q, na.rm=TRUE))^(1/q))
}


# test
set.seed(1415)
x <- rnorm(1000)
y <- rnorm(1000)
x2 <- x; x2[x2>1.5] <- NA
y2 <- y; y2[y2 > 1.5] <- NA

> dist_q(x,y,2)
[1] 1.089495
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 0.9119293
> dist_qR(x2,y2,2)
[1] 1.249269

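Reply to @Konrad, quoting R> ?"&&":

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.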

Timings

With fabs, as suggested in the answer:

x <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))

y <- list(x= rnorm(100),
          x2= rnorm(1000),
          x3= rnorm(10000),
          x4= rnorm(100000))


x2 <- lapply(x, function(l) {l[l>1.5] <- NA; return(l)})
y2 <- lapply(y, function(l) {l[l>1.5] <- NA; return(l)})

library(microbenchmark)

microbenchmark(n100_r= dist_qR(x[[1]], y[[1]], 3),
               n1000_r= dist_qR(x[[2]], y[[2]], 3),
               n10000_r= dist_qR(x[[3]], y[[3]], 3),
               n100000_r= dist_qR(x[[4]], y[[4]], 3),
               n100_c= dist_q(x[[1]], y[[1]], 3),
               n1000_c= dist_q(x[[2]], y[[2]], 3),
               n10000_c= dist_q(x[[3]], y[[3]], 3),
               n100000_c= dist_q(x[[4]], y[[4]], 3), times= 50)

Unit: microseconds
      expr       min        lq        mean     median        uq       max neval  cld
    n100_r    33.431    42.521    72.43820    83.8690    95.599   120.525    50 a   
   n1000_r   125.803   133.720   163.82538   167.1510   189.731   208.792    50 a   
  n10000_r   954.812  1061.260  1190.66434  1086.6260  1131.053  3577.318    50  b  
 n100000_r 10169.212 10806.144 11702.11890 11041.3280 12827.495 15562.607    50    d
    n100_c    12.317    14.662    20.83844    20.5270    26.099    39.002    50 a   
   n1000_c    91.200    97.651   104.58960   105.1295   109.674   119.938    50 a   
  n10000_c   875.928   939.270   953.09926   951.4395   970.060  1015.807    50  b  
 n100000_c  8272.492  9227.011  9472.04164  9339.7640  9531.108 12799.929    50   c 

# missing values not shown, but I get roughly the same timing

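It looks like the speedup is smallest for large vectors, probably because R's math functions are themselves implemented in C. For short vectors, however, the speedup is substantial. I suspect that is due to the overhead of the type checking I do in the R function.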

Answer 1

For some reason, abs seems to get confused by NumericVector. Changing it to the C function fabs fixes the problem:

    ...
    temp += pow(fabs(x[i] - y[i]), (double) q);  // fabs instead of abs
    ...

> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 1.249269
> dist_qR(x2,y2,2)
[1] 1.249269
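
A likely explanation, not confirmed in this thread, is that the unqualified abs resolves to C's integer abs(int), so each difference is truncated toward zero before it is raised to the power q. That would be consistent with dist_q returning smaller values than dist_qR in the question. The sketch below reproduces the effect in plain C++ without Rcpp; dist_q_plain and abs_truncating are hypothetical names introduced only for illustration, and the explicit cast to int stands in for what the integer overload would do.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

// Assumption: mimics an unqualified abs() that binds to int abs(int).
// The double argument is converted to int (truncated toward zero) first.
double abs_truncating(double v) {
    return static_cast<double>(std::abs(static_cast<int>(v)));
}

// Plain C++ version of the distance from the question:
// ((1/m) * sum |x_i - y_i|^q)^(1/q), skipping pairs that contain a NaN.
// truncate_first selects the suspected buggy behaviour for comparison.
double dist_q_plain(const std::vector<double>& x,
                    const std::vector<double>& y,
                    int q, bool truncate_first) {
    if (x.size() != y.size()) {
        return -1.0;  // same error convention as the question's code
    }
    double sum = 0.0;
    int m = 0;  // number of complete pairs
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i])) {
            continue;
        }
        ++m;
        const double d = x[i] - y[i];
        const double a = truncate_first ? abs_truncating(d) : std::fabs(d);
        sum += std::pow(a, static_cast<double>(q));
    }
    if (m == 0) {
        return std::numeric_limits<double>::quiet_NaN();  // no complete pairs
    }
    return std::pow(sum / m, 1.0 / q);
}
```

Calling dist_q_plain with truncate_first set to true on values whose differences have fractional parts gives a smaller result than with false, which matches the direction of the discrepancy reported above. Writing std::abs with <cmath> included, or fabs as in this answer, avoids the integer overload.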
