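Why am I getting different results from these two functions? Is there something specific to math.h::pow() that I'm doing wrong? Or do I need to use a different abs() function than cmath::abs()?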
library(Rcpp)
sourceCpp(...) # below C++ function
#include <Rcpp.h>
#include <cmath>
#include <math.h>
#include <iostream>
using namespace Rcpp;
// [[Rcpp::export]]
double dist_q (NumericVector& x, NumericVector& y, int& q) {
  int nx = x.size(), ny = y.size();
  if (nx != ny) {
    std::cout << "ERROR: Length of x and y differ." << std::endl;
    return -1;
  }
  double temp = 0.0;
  int m = 0;
  for (int i = 0; i < nx; i++) {
    if (!NumericVector::is_na(x[i]) && !NumericVector::is_na(y[i])) {
      ++m;
      temp += pow(abs(x[i] - y[i]), (double) q);
    }
  }
  temp = (1 / (double) m * temp);
  return pow(temp, (1 / (double) q));
}
#--------------------------------------------------
# R function
dist_qR <- function(x, y, q = 2) {
  if (!is.numeric(x) | !is.numeric(y)) stop("Both x and y must be numeric.")
  if (q < 1 | q %% 1 != 0) stop("q must be an integer >= 1")
  m <- sum(!is.na(x) & !is.na(y))
  return((1 / m * sum(abs(x - y)^q, na.rm = TRUE))^(1 / q))
}
# test
set.seed(1415)
x <- rnorm(1000)
y <- rnorm(1000)
x2 <- x; x2[x2>1.5] <- NA
y2 <- y; y2[y2 > 1.5] <- NA
> dist_q(x,y,2)
[1] 1.089495
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 0.9119293
> dist_qR(x2,y2,2)
[1] 1.249269
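Reply to @Konrad, from R> ?"&&":
& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.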
Using fabs, as suggested below:
x <- list(x  = rnorm(100),
          x2 = rnorm(1000),
          x3 = rnorm(10000),
          x4 = rnorm(100000))
y <- list(x  = rnorm(100),
          x2 = rnorm(1000),
          x3 = rnorm(10000),
          x4 = rnorm(100000))
x2 <- lapply(x, function(l) {l[l>1.5] <- NA; return(l)})
y2 <- lapply(y, function(l) {l[l>1.5] <- NA; return(l)})
library(microbenchmark)
microbenchmark(n100_r    = dist_qR(x[[1]], y[[1]], 3),
               n1000_r   = dist_qR(x[[2]], y[[2]], 3),
               n10000_r  = dist_qR(x[[3]], y[[3]], 3),
               n100000_r = dist_qR(x[[4]], y[[4]], 3),
               n100_c    = dist_q(x[[1]], y[[1]], 3),
               n1000_c   = dist_q(x[[2]], y[[2]], 3),
               n10000_c  = dist_q(x[[3]], y[[3]], 3),
               n100000_c = dist_q(x[[4]], y[[4]], 3), times = 50)
Unit: microseconds
      expr       min        lq        mean     median        uq       max neval cld
    n100_r    33.431    42.521    72.43820    83.8690    95.599   120.525    50   a
   n1000_r   125.803   133.720   163.82538   167.1510   189.731   208.792    50   a
  n10000_r   954.812  1061.260  1190.66434  1086.6260  1131.053  3577.318    50   b
 n100000_r 10169.212 10806.144 11702.11890 11041.3280 12827.495 15562.607    50   d
    n100_c    12.317    14.662    20.83844    20.5270    26.099    39.002    50   a
   n1000_c    91.200    97.651   104.58960   105.1295   109.674   119.938    50   a
  n10000_c   875.928   939.270   953.09926   951.4395   970.060  1015.807    50   b
 n100000_c  8272.492  9227.011  9472.04164  9339.7640  9531.108 12799.929    50   c
# missing values not shown, but I get roughly the same timing
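It looks like the speedup is smallest for large vectors, probably because the math functions in R are already implemented in C. For short vectors, though, the speedup is substantial. My guess is that this comes from the overhead of the type checking I do in the R version.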
Answer 0 (score: 4)
For some reason, abs seems to be confused by the NumericVector class. Changing it to the C version, fabs, solves the problem:
...
temp += pow(fabs(x[i] - y[i]), (double) q); // fabs instead of abs
...
> dist_qR(x,y,2)
[1] 1.438455
> dist_q(x,y,2)
[1] 1.438455
> dist_q(x2,y2,2)
[1] 1.249269
> dist_qR(x2,y2,2)
[1] 1.249269
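One plausible explanation for the discrepancy, offered as an assumption and not something confirmed in this thread, is that the unqualified abs call resolves to the C integer function abs(int), so each difference is truncated toward zero before it is raised to the power q. The sketch below is plain C++ without Rcpp and uses a hypothetical helper, dist_q_sketch, whose flag deliberately mimics that truncation so both behaviours can be compared side by side. NaN stands in for R's NA here.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of Rcpp): mean q-norm distance over the
// pairs where neither value is NaN.
// truncate_like_int_abs == true converts the difference to int first,
// imitating what a call to the C function abs(int) would do (assumption).
double dist_q_sketch(const std::vector<double>& x,
                     const std::vector<double>& y,
                     int q,
                     bool truncate_like_int_abs) {
    if (x.size() != y.size() || q < 1) return std::nan("");
    double sum = 0.0;
    std::size_t m = 0;  // number of complete pairs
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i]) || std::isnan(y[i])) continue;
        const double d = x[i] - y[i];
        const double a = truncate_like_int_abs
            ? static_cast<double>(std::abs(static_cast<int>(d)))  // fraction lost
            : std::fabs(d);                                       // fraction kept
        sum += std::pow(a, static_cast<double>(q));
        ++m;
    }
    if (m == 0) return std::nan("");
    return std::pow(sum / static_cast<double>(m), 1.0 / q);
}
```

For x = {0.0, 0.0}, y = {1.5, -2.5} and q = 2, the fabs path gives sqrt(4.25), about 2.06, while the truncating path gives sqrt(2.5), about 1.58. The truncated result is too small, which is the same direction as the gap between dist_q and dist_qR in the question. Writing std::fabs (or std::abs with <cmath> included) explicitly avoids depending on which abs overload the compiler picks.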