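I wrote a for loop and would like to speed it up with the Rcpp package. I'm not very familiar with C++. Could you help me make my function faster? Thanks for your help!
I've included the algorithm, the code, the input and output, and my sessionInfo.
Here is my algorithm: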
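- If the current price is higher than the previous price, mark +1 in a column named TR.
- If the current price is lower than the previous price, mark -1 in a column named TR.
- If the current price is the same as the previous price, copy the previous TR value into the column named TR.
Here is my code: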
price <- c(71.91, 71.82, 71.81, 71.81, 71.81, 71.82, 71.81, 71.81, 71.81,
71.82, 71.81, 71.81, 71.8, 71.81, 71.8, 71.81, 71.8, 71.8, 71.8,
71.8, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81,
71.81, 71.82, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.8,
71.8, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.81, 71.81,
71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.81, 71.81, 71.81, 71.82, 71.82,
71.81, 71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.81,
71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.81,
71.82, 71.82, 71.82, 71.82, 71.83, 71.82, 71.82, 71.82, 71.81,
71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82,
71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82,
71.82, 71.82, 71.82, 71.82, 71.82, 71.83, 71.83, 71.83, 71.83,
71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83,
71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83,
71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83,
71.83)
TR <- numeric(length(price)-1)
TR <- c(NA,TR)
for (i in 1: (length(price)-1)){
if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
if (price[i] < price[i+1]) {TR[i+1] = 1}
if (price[i] > price[i+1]) {TR[i+1] = -1}
}
Here is my output; dput(TR) gives
c(NA, -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, -1, 1,
-1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1,
-1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1,
1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1,
1, 1, -1, 1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, 1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Here is my sessionInfo:
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.4
loaded via a namespace (and not attached):
[1] chron_2.3-45 plyr_1.8.1 Rcpp_0.11.1 reshape2_1.4 stringr_0.6.2 tools_3.1.2
Answer 0 (score: 21)
You can translate the for loop directly:
library(Rcpp)
cppFunction(
"IntegerVector proc(NumericVector x) {
const int n = x.size();
IntegerVector y(n);
y[0] = NA_INTEGER;                   // the first price has no predecessor
for (int i=1; i < n; ++i) {
if (x[i] == x[i-1]) y[i] = y[i-1];   // unchanged: carry the previous label forward
else if (x[i] > x[i-1]) y[i] = 1;    // uptick
else y[i] = -1;                      // downtick
}
return y;
}")
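As usual, Rcpp gives a considerable speedup over a for loop in base R. For comparison, the vectorized alternatives posted in the other answers are included as well: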
proc.for <- function(price) {
TR <- numeric(length(price)-1)
TR <- c(NA,TR)
for (i in 1: (length(price)-1)){
if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
if (price[i] < price[i+1]) {TR[i+1] = 1}
if (price[i] > price[i+1]) {TR[i+1] = -1}
}
return(TR)
}
proc.aaron <- function(price) {
change <- sign(diff(price))
good <- change != 0
goodval <- change[good]
c(NA, goodval[cumsum(good)])
}
proc.jbaums <- function(price) {
TR <- sign(diff(price))
TR[TR==0] <- TR[which(TR != 0)][findInterval(which(TR == 0), which(TR != 0))]
TR
}
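# all.equal() compares only its first two arguments, so check each pair;
# proc.jbaums() returns one element fewer (no leading NA), so prepend NA
ref <- proc(price)
isTRUE(all.equal(ref, proc.for(price))) &&
isTRUE(all.equal(ref, proc.aaron(price))) &&
isTRUE(all.equal(ref, c(NA, proc.jbaums(price))))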
# [1] TRUE
library(microbenchmark)
microbenchmark(proc(price), proc.for(price), proc.aaron(price), proc.jbaums(price))
# Unit: microseconds
# expr min lq mean median uq max neval
# proc(price) 1.871 2.5380 3.92111 3.1110 4.5880 15.318 100
# proc.for(price) 408.200 448.2830 542.19766 484.1265 546.3255 1821.104 100
# proc.aaron(price) 23.916 25.5770 33.53259 31.5420 35.8575 190.372 100
# proc.jbaums(price) 33.536 38.8995 46.80109 43.4510 49.3555 112.306 100
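Comparing medians, Rcpp is more than 100x faster than the for loop (3.1 vs. 484.1 microseconds, about 155x) and more than 10x faster than the vectorized alternatives (about 10x for proc.aaron and 14x for proc.jbaums).
With a larger vector (length 1 million here, i.e. the 200 prices repeated 5000 times), the speedup is even more pronounced: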
price.big <- rep(price, times=5000)
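# as above: compare pairwise, and prepend the NA that proc.jbaums() omits
ref.big <- proc(price.big)
isTRUE(all.equal(ref.big, proc.for(price.big))) &&
isTRUE(all.equal(ref.big, proc.aaron(price.big))) &&
isTRUE(all.equal(ref.big, c(NA, proc.jbaums(price.big))))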
# [1] TRUE
microbenchmark(proc(price.big), proc.for(price.big), proc.aaron(price.big), proc.jbaums(price.big))
# Unit: milliseconds
# expr min lq mean median uq max neval
# proc(price.big) 1.442119 1.818494 5.094274 2.020437 2.771903 56.54321 100
# proc.for(price.big) 2639.819536 2699.493613 2949.962241 2781.636460 3062.277930 4472.35369 100
# proc.aaron(price.big) 91.499940 99.859418 132.519296 140.521212 147.462259 207.72813 100
# proc.jbaums(price.big) 117.242451 138.528214 170.989065 170.606048 180.337074 487.13615 100
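Now the speedup over the for loop is more than 1000x (median 2.0 ms vs. 2781.6 ms, about 1380x), and roughly 70x over the vectorized R functions (about 70x for proc.aaron and 84x for proc.jbaums). Even at this size, if the function is called only once, it is not clear that Rcpp has much of an advantage over the vectorized R solutions, because compiling the Rcpp code takes at least 100 ms. If this code is called repeatedly in an analysis, however, the speedup is very attractive.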
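To experiment with the loop logic outside R, here is a plain C++ sketch of the same recurrence as proc(), using std::vector instead of Rcpp types. The names trend_labels and kNA are hypothetical; kNA stands in for NA_INTEGER, which R stores as INT_MIN. The empty-input guard is an addition of this sketch, not part of proc().

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical sentinel standing in for Rcpp's NA_INTEGER (R uses INT_MIN).
constexpr int kNA = INT_MIN;

// Same recurrence as proc(): +1 on an uptick, -1 on a downtick,
// and the previous label carried forward when the price is unchanged.
std::vector<int> trend_labels(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<int> y(n, 0);
    if (n == 0) return y;          // guard against empty input (sketch addition)
    y[0] = kNA;                    // the first price has no predecessor
    for (std::size_t i = 1; i < n; ++i) {
        if (x[i] == x[i - 1]) y[i] = y[i - 1];
        else if (x[i] > x[i - 1]) y[i] = 1;
        else y[i] = -1;
    }
    return y;
}
```

For the first six prices of the question, trend_labels({71.91, 71.82, 71.81, 71.81, 71.81, 71.82}) yields kNA, -1, -1, -1, -1, 1, the same as the first six values of TR.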
Answer 1 (score: 18)
You could try byte compilation. It is also useful to look at an R loop that uses the same if-else-if-else logic as the Rcpp code. With R 3.1.2 I get
f1 <- function(price) {
TR <- numeric(length(price)-1)
TR <- c(NA,TR)
for (i in 1: (length(price)-1)){
if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
if (price[i] < price[i+1]) {TR[i+1] = 1}
if (price[i] > price[i+1]) {TR[i+1] = -1}
}
return(TR)
}
f2 <- function(price) {
TR <- numeric(length(price)-1)
TR <- c(NA,TR)
for (i in 1: (length(price)-1)){
if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
else if (price[i] < price[i+1]) {TR[i+1] = 1}
else {TR[i+1] = -1}
}
return(TR)
}
library(compiler)
f1c <- cmpfun(f1)
f2c <- cmpfun(f2)
library(microbenchmark)
microbenchmark(f1(price), f2(price), f1c(price), f2c(price), times = 1000)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## f1(price) 536.619 570.3715 667.3520 586.2465 609.9280 45046.462 1000 d
## f2(price) 328.592 351.2070 386.5895 365.0245 381.4850 1302.497 1000 c
## f1c(price) 167.570 182.4645 218.9537 192.4780 204.7810 7843.291 1000 b
## f2c(price) 96.644 107.4465 124.1324 113.5470 121.5365 1019.389 1000 a
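One way to read the f1 vs. f2 difference (an interpretation, not something these timings isolate): f1 runs all three if tests on every iteration, while f2's if / else if / else stops at the first condition that holds and needs no test for the final else. A small C++ tally, with the hypothetical names ComparisonCounts and count_comparisons, makes the per-step count explicit:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// How many price comparisons each loop style evaluates.
struct ComparisonCounts {
    long three_ifs = 0;    // f1: three independent if statements
    long else_chain = 0;   // f2: if / else if / else
};

ComparisonCounts count_comparisons(const std::vector<double>& price) {
    ComparisonCounts c;
    for (std::size_t i = 1; i < price.size(); ++i) {
        c.three_ifs += 3;              // ==, < and > are always all evaluated
        if (price[i - 1] == price[i]) {
            c.else_chain += 1;         // stops after the equality test
        } else {
            c.else_chain += 2;         // == fails, < decides; the final else needs no test
        }
    }
    return c;
}
```

For a price that stays flat at every step, the else-chain does one comparison per step instead of three; in R each of those comparisons also costs two subsetting calls.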
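R-devel, which will be released as R 3.2.0 in April, has a number of improvements in the byte code engine for scalar computations like these; with it I get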
microbenchmark(f1(price), f2(price), f1c(price), f2c(price), times = 1000)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## f1(price) 490.300 520.3845 559.19539 533.2050 548.6850 1330.219 1000 d
## f2(price) 298.375 319.7475 348.71384 330.4535 342.6405 1813.113 1000 c
## f1c(price) 61.947 66.3255 68.01555 67.7270 69.5470 138.308 1000 b
## f2c(price) 36.334 38.9500 40.45085 40.1830 41.8610 55.909 1000 a
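That puts you in the same general ballpark as the vectorized solutions for this example. There is still room for further improvement in the byte code engine, which should find its way into future releases.
All of the solutions differ in how they handle NA/NaN values, which may or may not matter to you.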
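To make the NA/NaN point concrete for the Rcpp version: R's NA_real_ is a NaN at the C level, and IEEE 754 comparisons involving NaN are all false, so proc() falls through to its final else and labels such a step -1. The R for loop, by contrast, stops with "missing value where TRUE/FALSE needed". A minimal C++ sketch of that branch, with next_label as a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>

// Same branch order as the Rcpp proc(): equality first, then >, else -1.
int next_label(double cur, double prev, int prev_label) {
    if (cur == prev) return prev_label;   // unchanged: carry forward
    else if (cur > prev) return 1;        // uptick
    else return -1;                       // downtick, and also any NaN input
}
```

If a missing value should propagate instead, an explicit std::isnan check before the comparisons would be needed.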
Answer 2 (score: 13)
Maybe try vectorizing first. It may not be as fast as Rcpp, but it is more straightforward.
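f2 <- function(price) {
change <- sign(diff(price))   # +1 up, -1 down, 0 unchanged
good <- change != 0           # steps where the price moved
goodval <- change[good]       # the non-zero moves, in order
# cumsum(good) indexes the most recent move at each step (carry forward);
# assumes the first change is non-zero
c(NA, goodval[cumsum(good)])
}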
It is still much faster than the R for loop:
f1 <- function(price) {
TR <- numeric(length(price)-1)
TR <- c(NA,TR)
for (i in 1: (length(price)-1)){
if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
if (price[i] < price[i+1]) {TR[i+1] = 1}
if (price[i] > price[i+1]) {TR[i+1] = -1}
}
TR
}
microbenchmark(f1(price), f2(price), times=100)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## f1(price) 550.037 592.9830 756.20095 618.7910 703.8335 3042.530 100 b
## f2(price) 36.915 39.3285 56.45267 45.5225 60.1965 184.536 100 a
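The cumsum trick in f2 may be easier to follow written as a single pass: cumsum(good) is a running count of moves, and goodval[count] is the most recent move. A C++ sketch under that reading (carry_forward_sign is a hypothetical name, and NaN stands in for R's NA):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of f2:  change <- sign(diff(price)); good <- change != 0
//                goodval <- change[good];     c(NA, goodval[cumsum(good)])
std::vector<double> carry_forward_sign(const std::vector<double>& price) {
    std::vector<double> out(price.size(), std::nan(""));  // out[0] stays NA
    std::vector<int> goodval;   // change[good]: the non-zero changes, in order
    std::size_t count = 0;      // running value of cumsum(good)
    for (std::size_t i = 1; i < price.size(); ++i) {
        // sign(diff(price)) at this step: +1, -1 or 0
        int change = (price[i] > price[i - 1]) - (price[i] < price[i - 1]);
        if (change != 0) {
            goodval.push_back(change);
            ++count;
        }
        // goodval[cumsum(good)]. count == 0 means no move yet: R would drop
        // that index (shortening the result); here the value stays NaN,
        // which is what the question's for loop produces.
        if (count > 0) out[i] = goodval[count - 1];
    }
    return out;
}
```

On the question's data the first change is non-zero, so the R version and this sketch agree; they differ only when the series opens with unchanged prices.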
Answer 3 (score: 4)
This can easily be vectorized in R, for example with diff and findInterval:
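TR <- sign(diff(price))   # +1 up, -1 down, 0 unchanged
# fill each 0 with the most recent non-zero change (assumes the first change is non-zero)
TR[TR==0] <- TR[which(TR != 0)][findInterval(which(TR == 0), which(TR != 0))]
TR <- c(NA, TR)           # the first price has no predecessor, as in the question's TR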
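findInterval(x, vec) returns, for each x, how many elements of the sorted vec are less than or equal to x; applied to positions, it picks the last non-zero change before each zero. A C++ sketch of the same lookup with std::upper_bound (fill_zeros_find_interval is a hypothetical name; positions are 0-based here, unlike R):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of:  TR <- sign(diff(price))
//             TR[TR==0] <- TR[which(TR != 0)][findInterval(which(TR == 0), which(TR != 0))]
// Returns length(price) - 1 values, like sign(diff(price)) (no leading NA).
std::vector<int> fill_zeros_find_interval(const std::vector<double>& price) {
    std::vector<int> tr;                  // sign(diff(price))
    for (std::size_t i = 1; i < price.size(); ++i)
        tr.push_back((price[i] > price[i - 1]) - (price[i] < price[i - 1]));

    std::vector<std::size_t> nonzero;     // which(TR != 0), already sorted
    for (std::size_t k = 0; k < tr.size(); ++k)
        if (tr[k] != 0) nonzero.push_back(k);

    std::vector<int> out = tr;
    for (std::size_t k = 0; k < tr.size(); ++k) {
        if (tr[k] != 0) continue;
        // findInterval(k, nonzero): number of non-zero positions <= k
        std::size_t n = static_cast<std::size_t>(
            std::upper_bound(nonzero.begin(), nonzero.end(), k) - nonzero.begin());
        // n > 0: copy the last non-zero change before k;
        // n == 0: no earlier move, so this sketch leaves the 0 in place
        if (n > 0) out[k] = tr[nonzero[n - 1]];
    }
    return out;
}
```

Because which(TR != 0) is sorted, each lookup is a binary search, which is also what makes the findInterval version fast in R.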