Byte-compiled R code is actually slower than plain R code with JIT enabled

Date: 2019-03-01 11:20:54

Tags: r bytecode jit microbenchmark

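From Efficient R Programming (the byte compiler section) and the R documentation on the byte compiler, I learned that cmpfun can compile a plain R function to bytecode to make it faster, and that enableJIT speeds code up by enabling just-in-time compilation.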
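A minimal sketch of the two APIs mentioned above, assuming only the compiler package that ships with R. cmpfun() takes its options as a list (here optimize = 3), and enableJIT() sets the JIT level and returns the previous one. Since R 3.4.0 the JIT is enabled at level 3 by default (the R_ENABLE_JIT environment variable can override this), which matters for any comparison against cmpfun(). The helper current_jit_level is a hypothetical name used only for illustration.

```r
library("compiler")

# Hypothetical helper: enableJIT() returns the previous JIT level,
# so setting a level and immediately restoring it reveals the current one.
current_jit_level = function() {
  old = enableJIT(0)
  enableJIT(old)
  old
}

f = function(x) {
  s = 0
  for (v in x) s = s + v
  s
}

# Explicit ahead-of-time byte compilation with optimization level 3
f_cmp = cmpfun(f, options = list(optimize = 3))

# Typically 3 on R >= 3.4.0, unless R_ENABLE_JIT overrides the default
print(current_jit_level())
stopifnot(f_cmp(1:10) == f(1:10))
```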

So I decided to run a benchmark like the one in the first link, using the following code:

library("compiler")
library("rbenchmark")

enableJIT(3)

my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
        total = total + each
    total / n
}

cmp_mean = cmpfun(my_mean, list(optimize = 3))

## Generate some data
x = rnorm(100000)
benchmark(my_mean(x), cmp_mean(x), mean(x), columns = c("test", "elapsed", "relative"), order = "relative", replications = 5000)

Unfortunately, the results differ from those shown in the first link: my_mean even performs slightly better than cmp_mean:

         test elapsed relative
3     mean(x)   1.468    1.000
1  my_mean(x)  35.402   24.116
2 cmp_mean(x)  36.817   25.080

I have no idea what is going on.
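One thing worth checking (a diagnostic sketch, not a confirmed explanation): since R 3.4.0 the JIT compiler is on by default at level 3, and when the JIT compiles a closure it replaces that closure's body with bytecode in place. If that happens, the "uncompiled" my_mean is already byte-compiled by the time benchmark() times it. The sketch assumes that printing a byte-compiled closure shows a <bytecode: ...> line; is_byte_compiled is a hypothetical helper, not part of the compiler package.

```r
library("compiler")

# Hypothetical helper: printing a byte-compiled closure includes a
# "<bytecode: ...>" line, so look for it in the captured output.
is_byte_compiled = function(f) {
  any(grepl("<bytecode", capture.output(print(f)), fixed = TRUE))
}

my_mean = function(x) {
  total = 0
  n = length(x)
  for (each in x)
    total = total + each
  total / n
}

old_level = enableJIT(0)               # turn the JIT off, remember the old level
before_call = is_byte_compiled(my_mean)
my_mean(rnorm(10))                     # with the JIT off, this call is interpreted
after_call_no_jit = is_byte_compiled(my_mean)

enableJIT(3)
for (i in 1:2) my_mean(rnorm(10))      # with the JIT on, R may compile my_mean here
after_call_jit = is_byte_compiled(my_mean)

enableJIT(old_level)                   # restore the previous JIT level
print(c(before_call, after_call_no_jit, after_call_jit))
```

If the last value is TRUE, the JIT has compiled my_mean, so benchmarking it against cmp_mean compares two byte-compiled versions of the same function.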

Edit:

The R version on my computer is 3.5.2.

OS: Debian 9.8. All software on my computer is the latest stable version provided by Debian.

Linux kernel version: 4.9.0-8-amd64.

Edit 5:

I rewrote the script to test different combinations of optimize and JIT levels:

#!/usr/bin/env Rscript

library("compiler")
library("microbenchmark")
library("rlist")

my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
        total = total + each
    total / n
}

do_cmpfun = function(f, f_name, optimization_level) {
    cmp_f = cmpfun(f, list(optimize = optimization_level))
    list(cmp_f, f_name, optimize = optimization_level)
}

do_benchmark = function(f, f_name, optimization_level, JIT_level, x) {
    result = summary(microbenchmark(f(x), times = 1000, unit = "us", control = list(warmup = 100)))
    data.frame(fun = f_name, optimize = optimization_level, JIT = JIT_level, mean = result$mean)
}

means = list(list(mean, "mean", optimize = -1), list(my_mean, "my_mean", optimize = -1))

for (optimization_level in 0:3)
    means = list.append(means, do_cmpfun(my_mean, "my_mean", optimization_level))

# Generate some data
x = rnorm(100000)

# Benchmark in different JIT levels
result = c()
for (JIT_level in 0:3) {
    enableJIT(JIT_level)

    for (f in means) {
        result = rbind(result, do_benchmark(f[[1]], f[[2]], f[[3]], JIT_level, x))
    }
}


# Sort result
sorted_result = result[order(result$mean), ]
rownames(sorted_result) = NULL

print("Unit = us, optimize = -1 means it is not processed by cmpfun")
print(sorted_result)

Before running the R script, I ran sudo cpupower frequency-set --governor performance and got the following:

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   2   229.1841
2     mean       -1   1   229.3910
3     mean       -1   3   236.3680
4     mean       -1   0   252.9416
5  my_mean       -1   2  5242.0413
6  my_mean        3   0  5279.9710
7  my_mean        2   2  5297.5323
8  my_mean        2   1  5327.0324
9  my_mean       -1   1  5333.6941
10 my_mean        3   1  5336.4559
11 my_mean        2   0  5362.6644
12 my_mean        3   3  5410.1963
13 my_mean        2   3  5414.4616
14 my_mean       -1   3  5418.3823
15 my_mean        3   2  5437.3233
16 my_mean        1   2  9947.7897
17 my_mean        1   1 10101.6464
18 my_mean        1   3 10204.3253
19 my_mean        1   0 10323.0782
20 my_mean        0   0 26557.3808
21 my_mean        0   2 26728.5222
22 my_mean       -1   0 26901.4200
23 my_mean        0   3 26984.5200
24 my_mean        0   1 27060.6188

However, after I used update-alternatives to point libblas.so.3 and liblapack.so.3 at openblas 0.2.19-3, my_mean with optimize = 3 and JIT = 0 became the best performer apart from mean, whose timings stayed about the same:

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   0   228.9361
2     mean       -1   1   229.1223
3     mean       -1   2   233.9757
4     mean       -1   3   241.7835
5  my_mean        3   0  5246.8089
6  my_mean       -1   1  5261.3951
7  my_mean       -1   2  5330.6310
8  my_mean        2   3  5362.2055
9  my_mean        3   1  5400.9983
10 my_mean        2   0  5418.7674
11 my_mean        2   1  5460.8133
12 my_mean        3   3  5464.8280
13 my_mean       -1   3  5520.7021
14 my_mean        2   2  5591.7352
15 my_mean        3   2  5610.6446
16 my_mean        1   3 10244.2832
17 my_mean        1   0 10274.7504
18 my_mean        1   1 10311.6423
19 my_mean        1   2 10735.6449
20 my_mean        0   2 26904.1858
21 my_mean       -1   0 26961.0536
22 my_mean        0   0 27115.8191
23 my_mean        0   3 27538.7224
24 my_mean        0   1 28133.6159

1 Answer:

Answer (score: 2)

Although I haven't figured out why JIT compilation doesn't speed up your code, we can speed up the same function by compiling it with the Rcpp package.

Doing so gives the following results (where mean_cpp is the function written and compiled with Rcpp):

         test elapsed relative
4 mean_cpp(x)    0.67    1.000
3     mean(x)    1.00    1.493
1  my_mean(x)   14.00   20.896
2 cmp_mean(x)   14.50   21.642

The code that produces these results is below.

library("compiler")
library("rbenchmark")
library("Rcpp")


enableJIT(3)

my_mean = function(x) {
  total = 0
  n = length(x)
  for (each in x)
    total = total + each
  total / n
}

cmp_mean = cmpfun(my_mean, list(optimize = 3))


#we can also write this same function using the Rcpp package
cppFunction('double mean_cpp(NumericVector x) {
  double total = 0;
  int n = x.size();
  for(int i = 0; i < n; i++) {
    total += x[i];
  }
  return total / n;
}')


# call once as a warm-up (cppFunction() already compiled the C++ code when it was defined)
mean_cpp(c(1))


## Generate some data
x = rnorm(100000)
benchmark(my_mean(x), cmp_mean(x), mean(x), mean_cpp(x),
          columns = c("test", "elapsed", "relative"),
          order = "relative", replications = 5000)
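Part of the gap between mean_cpp(x) and mean(x) in the results above is probably that the two do different amounts of work: for double vectors, R's mean() accumulates in long double and then makes a second pass over the data to correct rounding error, whereas mean_cpp makes a single pass with a double accumulator. A sketch of that two-pass scheme in plain R (two_pass_mean is a hypothetical name, and sum() stands in for the C-level loops):

```r
# Sketch of the two-pass algorithm mean() uses for double vectors:
# first pass computes a plain average, second pass adds back the mean residual.
two_pass_mean = function(x) {
  n = length(x)
  s = sum(x) / n                # first pass
  if (is.finite(s))
    s = s + sum(x - s) / n      # second pass corrects accumulated rounding error
  s
}

x = rnorm(100000)
print(all.equal(two_pass_mean(x), mean(x)))
```

The second pass roughly doubles the memory traffic, which is one reason a single-pass loop like mean_cpp can beat mean() while being slightly less accurate.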