Question

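Can I use `microbenchmark` to estimate the approximate time needed to execute code in R? I am running some code and can see that it takes a long time to execute. I don't want to run my code all the way through. I would like to see the approximate execution time without actually running the code in R.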

Answer 1

Try running your code on smaller problems to see how it scales

> library(microbenchmark)
> fun0 = function(n) { x = integer(); for (i in seq_len(n)) x = c(x, i); x }
> p = microbenchmark(fun0(1000), fun0(2000), fun0(4000), fun0(8000), fun0(16000),
+                    times=20)
> p
Unit: milliseconds
        expr        min         lq       mean     median         uq        max neval
  fun0(1000)   1.627601   1.697958   1.995438   1.723522   2.289424   2.935609    20
  fun0(2000)   5.691456   6.333478   6.745057   6.928060   7.056893   8.040366    20
  fun0(4000)  23.343611  24.487355  24.987870  24.854968  25.554553  26.088183    20
  fun0(8000)  92.517691  95.827525 104.900161  97.305930 112.924961 136.434998    20
 fun0(16000) 365.495320 369.697953 380.981034 374.456565 390.829214 411.203191    20

Doubling the problem size roughly quadruples the execution time, so the cost grows quadratically; visualize this as

library(ggplot2)
ggplot(p, aes(x=expr, y=log(time))) + geom_point() +
    geom_smooth(method="lm", aes(x=as.integer(expr)))

This is terrible news for big problems!

Investigate alternative implementations that scale better while returning the same answer, both as the problem size grows and at a given problem size. First make sure the algorithms / implementations return the same answer

> ## linear, ok
> fun1 = function(n) { x = integer(n); for (i in seq_len(n)) x[[i]] = i; x }
> identical(fun0(100), fun1(100))
[1] TRUE

then see how the new algorithm scales with problem size

> microbenchmark(fun1(100), fun1(1000), fun1(10000))
Unit: microseconds
        expr      min       lq      mean    median         uq       max neval
   fun1(100)   86.260   97.558  121.5591  102.6715   107.6995  1058.321   100
  fun1(1000)  845.160  902.221  932.7760  922.8610   945.6305  1915.264   100
 fun1(10000) 8776.673 9100.087 9699.7925 9385.8560 10310.6240 13423.718   100

Explore more algorithms, especially those that replace iteration with vectorization

> ## linear, faster -- *nano*seconds
> fun2 = seq_len
> identical(fun1(100), fun2(100))
[1] TRUE
> microbenchmark(fun2(100), fun2(1000), fun2(10000))
Unit: nanoseconds
        expr   min      lq     mean median    uq   max neval
   fun2(100)   417   505.0   587.53    553   618  2247   100
  fun2(1000)  2126  2228.5  2774.90   2894  2986  5511   100
 fun2(10000) 19426 19741.0 25390.93  27177 28209 43418   100

Compare the algorithms at a specific problem size.
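
As a minimal sketch of a comparison at one fixed size, the three implementations from this answer can be timed side by side. The size `n = 10000` and `times = 20` are illustrative choices rather than values taken from the answer, and no timings are shown because they depend on the machine.

```r
library(microbenchmark)

# The three implementations defined in this answer
fun0 = function(n) { x = integer(); for (i in seq_len(n)) x = c(x, i); x }
fun1 = function(n) { x = integer(n); for (i in seq_len(n)) x[[i]] = i; x }
fun2 = seq_len

n = 10000   # illustrative size (an assumption, pick one relevant to your problem)

# Time all three on the same input size; printing shows them in common units
cmp = microbenchmark(fun0(n), fun1(n), fun2(n), times = 20)
cmp
```

Printing `cmp` puts all three expressions in one table with a common unit, which makes the gap between them easy to read off.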

Such comparisons show that a sensible implementation matters more and more as the problem size grows.
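
To turn scaling measurements into a rough estimate for a problem that is too large to run, one possible approach is to fit a straight line to log(time) against log(size) and extrapolate. This is an assumption on my part, not something spelled out above. The sizes, the `target` value and the variable names below are hypothetical, and an extrapolation far beyond the measured range can be badly off (memory limits and garbage collection are not captured by the fit).

```r
library(microbenchmark)

fun0 = function(n) { x = integer(); for (i in seq_len(n)) x = c(x, i); x }

# Time a few small problem sizes (sizes chosen arbitrarily for illustration)
sizes = c(1000, 2000, 4000, 8000)
timings = data.frame(
    n  = sizes,
    # microbenchmark stores raw timings in nanoseconds in the `time` column
    ns = sapply(sizes, function(n) median(microbenchmark(fun0(n), times = 10)$time))
)

# Fit log(ns) = a + b * log(n); the slope b estimates the order of growth
fit = lm(log(ns) ~ log(n), data = timings)
coef(fit)[["log(n)"]]    # a value near 2 would suggest quadratic scaling

# Extrapolate to a size that was never run; treat the result as a rough guess
target = 1e6
predicted_ns = exp(predict(fit, newdata = data.frame(n = target)))
predicted_ns / 1e9       # estimated run time in seconds
```

The estimate still requires running the code on small inputs; it only avoids running the full-size problem.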
