负整数与正整数索引的时间效率

时间:2018-03-27 11:39:17

标签: r algorithm performance indexing

@Moody_Mudskipper和我讨论了负整数索引和正整数索引的相对优点。考虑一个包含1行的大矩阵:

m <- matrix(integer(2^20), 1)

负向或正向索引向量是获得除最后一列以外的所有(时间)效率更高的方式吗?

我们的候选人到目前为止

neg <- function(m) m[,-ncol(m)]
pos <- function(m) m[,seq_len(ncol(m) - 1)]

对于矢量:v <- integer(2^28)

怎么样?
neg_v <- function(v) v[-length(v)]
pos_v <- function(v) v[seq_len(length(v) - 1)]

如果对一种方法有好处,它是否会延续到更一般的情况,你想要除了一些任意索引之外的一切,i

我们的灵感部分是通过讨论获得最后一个元素的最有效方式here

2 个答案:

答案 0 :(得分:3)

[非常多才多艺。您需要以较低的性能成本付费。我仍然鼓励你用它来完成这项任务。如果算法中此步骤的执行对您至关重要,那么您使用的是错误的语言。

矢量的替代方案:

len_v <- function(v) `length<-`(v, length(v) - 1)

对于矩阵(注意这不会丢弃尺寸):

dim_m <- function(m) matrix(`length<-`(m, length(m) - nrow(m)), nrow = nrow(m))

基准:

[1] 10
Unit: nanoseconds
     expr  min   lq     mean median   uq     max neval cld
 neg_v(v) 3841 4098  4747.78   4353 4609   36363   100   a
 pos_v(v) 4866 5378  5572.30   5633 5634    7426   100   a
   neg(m) 4353 4866  5165.02   5122 5378    6914   100   a
   pos(m) 5121 5378  5792.41   5634 5890   15364   100   a
 len_v(v)  768 1024  1065.31   1024 1280    2049   100   a
 dim_m(m) 2817 3329 24219.64   3585 3841 2063960   100   a

<snip>

[1] 20
Unit: microseconds
     expr      min       lq      mean    median       uq        max neval cld
 neg_v(v) 3125.901 3210.406 5997.0644 4080.9305 4726.878 102998.274   100   b
 pos_v(v) 4510.752 4572.721 6308.3921 5097.8025 5980.619 104768.516   100   b
   neg(m) 3369.172 3438.952 6286.7153 4657.4825 4976.423 102606.735   100   b
   pos(m) 4499.996 4518.306 4895.5423 4534.5670 5059.904   6577.785   100  ab
 len_v(v)  562.852  591.020  859.3246  630.3275  680.006   2381.493   100  a 
 dim_m(m) 1072.184 1120.070 3528.7498 1167.1880 2377.523  99962.255   100  ab

答案 1 :(得分:1)

两种解决方案都不依赖于输入的大小。随着对象的大小变大,正索引向量获胜。这可能有点令人惊讶,因为我们假设负整数索引立即将事物发送到C,并且正整数索引在seq_len中进行了一次额外的R调用,但这是一个非常快速的原语。但请注意,我们甚至没有谈论因子2.

对于一般情况,但是,正向索引向量必须使用两个seq调用构建,即使是基元seq_lenseq.int,也可以在任何时候提高效率索引丢失了。

这是我们的初始候选人的微基准数据,编辑包括@ DavidArenburg的建议,head(v, -1)用于比较,适用于矢量问题,但不适用于矩阵问题:

for (size in 10:20){
  v <- integer(2^size)
  m <- matrix(v, 1)
  print(size)
  print(microbenchmark(neg_v(v), pos_v(v), head(v, -1), neg(m), pos(m)))
}


[1] 10
Unit: microseconds
        expr    min      lq     mean  median      uq    max neval cld
    neg_v(v)  7.665  7.9435  8.41404  8.1680  8.4995 20.977   100 a  
    pos_v(v)  7.684  7.9900  8.94586  8.2090  8.4895 57.970   100 ab 
 head(v, -1) 18.371 19.2280 20.77184 19.7155 20.6470 58.672   100   c
      neg(m)  9.013  9.3440  9.99337  9.6675 10.0070 23.603   100  b 
      pos(m)  8.457  8.9640  9.36145  9.1965  9.5035 16.494   100 ab 
[1] 11
Unit: microseconds
        expr    min      lq     mean  median      uq    max neval cld
    neg_v(v) 13.905 14.2010 15.36906 14.3640 14.6355 25.910   100 a  
    pos_v(v) 14.140 14.4275 15.83083 14.5965 14.8975 55.185   100 ab 
 head(v, -1) 25.068 25.8605 28.92266 26.6320 27.6395 79.200   100   c
      neg(m) 15.769 16.1215 17.52703 16.3175 16.7270 51.691   100  b 
      pos(m) 15.149 15.5375 16.68138 15.7605 16.0315 49.755   100 ab 
[1] 12
Unit: microseconds
        expr    min      lq     mean  median      uq     max neval cld
    neg_v(v) 26.190 27.3690 31.18507 29.4325 31.4510  48.557   100  a 
    pos_v(v) 25.296 28.4645 35.51150 29.3030 32.1255 421.324   100  a 
 head(v, -1) 37.928 39.8455 48.19410 40.9510 46.2560 375.465   100   b
      neg(m) 29.437 31.3505 35.00288 32.6420 35.0445  69.953   100  a 
      pos(m) 26.738 28.9945 37.35408 29.4220 34.1825 462.919   100  ab
[1] 13
Unit: microseconds
        expr    min      lq      mean  median      uq      max neval cld
    neg_v(v) 46.506 51.6525  66.96329 56.8475 58.5865  442.329   100   a
    pos_v(v) 48.263 56.0625  70.38045 56.4500 58.1370  498.686   100   a
 head(v, -1) 58.836 66.9540  75.79951 68.0400 71.4055  179.082   100   a
      neg(m) 51.101 57.8895 153.67522 61.9160 63.4910 8280.099   100   a
      pos(m) 49.914 55.1205  65.61441 55.7930 60.3385  395.184   100   a
[1] 14
Unit: microseconds
        expr     min       lq     mean   median       uq      max neval cld
    neg_v(v)  91.234 111.0975 201.0988 112.6285 114.5950 8577.642   100   a
    pos_v(v)  95.394 110.4745 213.2071 111.5600 113.9150 7812.403   100   a
 head(v, -1) 107.164 122.1795 146.4730 123.7635 127.4225  567.144   100   a
      neg(m) 101.615 120.1835 141.3496 121.6715 124.0385  564.711   100   a
      pos(m)  98.133 106.4320 115.3708 108.5990 109.5940  563.723   100   a
[1] 15
Unit: microseconds
        expr     min       lq     mean   median       uq       max neval cld
    neg_v(v) 183.586 572.8205 752.0629 590.8605 668.0415 14919.153   100   a
    pos_v(v) 207.188 466.6390 605.7880 503.4750 541.7855  8832.833   100   a
 head(v, -1) 210.091 512.0090 540.2439 537.2850 563.8335   766.843   100   a
      neg(m) 219.635 594.0065 689.7076 610.0625 632.1530  7904.021   100   a
      pos(m) 214.892 400.7375 523.8320 412.9320 485.0875  8831.092   100   a
[1] 16
Unit: microseconds
        expr     min        lq      mean    median        uq       max neval cld
    neg_v(v) 363.163 1024.6745 1326.5858 1046.4375 1120.8760 10668.002   100   b
    pos_v(v) 380.258  884.8680  981.2985  897.1980  960.6395  6879.475   100  a 
 head(v, -1) 388.339  916.7945  970.3009  937.1235  966.2430  1707.400   100  a 
      neg(m) 394.158 1063.4785 1207.5574 1081.2935 1279.0390  7388.591   100  ab
      pos(m) 386.093  724.6455  920.7682  733.4520  829.8680  7127.476   100  a 
[1] 17
Unit: microseconds
        expr     min       lq     mean   median       uq        max neval cld
    neg_v(v) 734.334 2011.042 2156.787 2054.732 2209.009  10029.207   100   a
    pos_v(v) 761.919 1745.547 1828.906 1772.859 1883.327   7059.740   100   a
 head(v, -1) 791.387 1786.015 1940.802 1822.699 1955.429   8123.136   100   a
      neg(m) 794.872 1968.399 3905.147 2109.351 2315.058 153287.378   100   a
      pos(m) 767.494 1421.462 1516.872 1440.947 1556.770   7150.461   100   a
[1] 18
Unit: milliseconds
        expr      min       lq     mean   median       uq       max neval cld
    neg_v(v) 1.481940 2.240029 4.140381 4.110589 4.528180 11.454875   100   b
    pos_v(v) 1.560234 1.985711 3.588765 3.484492 3.662266 10.422674   100   b
 head(v, -1) 1.588862 2.034140 3.484662 3.544228 3.805630  9.757330   100  ab
      neg(m) 1.597898 2.120536 3.583514 3.937739 4.329667  9.443408   100   b
      pos(m) 1.550572 1.891935 2.832852 2.846605 3.088712  8.821962   100  a 
[1] 19
Unit: milliseconds
        expr      min       lq     mean   median       uq       max neval cld
    neg_v(v) 3.252966 4.080474 6.994025 4.323296 6.660327 160.33926   100   a
    pos_v(v) 3.144045 3.923489 5.330351 4.050275 7.129475  12.49244   100   a
 head(v, -1) 3.233204 3.989537 5.661008 4.246967 7.526656  12.98025   100   a
      neg(m) 3.507776 4.320612 7.669867 4.652584 8.542794 157.08516   100   a
      pos(m) 3.114030 3.685220 6.276472 3.819997 4.896968 156.17061   100   a
[1] 20
Unit: milliseconds
        expr      min        lq     mean    median       uq      max neval cld
    neg_v(v) 8.089944 12.161850 18.88741 13.185043 14.51066 171.9262   100   a
    pos_v(v) 7.918274  8.419380 16.07148 11.899635 13.33752 173.2544   100   a
 head(v, -1) 7.938086  8.478848 17.49432 12.432816 13.54419 165.1829   100   a
      neg(m) 8.645179 11.077800 20.55285 13.502900 14.43028 171.8824   100   a
      pos(m) 7.334725  7.565886 14.32296  8.394255 12.04803 166.6472   100   a

一般情况

neg_mid <- function(m, i) m[,-i]
pos_mid <- function(m, i) m[c(seq_len(i-1), seq.int(i + 1, ncol(m)))]
neg_mid_v <- function(v, i) v[-i]
pos_mid_v <- function(v, i) v[c(seq_len(i - 1), seq.int(i + 1, length(v)))]

for (size in 10:20){
  v <- integer(2^size)
  m <- matrix(v, 1)
  i <- sample(2^size, 1)
  print(size)
  print(microbenchmark(neg_mid(m, i), pos_mid(m, i), neg_mid_v(v, i), pos_mid_v(v, i)))
}

[1] 10
Unit: microseconds
            expr    min      lq     mean  median      uq      max neval cld
   neg_mid(m, i)  8.256  8.6450 23.24169  8.9035  9.2690 1398.011   100   a
   pos_mid(m, i) 11.559 11.8745 42.67081 12.3090 12.5665 2985.724   100   a
 neg_mid_v(v, i)  7.262  7.6350 25.55930  7.9745  8.3010 1706.966   100   a
 pos_mid_v(v, i) 11.162 11.6665 43.02460 11.9085 12.1270 3076.268   100   a

<snip>

[1] 20
Unit: milliseconds
            expr       min        lq     mean   median       uq      max neval cld
   neg_mid(m, i)  8.399471  9.239935 17.68447 13.61378 14.67457 183.7487   100   a
   pos_mid(m, i) 11.217084 12.215356 19.00080 16.47771 17.76067 186.3762   100   a
 neg_mid_v(v, i)  7.832412 10.067575 15.94740 13.14343 14.61393 187.9892   100   a
 pos_mid_v(v, i) 10.890024 13.340772 20.99761 16.47525 18.06857 195.5897   100   a
  • 抛开:在另外一个特殊情况下,除了第一行之外,你正在寻找除了第一行之外的所有东西,当输入变大时,正向索引向量仍会胜过时间效率的负数。