Why are factor vectors less efficient than integer or even character vectors?

Time: 2015-10-07 12:35:57

Tags: r

I just noticed the following:

This surprised me quite a bit:

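set.seed(42)
# Character vector of 1e4 draws, plus factor and integer-code versions of it
vec <- sample(c("a", "b", "c"), 1e4, replace = TRUE)
vec_fac <- factor(vec)
vec_int <- as.integer(factor(vec))

library(microbenchmark)
# Time an element-wise comparison on each representation
microbenchmark(vec == "b", vec_fac == "b", vec_int == 2, vec_fac == 2)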

Unit: microseconds
           expr      min        lq      mean   median       uq       max neval
     vec == "b" 2397.150 2406.5925 2499.5715 2470.637 2532.628  2881.588   100
 vec_fac == "b" 5706.932 5765.4340 6137.5441 6032.696 6401.567  8889.446   100
   vec_int == 2  510.714  541.0935  623.8341  580.506  743.695   845.305   100
   vec_fac == 2 5703.237 5772.6185 6339.6577 5975.015 6378.577 31502.869   100

I had assumed that factors would be more efficient than plain character vectors, but that does not appear to be the case. (Granted, vec_fac takes up about half the memory of the character vector vec.)

Why are factors less efficient than integer vectors?

1 answer:

Answer 0 (score: 4)

Comparing against a factor requires some conversion first. See the profiling below. Note that (levels(vec_fac) == "b")[vec_fac] is faster.

set.seed(42)
vec <- sample(c("a", "b", "c"), 1e4, replace = TRUE)
vec_fac <- factor(vec)
vec_int <- as.integer(factor(vec))

library(microbenchmark)
microbenchmark(
  (levels(vec_fac) == "b")[vec_fac],
  vec_int == 2, 
  vec == "b", 
  vec_fac == 2,
  vec_fac == "b"
)
Unit: microseconds
                              expr     min       lq      mean   median      uq     max neval   cld
 (levels(vec_fac) == "b")[vec_fac]  62.861  69.7030  74.20981  71.8410  73.552 131.280   100 a    
                      vec_int == 2  73.124  85.0970  89.96756  86.8070  87.877 125.721   100  b   
                      vec == "b" 129.569 133.8450 138.57510 134.7005 135.129 170.621   100   c  
                      vec_fac == 2 303.611 331.8340 348.90436 334.6135 337.820 482.783   100    d 
                  vec_fac == "b" 347.656 376.7335 393.01326 379.2990 381.224 577.715   100     e
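
The first expression is fast because it compares the handful of levels to "b" once and then uses the factor's integer codes to index into that small logical vector, instead of converting every element back to character. A minimal sketch on a small hypothetical factor `f` (not part of the benchmark above), checking that both forms give the same result:

```r
# Small hypothetical factor, used only for illustration
f <- factor(c("a", "b", "c", "b", "a"))

# One comparison per level: a logical vector of length nlevels(f)
is_b_level <- levels(f) == "b"

# Indexing by the factor uses its integer codes, one lookup per element
fast <- is_b_level[f]

# Same result as the direct (slower) comparison
slow <- f == "b"
identical(fast, slow)
```

The work per element is a single integer lookup, so the cost of the string comparison depends on the number of levels rather than on the length of the vector.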

Profiling:

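set.seed(42)
# Larger vector (1e8 elements) so each comparison runs long enough
# for the sampling profiler to record it
vec <- sample(c("a", "b", "c"), 1e8, replace = TRUE)
vec_fac <- factor(vec)
vec_int <- as.integer(vec_fac)

# Integer vector compared with a number
Rprof()
junk <- vec_int == 2
Rprof(NULL)
summaryRprof()

# Character vector compared with a string
Rprof()
junk <- vec == "b"
Rprof(NULL)
summaryRprof()

# Factor compared with a string
Rprof()
junk <- vec_fac == "b"
Rprof(NULL)
summaryRprof()

# Factor compared with a number
Rprof()
junk <- vec_fac == 2
Rprof(NULL)
summaryRprof()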
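
For the two factor cases, `==` dispatches to base R's `Ops.factor` method, which first maps the factor back to its character labels and only then compares; that is the conversion mentioned above. One side effect worth knowing, sketched here on a small hypothetical factor `f`: `f == 2` compares the labels with "2", so it is not equivalent to comparing the integer codes.

```r
# Small hypothetical factor, used only for illustration
f <- factor(c("a", "b", "c", "b", "a"))

# Roughly what happens for f == "b": labels are rebuilt, then compared
lab_chr <- levels(f)[f]
by_label <- lab_chr == "b"

# f == 2 compares the labels against "2", so nothing matches
any(f == 2)

# Comparing the underlying integer codes does find level 2 ("b")
by_code <- as.integer(f) == 2L
identical(by_code, by_label)
```

This means `vec_fac == 2` in the benchmark pays the same conversion cost as `vec_fac == "b"`, which fits their similar timings, but it answers a different question than `vec_int == 2`.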