Understanding vectorization

Time: 2017-10-10 01:44:04

Tags: r formatting vectorization

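I have been looking for a way to format large numbers in R as 2.3K or 5.6M. I found this solution on SO. It turns out that it behaves strangely for some input vectors.

Here is what I am trying to understand: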

# Test vector with weird behaviour
x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462, 
  0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063, 
  1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959, 
  51114.7097545816, 51188.7710104291, 59713.9414049798)

# Formatting function for large numbers
comprss <- function(tx) { 
  div <- findInterval(as.numeric(gsub("\\,", "", tx)), 
                      c(1, 1e3, 1e6, 1e9, 1e12) )
  paste(round( as.numeric(gsub("\\,","",tx))/10^(3*(div-1)), 1), 
        c('','K','M','B','T')[div], sep = '')
}

# Compare outputs for the following three commands
x
comprss(x)
sapply(x, comprss)

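We can see that comprss(x) produces 0K as the 5th element, which is weird, while comprss(x[5]) gives the expected result. The 6th element is even weirder.

As far as I know, all the functions used in the body of comprss are vectorized. So why do I still need to sapply my way out of this?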

1 answer:

Answer 0 (score: 1)
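
A likely cause of the odd output, judging from how R indexing works: x[5] is 0, which lies below the first break, so findInterval returns 0 for it. Indexing with 0 selects nothing, so the suffix vector comes out one element short and paste silently recycles it. Every suffix after the zero then shifts one position to the left. With sapply each call sees a single value, the suffix lookup returns character(0), and paste treats that as an empty string. A minimal sketch with a hypothetical three-element input (x_small and suffixes are names introduced here for illustration):

```r
suffixes <- c('', 'K', 'M', 'B', 'T')
x_small  <- c(500, 0, 2500)  # hypothetical input with a zero in the middle

# 0 is below the first break, so its interval index is 0
div <- findInterval(x_small, c(1, 1e3, 1e6, 1e9, 1e12))
div
#> [1] 1 0 2

# A 0 index is dropped, leaving two suffixes for three numbers
suffixes[div]
#> [1] ""  "K"

# paste recycles the shorter vector, so "K" lands on the zero
out <- paste(round(x_small / 10^(3 * (div - 1)), 1), suffixes[div], sep = '')
out
#> [1] "500"  "0K"   "2.5"

# For a single zero, the suffix is character(0), which paste treats as ""
paste(0, suffixes[0], sep = '')
#> [1] "0"
```

So the functions are vectorized; the problem is that a 0 index changes the length of one of the vectors being pasted together.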

Here is a vectorized version adapted from pryr:::print.bytes:

format_for_humans <- function(x, digits = 3){
    grouping <- pmax(floor(log(abs(x), 1000)), 0)
    paste0(signif(x / (1000 ^ grouping), digits = digits), 
           c('', 'K', 'M', 'B', 'T')[grouping + 1])
}

format_for_humans(10 ^ seq(0, 12, 2))
#> [1] "1"    "100"  "10K"  "1M"   "100M" "10B"  "1T"

x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462, 
       0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063, 
       1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959, 
       51114.7097545816, 51188.7710104291, 59713.9414049798)

format_for_humans(x)
#>  [1] "302"   "32.6K" "3.32K" "12.1K" "0"     "6.27K" "383"   "402"  
#>  [9] "19.5K" "1.78K" "1.47K" "3.79K" "2.08K" "51.1K" "51.2K" "59.7K"

format_for_humans(x, digits = 1)
#>  [1] "300" "30K" "3K"  "10K" "0"   "6K"  "400" "400" "20K" "2K"  "1K" 
#> [12] "4K"  "2K"  "50K" "50K" "60K"
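
If you would rather keep the original findInterval approach, a smaller change also seems to work. Values below 1 get interval index 0, and a 0 index silently drops an element from the suffix lookup, so clamping the index to at least 1 keeps both vectors the same length. This is only a sketch: comprss_fixed is a hypothetical name, and values below 1 are simply printed without a suffix.

```r
comprss_fixed <- function(tx) {
  # Strip thousands separators, as in the original function
  num <- as.numeric(gsub(",", "", tx))
  # Clamp to 1 so that values below the first break never produce a 0 index
  div <- pmax(findInterval(num, c(1, 1e3, 1e6, 1e9, 1e12)), 1L)
  paste(round(num / 10^(3 * (div - 1)), 1),
        c('', 'K', 'M', 'B', 'T')[div], sep = '')
}

comprss_fixed(c(500, 0, 2500))
#> [1] "500"  "0"    "2.5K"
```

Unlike format_for_humans, this rounds to one decimal place rather than to a number of significant digits, so the two functions will not always print the same string.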