What is the fastest way to build a string in Ruby?

Time: 2011-07-01 14:10:09

Tags: ruby string optimization

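In another question, someone who wanted to join ["foo", "bar", "baz"] with commas and an "and" quoted The Ruby Cookbook:

If efficiency is important to you, don't build a new string when you can append items onto an existing string. [And so on] ... Use str << var1 << ' ' << var2 instead.

But the book was written in 2006.

Is appending (i.e. <<) still the fastest way to build up a large string from an array of smaller strings, in all major implementations of Ruby?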

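2 answers:

Answer 0 (score: 23)

Use Array#join when you can, and String#<< when you can't.

The problem with String#+ is that it must create an intermediate (unwanted) string object, whereas String#<< mutates the original string. Here are the timings (in seconds) for joining 1,000 strings with ", " 1,000 times via Array#join, String#+, and String#<<: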

Ruby 1.9.2p180      user     system      total        real
Array#join      0.320000   0.000000   0.320000 (  0.330224)
String#+ 1      7.730000   0.200000   7.930000 (  8.373900)
String#+ 2      4.670000   0.600000   5.270000 (  5.546633)
String#<< 1     1.260000   0.010000   1.270000 (  1.315991)
String#<< 2     1.600000   0.020000   1.620000 (  1.793415)

JRuby 1.6.1         user     system      total        real
Array#join      0.185000   0.000000   0.185000 (  0.185000)
String#+ 1      9.118000   0.000000   9.118000 (  9.118000)
String#+ 2      4.544000   0.000000   4.544000 (  4.544000)
String#<< 1     0.865000   0.000000   0.865000 (  0.866000)
String#<< 2     0.852000   0.000000   0.852000 (  0.852000)

Ruby 1.8.7p334      user     system      total        real
Array#join      0.290000   0.010000   0.300000 (  0.305367)
String#+ 1      7.620000   0.060000   7.680000 (  7.682265)
String#+ 2      4.820000   0.130000   4.950000 (  4.957258)
String#<< 1     1.290000   0.010000   1.300000 (  1.304764)
String#<< 2     1.350000   0.010000   1.360000 (  1.347226)

Rubinius (head)     user     system      total        real
Array#join      0.864054   0.008001   0.872055 (  0.870757)
String#+ 1      9.636602   0.076005   9.712607 (  9.714820)
String#+ 2      6.456403   0.064004   6.520407 (  6.521633)
String#<< 1     2.196138   0.016001   2.212139 (  2.212564)
String#<< 2     2.176136   0.012001   2.188137 (  2.186298)

Here is the benchmark code:

WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000

require 'benchmark'
Benchmark.bmbm do |x|
  x.report("Array#join"){
    N.times{ s = WORDS.join(', ') }
  }
  x.report("String#+ 1"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", "; s += w }
    }
  }
  x.report("String#+ 2"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", " + w }
    }
  }
  x.report("String#<< 1"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", "; s << w }
    }
  }
  x.report("String#<< 2"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", " << w }
    }
  }
end

These results were obtained on Ubuntu under RVM. The results for Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 results shown above.
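The difference between String#+ and String#<< described in this answer can be observed directly by comparing object identities. The following is a minimal sketch, not part of the original answer; the variable names are made up for illustration. String.new is used so the strings are mutable regardless of frozen-string-literal settings, which is the same reason the benchmark calls .dup before appending.

```ruby
# Illustrative only: names here are hypothetical, not from the answer.
a = String.new("foo")
a_id = a.object_id
a << ", bar"                          # mutates the receiver in place
same_after_append = (a.object_id == a_id)

b = String.new("foo")
b_id = b.object_id
b += ", bar"                          # allocates a new String, then rebinds b
same_after_plus = (b.object_id == b_id)

puts "<< kept the same object: #{same_after_append}"   # true
puts "+= kept the same object: #{same_after_plus}"     # false
```

In a loop over 1,000 words, every += therefore copies the whole string built so far into a fresh object, which is consistent with the String#+ rows being the slowest in the tables above.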

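Answer 1 (score: 3)

What if your source of string bits isn't an array?

TL;DR: Even when your source of string bits isn't a giant array, you are still better off building an array first and using join. + isn't as bad in 2.1.1 as in 1.9.3, but it's still bad (for this use case). 1.9.3 is actually a bit faster at array.join and <<.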
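As a rough sketch of what "build an array first and use join" can look like when the pieces do not arrive as an array: the Enumerator below is a hypothetical stand-in for any source that yields strings one at a time (a file, a socket, a generator), and all names are illustrative rather than taken from this answer.

```ruby
# Hypothetical source of string bits that is not an array.
pieces = Enumerator.new do |y|
  5.times { |i| y << "item#{i}" }
end

# Collect the bits into an array as they arrive, then join once at the end.
parts = []
pieces.each { |piece| parts << piece }
joined = parts.join(", ")

# Alternative without the intermediate array: append to one mutable string.
appended = String.new
pieces.each_with_index do |piece, i|
  appended << ", " unless i.zero?
  appended << piece
end

puts joined
```

Both variants produce the same string; which one is faster for a given workload is what the benchmark in this answer tries to measure. Note that the "Array#join 2" case in that benchmark, as posted, fills arr but then calls WORDS.join rather than arr.join.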


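Benchmarking veterans may have looked at @Phrogz's answer and thought "but, but..." because the join benchmark doesn't carry the array enumeration overhead that the others do. I was curious how much difference that makes, so...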

    WORDS = (1..1000).map{ rand(10000).to_s }
    N = 1000

    require 'benchmark'
    Benchmark.bmbm do |x|
      x.report("Array#join"){
        N.times{ s = WORDS.join(', ') }
      }
      x.report("Array#join 2"){
        N.times{
          arr = Array.new(WORDS.length)
          arr[0] = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; }
          s = WORDS.join(', ')
        }
      }
      x.report("String#+ 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
        }
      }
      x.report("String#+ 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
        }
      }
      x.report("String#<< 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
        }
      }
      x.report("String#<< 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
        }
      }
      x.report("String#<< 2 A"){
        N.times{
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| s << ", " << w }
        }
      }
    end

Small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.130000   0.000000   0.130000 (  0.128281)
    Array#join 2    0.220000   0.000000   0.220000 (  0.219588)
    String#+ 1      1.720000   0.770000   2.490000 (  2.478555)
    String#+ 2      1.040000   0.370000   1.410000 (  1.407190)
    String#<< 1     0.370000   0.000000   0.370000 (  0.371125)
    String#<< 2     0.360000   0.000000   0.360000 (  0.360161)
    String#<< 2 A   0.310000   0.000000   0.310000 (  0.318130)

Small words, ruby 1.9.3

                        user     system      total        real
    Array#join      0.090000   0.000000   0.090000 (  0.092072)
    Array#join 2    0.180000   0.000000   0.180000 (  0.180423)
    String#+ 1      3.400000   0.750000   4.150000 (  4.149934)
    String#+ 2      1.740000   0.370000   2.110000 (  2.122511)
    String#<< 1     0.360000   0.000000   0.360000 (  0.359707)
    String#<< 2     0.340000   0.000000   0.340000 (  0.343233)
    String#<< 2 A   0.300000   0.000000   0.300000 (  0.297420)

I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters, so I re-ran it with:

    WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }

As I expected, the effect on + was quite significant, but I was pleasantly surprised that it had little effect on join or <<.
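To get a feel for what the modified WORDS line generates, the sketch below samples the same expression and counts how many words exceed 23 characters. It is an illustration, not part of the original answer; the fixed seed and the variable names are assumptions added so that repeated runs sample the same words. The 23-character figure presumably refers to the size up to which 64-bit MRI of that era stored a string inline in its object slot, though the answer does not say so.

```ruby
srand(42) # arbitrary fixed seed, only for repeatable sampling

# Same expression as the modified WORDS line: a number of 1 to 6 digits,
# repeated between 1 and 15 times.
words = (1..1000).map { rand(100000).to_s * (rand(15) + 1) }

lengths = words.map(&:length)
long_count = lengths.count { |len| len > 23 }

puts "shortest: #{lengths.min}, longest: #{lengths.max}"
puts "longer than 23 characters: #{long_count} of #{words.size}"
```

Since rand(100000) yields at most 5 digits and the repeat count is at most 15, no word is longer than 75 characters.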

Words often longer than 23 characters, ruby 2.1.1

                        user     system      total        real
    Array#join      0.150000   0.000000   0.150000 (  0.152846)
    Array#join 2    0.230000   0.010000   0.240000 (  0.231272)
    String#+ 1      7.450000   5.490000  12.940000 ( 12.936776)
    String#+ 2      4.200000   2.590000   6.790000 (  6.791125)
    String#<< 1     0.400000   0.000000   0.400000 (  0.399452)
    String#<< 2     0.380000   0.010000   0.390000 (  0.389791)
    String#<< 2 A   0.340000   0.000000   0.340000 (  0.341099)

Words often longer than 23 characters, ruby 1.9.3

                        user     system      total        real
    Array#join      0.130000   0.010000   0.140000 (  0.132957)
    Array#join 2    0.220000   0.000000   0.220000 (  0.220181)
    String#+ 1     20.060000   5.230000  25.290000 ( 25.293366)
    String#+ 2      9.750000   2.670000  12.420000 ( 12.425229)
    String#<< 1     0.390000   0.000000   0.390000 (  0.397733)
    String#<< 2     0.390000   0.000000   0.390000 (  0.390540)
    String#<< 2 A   0.330000   0.000000   0.330000 (  0.333791)