Optimizing a Ruby array or hash

Date: 2016-04-10 06:30:06

Tags: ruby

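I have a program that simulates typing. It takes the location of a file, along with the file name and extension, from user input. It then iterates over the file, breaks it into individual characters, and puts them in an array.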

def file_to_array(file)
  empty = []
  File.foreach("#{file}") do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

When the program runs, it sends the keys to a text area through win32ole to simulate typing.

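After 5,000 characters, the memory overhead becomes too large and the program starts to slow down. The further past 5,000 characters it gets, the slower it becomes. Is there a way to optimize this?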

- Edit -

require 'benchmark'

def file_to_array(file)
  empty = []
  File.foreach(file) do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end
def file_to_array_2(file)
  File.read(file).split('')
end

file = 'xxx'

Benchmark.bm do |results|
    results.report { print file_to_array(file) }
    results.report { print file_to_array_2(file) }
end
    user     system      total        real
 0.234000   0.000000   0.234000 (  0.787020)
 0.218000   0.000000   0.218000 (  1.917185)

1 answer:

Answer 0 (score: 2)

I ran my own benchmark and profile. Here is the code:

#!/usr/bin/env ruby
require 'benchmark'
require 'rubygems'
require 'ruby-prof'

def ftoa_1(path)
  empty = []
  File.foreach(path) do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

def ftoa_2(path)
  File.read(path).split('')
end

def ftoa_3(path)
  File.read(path).chars
end

def ftoa_4(path)
  File.open(path) { |f| f.each_char.to_a }
end

GC.start
GC.disable

Benchmark.bm(6) do |x|
  1.upto(4) do |n|
    x.report("ftoa_#{n}") {send("ftoa_#{n}", ARGV[0])}
  end
end

1.upto(4) do |n|
  puts "\nProfiling ftoa_#{n} ...\n"

  result = RubyProf.profile do
    send("ftoa_#{n}", ARGV[0])
  end

  RubyProf::FlatPrinter.new(result).print($stdout)
end

Here are my results:

             user     system      total        real
ftoa_1   2.090000   0.160000   2.250000 (  2.250350)
ftoa_2   1.540000   0.090000   1.630000 (  1.632173)
ftoa_3   0.420000   0.080000   0.500000 (  0.505286)
ftoa_4   0.550000   0.090000   0.640000 (  0.630003)

Profiling ftoa_1 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 2.571306
Sort by: self_time

 %self      total      self      wait     child     calls  name
 83.39      2.144     2.144     0.000     0.000   103930   String#split
 12.52      0.322     0.322     0.000     0.000        1   Array#flatten!
  3.52      2.249     0.090     0.000     2.159        1   <Class::IO>#foreach
  0.57      0.015     0.015     0.000     0.000   103930   String#to_s
  0.00      2.571     0.000     0.000     2.571        1   Global#[No method]
  0.00      2.571     0.000     0.000     2.571        1   Object#ftoa_1
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

Profiling ftoa_2 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 1.855242
Sort by: self_time

 %self      total      self      wait     child     calls  name
 99.77      1.851     1.851     0.000     0.000        1   String#split
  0.23      0.004     0.004     0.000     0.000        1   <Class::IO>#read
  0.00      1.855     0.000     0.000     1.855        1   Global#[No method]
  0.00      1.855     0.000     0.000     1.855        1   Object#ftoa_2
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

Profiling ftoa_3 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 0.721246
Sort by: self_time

 %self      total      self      wait     child     calls  name
 99.42      0.717     0.717     0.000     0.000        1   String#chars
  0.58      0.004     0.004     0.000     0.000        1   <Class::IO>#read
  0.00      0.721     0.000     0.000     0.721        1   Object#ftoa_3
  0.00      0.721     0.000     0.000     0.721        1   Global#[No method]
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

Profiling ftoa_4 ...
Measure Mode: wall_time
Thread ID: 70190654290440
Fiber ID: 70189795562220
Total: 0.816140
Sort by: self_time

 %self      total      self      wait     child     calls  name
 99.99      0.816     0.816     0.000     0.000        2   IO#each_char
  0.00      0.000     0.000     0.000     0.000        1   File#initialize
  0.00      0.000     0.000     0.000     0.000        1   IO#close
  0.00      0.816     0.000     0.000     0.816        1   <Class::IO>#open
  0.00      0.000     0.000     0.000     0.000        1   IO#closed?
  0.00      0.816     0.000     0.000     0.816        1   Global#[No method]
  0.00      0.816     0.000     0.000     0.816        1   Enumerable#to_a
  0.00      0.816     0.000     0.000     0.816        1   Enumerator#each
  0.00      0.816     0.000     0.000     0.816        1   Object#ftoa_4
  0.00      0.000     0.000     0.000     0.000        1   Fixnum#to_s

* indicates recursively called methods

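The conclusion is that ftoa_3 is the fastest with GC turned off, but I recommend ftoa_4 because it uses less memory and therefore triggers GC fewer times. If you turn GC on, you will find that ftoa_4 becomes the fastest.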
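If the characters are only consumed once, in order, the array may not be needed at all. The following is a minimal sketch of that idea, not something measured in the benchmark above: each_file_char is a hypothetical helper name, and the block stands in for whatever call sends a key through win32ole.

```ruby
require 'tempfile'

# Hypothetical helper: hands each character of the file to the block
# without building an intermediate array of all characters.
def each_file_char(path, &block)
  File.open(path) { |f| f.each_char(&block) }
end

# Usage with a throwaway file. The block here only collects characters;
# in the real program it would send the key instead (assumption).
Tempfile.create('sample') do |tmp|
  tmp.write("ab\ncd")
  tmp.flush
  typed = []
  each_file_char(tmp.path) { |ch| typed << ch }
  p typed # => ["a", "b", "\n", "c", "d"]
end
```

IO#each_char reads through the file's buffer, so memory use should stay roughly flat regardless of file size, assuming the block does not accumulate the characters itself.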

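From the profile results, you can see that the program spends most of its time in String#split in both ftoa_1 and ftoa_2. ftoa_1 is the worst because String#split runs many times (once per line), and Array#flatten! also takes a long time.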
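Before comparing timings, it can be worth confirming that the four variants really return the same array. Here is a small sanity check along those lines; variants_agree? is a hypothetical helper name and the sample text is made up for illustration.

```ruby
require 'tempfile'

# The four variants from the benchmark, reproduced so the check is self-contained.
def ftoa_1(path)
  empty = []
  File.foreach(path) do |line|
    empty << line.to_s.split('')
  end
  return empty.flatten!
end

def ftoa_2(path)
  File.read(path).split('')
end

def ftoa_3(path)
  File.read(path).chars
end

def ftoa_4(path)
  File.open(path) { |f| f.each_char.to_a }
end

# Hypothetical helper: true when all four variants return equal arrays.
def variants_agree?(path)
  results = (1..4).map { |n| send("ftoa_#{n}", path) }
  results.uniq.size == 1
end

Tempfile.create('sample') do |tmp|
  tmp.write("ab\ncd\n")
  tmp.flush
  puts variants_agree?(tmp.path)
end
```

One caveat: Array#flatten! returns nil when it makes no changes, so ftoa_1 returns nil rather than [] for an empty file, while the other three return []. Using the non-destructive flatten would avoid that.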