Question

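I need to populate a Hash with various values. Some of the values are accessed often and others are accessed rarely.

The problem is that I compute the values with some calculations, and populating the Hash becomes really slow with many keys.

Using some kind of cache is not an option in my case.

I wonder how to make the Hash compute a value only when its key is first accessed, rather than when the key is added?

That way, rarely used values won't slow down the population step.

I'm looking for something that is "kind of async", or lazy access.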

Answer 1

There are many different ways to approach this. I recommend using an instance of a class you define instead of a Hash. For example, instead of...

# Example of slow code using regular Hash.
h = Hash.new
h[:foo] = some_long_computation
h[:bar] = another_long_computation
# Access value.
puts h[:foo]

...create your own class and define methods, like this...

class Config
  def foo
    some_long_computation
  end

  def bar
    another_long_computation
  end
end

config = Config.new
puts config.foo

If you want a simple way to cache the long computations, or it absolutely must be a Hash instead of your own class, you can now wrap the Config instance with a Hash.

config = Config.new
h = Hash.new {|h,k| h[k] = config.send(k) }
# Access foo.
puts h[:foo]
puts h[:foo]  # Not computed again. Cached from previous access.

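One problem with the example above is that h.keys will not include :bar, because you haven't accessed it yet. So you can't iterate over all the keys or entries in h, because they don't exist until they are actually accessed. Another potential problem is that your keys need to be valid Ruby identifiers, so arbitrary String keys with spaces won't work when defining them as methods on Config.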
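The first caveat can be reproduced with a minimal sketch. DemoConfig and its return values are hypothetical stand-ins for Config and the long computations, introduced only for illustration:

```ruby
# Hypothetical stand-in for Config; each method records that it ran.
class DemoConfig
  attr_reader :calls

  def initialize
    @calls = []
  end

  def foo
    @calls << :foo
    "foo value"
  end

  def bar
    @calls << :bar
    "bar value"
  end
end

config = DemoConfig.new
# The default block runs only when a missing key is read through [].
lazy = Hash.new { |hash, key| hash[key] = config.public_send(key) }

lazy[:foo]
lazy[:foo]           # served from the hash; DemoConfig#foo is not called again
p config.calls       # [:foo]
p lazy.keys          # [:foo], :bar is absent until it is read
p lazy.key?(:bar)    # false, key? does not trigger the default block
```

Reading lazy[:bar] afterwards computes the value and adds :bar to the keys.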

If this matters to you, there are different ways to handle it. One way is to populate your hash with thunks and force the thunks on access.

class HashWithThunkValues < Hash
  def [](key)
    val = super
    if val.respond_to?(:call)
      # Force the thunk to get actual value.
      val = val.call
      # Cache the actual value so we never run long computation again.
      self[key] = val
    end

    val
  end
end

h = HashWithThunkValues.new
# Populate hash.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation }  # Some key that's an invalid ruby identifier.
# Access hash.
puts h[:foo]
puts h[:foo]  # Not computed again. Cached from previous access.
puts h.keys  #=> [:foo, :bar, "invalid Ruby name"]

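One caveat with this last example is that it won't work if your values are themselves callable, because it can't tell the difference between a thunk that needs to be forced and a value.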

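Again, there are ways around this. One way is to store a flag that marks whether a value has been evaluated. But this requires extra memory for every entry. A better way is to define a new class to mark that a Hash value is an unevaluated thunk.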

class Unevaluated < Proc
end

class HashWithThunkValues < Hash
  def [](key)
    val = super

    # Only call if it's unevaluated.
    if val.is_a?(Unevaluated)
      # Force the thunk to get actual value.
      val = val.call
      # Cache the actual value so we never run long computation again.
      self[key] = val
    end

    val
  end
end

# Now you must populate like so.
h = HashWithThunkValues.new
h[:foo] = Unevaluated.new { some_long_computation }
h[:bar] = Unevaluated.new { another_long_computation }
h["invalid Ruby name"] = Unevaluated.new { a_third_computation }  # Some key that's an invalid ruby identifier.
h[:some_proc] = Unevaluated.new { Proc.new {|x| x + 2 } }

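The downside of this is that you now have to remember to use Unevaluated.new when populating your Hash. If you want all values to be lazy, you can also override []=. I don't think it would actually save much typing, because you still need to use Proc.new, proc, lambda, or ->{} to create the block in the first place. But it might be worthwhile. If you did it, it might look something like this.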

class HashWithThunkValues < Hash
  def []=(key, val)
    super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
  end
end

So here's the full code.

class HashWithThunkValues < Hash

  # This can be scoped inside now since it's not used publicly.
  class Unevaluated < Proc
  end

  def [](key)
    val = super

    # Only call if it's unevaluated.
    if val.is_a?(Unevaluated)
      # Force the thunk to get actual value.
      val = val.call
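      # Cache the actual value so we never run long computation again.
      # Use store rather than self[key] = val: the overridden []= would
      # re-wrap a callable result (such as the Proc under :some_proc).
      store(key, val)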
    end

    val
  end

  def []=(key, val)
    super(key, val.respond_to?(:call) ? Unevaluated.new(&val) : val)
  end

end

h = HashWithThunkValues.new
# Populate.
h[:foo] = ->{ some_long_computation }
h[:bar] = ->{ another_long_computation }
h["invalid Ruby name"] = ->{ a_third_computation }  # Some key that's an invalid ruby identifier.
h[:some_proc] = ->{ Proc.new {|x| x + 2 } }
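
For comparison, the flag-based design mentioned earlier in this answer could be sketched roughly as follows. FlaggedLazyHash and lazy_store are hypothetical names, and a Set of pending keys plays the role of the per-entry flag; this is one possible reading of that idea, not code from the answer:

```ruby
require "set"

# Hypothetical flag-based variant: a Set remembers which keys still hold
# an unforced block, so callable values can be stored without wrapping.
class FlaggedLazyHash < Hash
  def initialize
    super()
    @pending = Set.new
  end

  # Register a block to be run on the first read of key.
  def lazy_store(key, &thunk)
    @pending << key
    store(key, thunk)
  end

  def [](key)
    if @pending.delete?(key)
      # First read: run the block and cache whatever it returns.
      thunk = super
      store(key, thunk.call)
    else
      super
    end
  end
end

h = FlaggedLazyHash.new
h.lazy_store(:adder) { ->(x) { x + 2 } }  # the lazy value is itself callable
h[:plain] = ->(x) { x * 2 }               # stored eagerly, never forced
p h[:adder].call(1)  # 3
p h[:adder].call(1)  # 3, the cached lambda is not forced again
p h[:plain].call(4)  # 8
```

The trade-off is the one the answer names: the set costs extra memory per pending entry, in exchange for not needing a marker class.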

Answer 2

You can define your own indexer with something like this:

class MyHash
  def initialize
    @cache = {}
  end

  def [](key)
    # fetch also caches nil and false results, which || would recompute.
    @cache.fetch(key) { @cache[key] = compute(key) }
  end

  def []=(key, value)
    @cache[key] = value
  end

  def compute(key)
    @cache[key] = 1
  end
end

And use it as follows:

1.9.3p286 :014 > hash = MyHash.new
 => #<MyHash:0x007fa0dd03a158 @cache={}> 

1.9.3p286 :019 > hash["test"]
 => 1 

1.9.3p286 :020 > hash
 => #<MyHash:0x007fa0dd03a158 @cache={"test"=>1}>
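
The compute method above is a placeholder that always returns 1. A minimal sketch of the same idea with the computation supplied as a block might look like this; ComputedHash is a hypothetical name, not part of the answer:

```ruby
# Hypothetical variant of MyHash: the computation is passed in as a block.
class ComputedHash
  def initialize(&compute)
    @cache = {}
    @compute = compute
  end

  def [](key)
    # The fetch block runs only when the key is missing from the cache.
    @cache.fetch(key) { @cache[key] = @compute.call(key) }
  end

  def []=(key, value)
    @cache[key] = value
  end
end

lengths = ComputedHash.new { |key| key.to_s.length }
p lengths["test"]   # 4
lengths["test"] = 0
p lengths["test"]   # 0, an explicit assignment wins over the computation
```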

Answer 3

You can use this:

class LazyHash < Hash

  def [] key
    pending = (@self || {})[key]
    if pending
      # First access: run the block, cache its result, and return the result.
      @self.delete(key)
      self[key] = pending.call
    else
      super
    end
  end

  def lazy_update key, &proc
    (@self ||= {})[key] = proc
    self[key] = proc
  end

end

Your lazy hash will behave exactly like a normal Hash, because it actually is a real Hash.

See the live demo here

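***Update - answering the nested procs question***

Yes, it would work, but it is cumbersome.

See the updated answer.

Use lazy_update instead of []= to add "lazy" values to your hash.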
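
A self-contained usage sketch, under the assumption that the goal is "run the block on first [] read, then cache the result". LazyHashSketch is a hypothetical restatement of the idea (pending blocks kept in a side table), written out here so the example runs on its own:

```ruby
# Hypothetical restatement of the idea: pending blocks live in a side table.
class LazyHashSketch < Hash
  def [](key)
    pending = @pending && @pending.delete(key)
    if pending
      # First read: run the block and keep its result in the hash itself.
      self[key] = pending.call
    else
      super
    end
  end

  def lazy_update(key, &block)
    (@pending ||= {})[key] = block
    self[key] = block   # placeholder so the key shows up in keys
  end
end

runs = 0
h = LazyHashSketch.new
h.lazy_update(:answer) { runs += 1; 42 }
p h.keys                    # [:answer]
p runs                      # 0, nothing has been computed yet
p h.fetch(:answer).class    # Proc, fetch does not go through []
p h[:answer]                # 42
p h[:answer]                # 42
p runs                      # 1
p h.fetch(:answer)          # 42, the placeholder has been replaced
```

The fetch line shows a limit of overriding only []: other readers such as fetch, values, and each still see the unevaluated Proc until the key has been read through [] once.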

Answer 4

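This isn't an answer to the body of the question, but Enumerator::Lazy (obtained through Enumerable#lazy) is part of Ruby 2.0 and later. It lets you evaluate compositions of iterators lazily: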

lazy = [1, 2, 3].lazy.select(&:odd?)
# => #<Enumerator::Lazy: #<Enumerator::Lazy: [1, 2, 3]>:select>
lazy.to_a
# => [1, 3]
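
The benefit is easier to see on an endless source. A small sketch (the seen array is only there to record which elements were actually visited):

```ruby
seen = []
result = (1..Float::INFINITY).lazy
  .select { |x| seen << x; x.odd? }
  .map { |x| x * 10 }
  .first(3)

p result  # [10, 30, 50]
p seen    # [1, 2, 3, 4, 5], iteration stops once three results exist
```

Without .lazy, select would try to walk the whole infinite range before map ever ran.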

Is there a built-in lazy Hash in Ruby?
