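How do I build a hash of all the vowels in a string?

Time: 2016-09-07 18:20:20

Tags: ruby hash

I have searched everywhere for this (here, here, here), but oddly I couldn't find it.

I have a string str = "make stackoverflow great again". How can I build a hash of the number of occurrences of each vowel in that string? Expected result: {"a"=>5, "e"=>3, "i"=>1, "o"=>2, "u"=>0}

Right now I have the bluntest, most un-Ruby-like approach: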

def count_vowels(string)
  list = {}
  a = string.count("a")
  e = string.count("e")
  i = string.count("i")
  o = string.count("o")
  u = string.count("u")

  list["a"] = a
  list["e"] = e
  list["i"] = i
  list["o"] = o
  list["u"] = u

  list
end

What is the most Ruby-esque way to build a hash of the vowel counts for any given string?

6 answers:

Answer 0: (score: 3)

Consider a solution like this:

str    = 'make stackoverflow great again'
vowels = %w(a e o u i)
vowels.each_with_object({}) {|vowel, hash| hash[vowel] = str.count(vowel) }
#=> {"a"=>5, "e"=>3, "o"=>2, "u"=>0, "i"=>1}

Answer 1: (score: 3)

%w(a e o u i).map{ |v| {v => str.count(v)} }.reduce(:merge)
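Since this one-liner comes without commentary, here is a sketch that splits it into its two stages. The method name `vowel_counts_by_merge` and the intermediate variable `pairs` are my own labels, not part of the answer.

```ruby
# Hypothetical helper wrapping the one-liner; the name is illustrative only.
def vowel_counts_by_merge(str)
  # Stage 1: build one single-pair hash per vowel,
  # i.e. an array of five small hashes such as {"a" => 5}.
  pairs = %w(a e o u i).map { |v| { v => str.count(v) } }
  # Stage 2: fold the small hashes into a single hash with Hash#merge.
  pairs.reduce(:merge)
end

p vowel_counts_by_merge("make stackoverflow great again")
```

Because every vowel is looked up explicitly, "u" shows up with a count of 0 instead of being left out.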

Answer 2: (score: 3)

I'm just going to include an O(n) solution, since all the previous ones are O(n^2).

str.gsub(/[^aeiou]/, "").each_char.each_with_object(Hash.new(0)) { |vowel, hash| hash[vowel] += 1 }
=> {"a"=>5, "e"=>3, "o"=>2, "i"=>1}

Admittedly, my Big O notation isn't perfect, so bear with me.

Step 1 - O(n)

str.gsub(/[^aeiou]/, "")
=> "aeaoeoeaaai"

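This step loops over the string's characters and removes everything that is not a vowel (consonants and spaces alike). I tried my best to find the actual running time of gsub, but without doing my own benchmarks I can't be certain. I found this when I originally wrote the answer, but it isn't ironclad either. gsub should find every match of the expression and return a new string with those matches replaced.

Step 2 - O(n)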
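As a side note to step 1, the same filtering can be sketched without a regex. This is an alternative I am swapping in, not what the answer uses: `String#delete` takes the same character-set syntax as `String#count`, and a leading `^` negates the set. The helper name `only_vowels` is hypothetical.

```ruby
# Hypothetical helper: keep only the vowels of a string.
def only_vowels(str)
  # "^aeiou" means "every character except a, e, i, o, u",
  # so delete removes consonants, spaces and anything else.
  str.delete("^aeiou")
end

puts only_vowels("make stackoverflow great again")
```

For the example string this yields the same intermediate string as the gsub call; it still has to look at every character once.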

each_char

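This simply takes the string returned by step 1 and returns an Enumerator. Depending on the language, converting a string into an array of strings/characters has a worst-case running time proportional to the string's length (hence O(n)). In Ruby, the returned enumerator is actually evaluated lazily, so you could argue that we are at O(1) here.

Step 3 - O(n)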
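To illustrate the point about laziness, here is a small sketch. The helper name `first_chars` is made up for this example; `each_char` without a block and `Enumerable#first` are standard Ruby.

```ruby
# Hypothetical helper showing that each_char without a block iterates on demand.
def first_chars(str, n)
  enum = str.each_char   # an Enumerator; no character has been visited yet
  enum.first(n)          # only now are the first n characters produced
end

p "aeaoeoeaaai".each_char.class
p first_chars("aeaoeoeaaai", 3)
```

Creating the enumerator is cheap; the O(n) cost is paid by whatever consumes it, which in this answer is step 3.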

each_with_object(Hash.new(0)) { |vowel, hash| hash[vowel] += 1 }

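There are several sub-steps here:

Step 3a - instantiate a new Hash object with a default value of 0 - O(1)

Step 3b - assign/look up the key (the vowel) in the hash - O(1) on average, O(n) in the worst case

Step 3c - increment the value by 1 - O(1)

Step 3d - iterate over every letter - O(n)

So from our 3 steps we have O(n) + O(n) + O(n), which is really O(3n), but Big O lets us drop constants such as the 3 in this example, so it simply becomes O(n).

I'm still learning Big O myself, so this explanation could use some input from the community.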
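Put together, the three steps can be sketched as one method with each pass marked. The method name `vowel_counts` and the local variable names are mine; the calls are the ones used in the answer.

```ruby
# Hypothetical wrapper around the answer's one-liner, one line per step.
def vowel_counts(str)
  vowels_only = str.gsub(/[^aeiou]/, "")   # Step 1: one pass over str
  chars = vowels_only.each_char            # Step 2: Enumerator, nothing iterated yet
  # Step 3: one pass over the remaining vowels, O(1) average work per character
  chars.each_with_object(Hash.new(0)) do |vowel, hash|
    hash[vowel] += 1
  end
end

p vowel_counts("make stackoverflow great again")
```

Each pass touches every character at most once, which is where the O(n) + O(n) + O(n) sum comes from.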

Answer 3: (score: 2)

str = "make stackoverflow great again"
vowels = %w{a e i o u}

vowels.map { |v| [v, str.count(v)] }.to_h
#> {"a"=>5, "e"=>3, "i"=>1, "o"=>2, "u"=>0}

Answer 4: (score: 2)

I would use a counting hash, so that the string's characters are traversed only once. See Hash::new.

VOWELS = "aeiou"

h = str.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 if VOWELS.include?(c) }
  #=> {"a"=>5, "e"=>3, "o"=>2, "i"=>1}

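Note that h["u"] #=> 0, even though "u" is not stored as a key.

If the string is large, you should be able to speed this up by putting the vowels in a set (with no other changes).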
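If the zero entry is needed in the result itself, as in the question's expected output, one possible sketch is to read every vowel back out of the counting hash. The constant `VOWEL_LETTERS` and the method name `vowel_counts_with_zeros` are assumptions made for this example.

```ruby
VOWEL_LETTERS = "aeiou" # illustrative name, kept distinct from VOWELS above

# Hypothetical helper: count in one pass, then list all five vowels explicitly.
def vowel_counts_with_zeros(str)
  counts = str.each_char.with_object(Hash.new(0)) do |c, h|
    h[c] += 1 if VOWEL_LETTERS.include?(c)
  end
  # Reading counts[v] returns the default 0 for vowels that never occurred;
  # building a new hash from those reads makes the zero an actual entry.
  VOWEL_LETTERS.each_char.map { |v| [v, counts[v]] }.to_h
end

p vowel_counts_with_zeros("make stackoverflow great again")
```

The extra step only walks the five vowels, so it does not change the single pass over the string.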

require 'set'

VOWELS = "aeiou".each_char.to_set
  #=> #<Set: {"a", "e", "i", "o", "u"}>
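For completeness, here is how the set might be plugged into the counting expression. This is only a sketch: `VOWEL_SET` and `vowel_counts_with_set` are names chosen for the example, and the block is otherwise unchanged, as the answer suggests.

```ruby
require 'set'

VOWEL_SET = "aeiou".each_char.to_set # illustrative constant name

# Hypothetical helper: same counting hash, membership test against a Set.
def vowel_counts_with_set(str)
  str.each_char.with_object(Hash.new(0)) do |c, h|
    h[c] += 1 if VOWEL_SET.include?(c)
  end
end

p vowel_counts_with_set("make stackoverflow great again")
```

With only five vowels the difference may be small; the benchmark in answer 5 reports cary_str and cary_set as similar.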

Better still is to first strip the non-vowel characters from the string, as @Anthony has done.

Answer 5: (score: 2)

I'd do it this way:

str = "make stackoverflow great again"

str.scan(/[aeiou]/).each_with_object(Hash.new{ |h, k| h[k] = 0}) { |v, h| h[v] += 1 }
# => {"a"=>5, "e"=>3, "o"=>2, "i"=>1}
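On newer Rubies the counting step can be written more compactly. The following is a sketch that assumes Ruby 2.7 or later, where `Enumerable#tally` is available; it postdates this thread and is not what the answer uses. The method name `vowel_tally` is hypothetical.

```ruby
# Hypothetical helper; requires Ruby 2.7+ for Enumerable#tally.
def vowel_tally(str)
  # scan collects every vowel occurrence into an array,
  # tally turns that array into an element => count hash.
  str.scan(/[aeiou]/).tally
end

p vowel_tally("make stackoverflow great again")
```

Like the each_with_object version, this only contains vowels that actually occur, so "u" is absent for the example string.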

But, just for fun, here's a benchmark:

require 'fruity'
require 'set'

str = "make stackoverflow great again"
vowels = 'aeiou'
vowels_set = vowels.each_char.to_set
vowels_ary = vowels.chars
vowels_regex = /[aeiou]/
vowels_not_regex = /[^aeiou]/

compare do
  ttm      { str.scan(vowels_regex).each_with_object(Hash.new{ |h, k| h[k] = 0}) { |v, h| h[v] += 1 } }
  Andrey_Deineko { vowels_ary.each_with_object({}) {|vowel, hash| hash[vowel] = str.count(vowel) } }
  Anthony  { str.gsub(vowels_not_regex, "").each_char.each_with_object(Hash.new(0)) { |vowel, hash| hash[vowel] += 1 } }
  dimid    { vowels_ary.map{ |v| {v => str.count(v)} }.reduce(:merge) }
  cary_str { str.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 if vowels.include?(c) } }
  cary_set { str.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 if vowels_set.include?(c) } }
  seph     { vowels_ary.map { |v| [v, str.count(v)] }.to_h }
end

# >> Running each test 2048 times. Test will take about 3 seconds.
# >> Andrey_Deineko is similar to seph
# >> seph is faster than cary_str by 4x ± 1.0 (results differ: {"a"=>5, "e"=>3, "i"=>1, "o"=>2, "u"=>0} vs {"a"=>5, "e"=>3, "o"=>2, "i"=>1})
# >> cary_str is similar to dimid (results differ: {"a"=>5, "e"=>3, "o"=>2, "i"=>1} vs {"a"=>5, "e"=>3, "i"=>1, "o"=>2, "u"=>0})
# >> dimid is similar to ttm (results differ: {"a"=>5, "e"=>3, "i"=>1, "o"=>2, "u"=>0} vs {"a"=>5, "e"=>3, "o"=>2, "i"=>1})
# >> ttm is similar to cary_set
# >> cary_set is similar to Anthony
  

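I'm curious how the length of str affects the performance of the following solutions.

First of all, well, the code is right there. Copy it, paste it into an editor, change the string and experiment, right?

Making the initial str 100 times longer:

str = "make stackoverflow great again" * 100

The results become: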

# >> Running each test 512 times. Test will take about 29 seconds.
# >> Andrey_Deineko is similar to seph
# >> seph is faster than dimid by 2x ± 0.1
# >> dimid is faster than ttm by 55x ± 10.0 (results differ: {"a"=>500, "e"=>300, "i"=>100, "o"=>200, "u"=>0} vs {"a"=>500, "e"=>300, "o"=>200, "i"=>100})
# >> ttm is faster than Anthony by 30.000000000000004% ± 10.0%
# >> Anthony is faster than cary_str by 10.000000000000009% ± 10.0%
# >> cary_str is similar to cary_set
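For anyone who wants to repeat the experiment without installing fruity, here is a rough sketch using Ruby's standard `benchmark` library instead. It is a stand-in I am substituting, not the tool used above; the method name `time_vowel_counters`, the labels and the iteration count are all made up for this example, and only two of the solutions are included.

```ruby
require 'benchmark'

# Hypothetical helper: wall-clock seconds per approach for a given string.
def time_vowel_counters(str, iterations = 200)
  vowels = %w(a e i o u)
  candidates = {
    "count_per_vowel" => -> { vowels.map { |v| [v, str.count(v)] }.to_h },
    "scan_and_count"  => -> { str.scan(/[aeiou]/).each_with_object(Hash.new(0)) { |v, h| h[v] += 1 } }
  }
  # Benchmark.realtime returns the elapsed time of the block as a Float.
  candidates.map do |name, fn|
    [name, Benchmark.realtime { iterations.times { fn.call } }]
  end.to_h
end

base = "make stackoverflow great again"
[1, 100].each do |factor|
  time_vowel_counters(base * factor).each do |name, seconds|
    puts format("x%-4d %-16s %.5fs", factor, name, seconds)
  end
end
```

The absolute numbers depend on the machine and the Ruby version, so they are only useful for comparing the approaches against each other on the same run.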