How to find the word with the greatest number of repeated letters

Time: 2014-02-11 06:01:13

Tags: ruby

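My goal is to find the word in a given string that has the greatest number of repeated letters. For example, "aabcc ddeeteefef iijjfff" would return "ddeeteefef", because "e" is repeated five times in that word, more than any other repeated character.

Here is what I have so far, but it has many problems and is incomplete: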

def LetterCountI(str)
  s = str.split(" ")
  i = 0
  result = []
  t = s[i].scan(/((.)\2+)/).map(&:max) 
  u = t.max { |a, b| a.length <=> b.length }
  return u.split(//).count 
end

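My code only finds consecutive patterns; if the pattern is interrupted, the count is wrong (for example, with "aabaaa" it counts three rather than five).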
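The gap between the two counts can be seen directly. This is a minimal sketch; the helper names `longest_run` and `max_letter_count` are made up for illustration and do not come from the question.

```ruby
# Hypothetical helpers, named here only for illustration.

# Length of the longest run of one character repeated consecutively,
# which is what a scan-based approach measures.
def longest_run(word)
  word.scan(/((.)\2*)/).map { |run, _char| run.length }.max.to_i
end

# Total occurrences of the most frequent character, consecutive or not.
def max_letter_count(word)
  chars = word.chars
  chars.uniq.map { |char| chars.count(char) }.max.to_i
end

longest_run("aabaaa")       # => 3
max_letter_count("aabaaa")  # => 5
```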

4 answers:

Answer 0: (score: 6)

str.scan(/\w+/).max_by{ |w| w.chars.group_by(&:to_s).values.map(&:size).max }
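  • scan(/\w+/) - creates an array of all sequences of "word" characters
  • max_by{ … } - finds the word that gives the largest value inside this block
  • chars - splits the string into characters
  • group_by(&:to_s) - creates a hash mapping each character to an array of all its occurrences
  • values - gets just the arrays of occurrences
  • map(&:size) - converts each array to the number of characters in it
  • max - finds the largest count and uses it as the value that max_by compares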
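The steps in the list can be followed on a single word. The sketch below only exposes the intermediate values; the variable names `groups` and `sizes` are illustrative.

```ruby
word = "ddeeteefef"

groups = word.chars.group_by(&:to_s)
# => {"d"=>["d", "d"], "e"=>["e", "e", "e", "e", "e"], "t"=>["t"], "f"=>["f", "f"]}

sizes = groups.values.map(&:size)
# => [2, 5, 1, 2]

sizes.max
# => 5
```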

Edit: written less compactly:

str.scan(/\w+/).max_by do |word|
  word.chars
      .group_by{ |char| char }
      .map{ |char,array| array.size }
      .max
end

Less functional and with fewer Ruby-isms (making it look more like "other" languages):

words_by_most_repeated = []
str.split(" ").each do |word|
  count_by_char = {} # hash mapping character to count of occurrences
  word.chars.each do |char|
    count_by_char[ char ] = 0 unless count_by_char[ char ]
    count_by_char[ char ] += 1
  end
  maximum_count = 0
  count_by_char.each do |char,count|
    if count > maximum_count then
      maximum_count = count
    end
  end
  words_by_most_repeated[ maximum_count ] = word
end

most_repeated = words_by_most_repeated.last
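One behavioural difference is worth knowing: when two words tie for the highest count, the array-indexed version keeps the later word, because it overwrites the same slot. The sketch below wraps both approaches in hypothetical methods (`by_max_by` and `by_indexed_array` are names invented here) so they can be compared. Which word max_by returns on a tie is not something I would rely on, so no expected value is shown for that call.

```ruby
# Hypothetical wrappers around the two approaches above.
def by_max_by(str)
  str.scan(/\w+/).max_by { |w| w.chars.group_by(&:to_s).values.map(&:size).max }
end

def by_indexed_array(str)
  words_by_most_repeated = []
  str.split(" ").each do |word|
    counts = Hash.new(0)                          # unseen characters start at 0
    word.chars.each { |char| counts[char] += 1 }
    words_by_most_repeated[counts.values.max] = word
  end
  words_by_most_repeated.last                     # highest index = highest count
end

by_indexed_array("aabcc ddeeteefef iijjfff")  # => "ddeeteefef"
by_indexed_array("aa bb")                     # => "bb", the later word overwrites slot 2
by_max_by("aa bb")                            # one of the tied words
```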

Answer 1: (score: 4)

I would do it like this:

s = "aabcc ddeeteefef iijjfff" 
# intermediate calculation that's happening in the final code
s.split(" ").map { |w| w.chars.max_by { |e| w.count(e) } }
# => ["a", "e", "f"] # getting the max count character from each word
s.split(" ").map { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => [2, 5, 3] # getting the max count character's count from each word
# final code
s.split(" ").max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => "ddeeteefef"

Update

In the benchmark below, the each_with_object version runs faster than the group_by approach.

require 'benchmark'

s = "aabcc ddeeteefef iijjfff" 

def phrogz(s)
   s.scan(/\w+/).max_by{ |word| word.chars.group_by(&:to_s).values.map(&:size).max }
end

def arup_v1(s)
    max_string = s.split.max_by do |w| 
       h = w.chars.each_with_object(Hash.new(0)) do |e,hsh|
         hsh[e] += 1
       end
       h.values.max
    end
end

def arup_v2(s)
   s.split.max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
end

n = 100_000
Benchmark.bm do |x|
  x.report("Phrogz:") { n.times {|i| phrogz s } }
  x.report("arup_v2:"){ n.times {|i| arup_v2 s } }
  x.report("arup_v1:"){ n.times {|i| arup_v1 s } }
end

Output

            user     system      total        real
Phrogz:   1.981000   0.000000   1.981000 (  1.979198)
arup_v2:  0.874000   0.000000   0.874000 (  0.878088)
arup_v1:  1.684000   0.000000   1.684000 (  1.685168)
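For reference, the counting step inside arup_v1 builds a hash of counts directly rather than an array of occurrences per character. That may be part of why it edges out group_by here, though the timings above are the only evidence offered. A small sketch of that step on one word:

```ruby
word = "ddeeteefef"

counts = word.chars.each_with_object(Hash.new(0)) do |char, hsh|
  hsh[char] += 1   # Hash.new(0) makes unseen characters start at 0
end
# => {"d"=>2, "e"=>5, "t"=>1, "f"=>2}

counts.values.max
# => 5
```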

Answer 2: (score: 3)

Similar to sawa's answer:

"aabcc ddeeteefef iijjfff".split.max_by{|w| w.length - w.chars.uniq.length}
=> "ddeeteefef"

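In Ruby 2.x this works as-is, because String#chars returns an array. In earlier versions of Ruby, String#chars yields an enumerator, so you need to add .to_a before applying uniq. I tested in Ruby 2.0 and overlooked this until Stephens pointed it out.

I think this is valid, because the question asks for the "greatest number of repeated letters in the given string", not the greatest number of repeats of a single letter in the given string.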
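The two readings agree on the sample string from the question but can pick different words elsewhere. A minimal sketch, with hypothetical helper names (`repeated_letter_total` and `top_letter_count` are invented for this comparison):

```ruby
# Hypothetical helper names, for illustration only.

# This answer's metric: how many letters in the word are "extra" copies.
def repeated_letter_total(word)
  word.length - word.chars.to_a.uniq.length   # to_a is harmless when chars is already an array
end

# The other reading: occurrences of the single most frequent letter.
def top_letter_count(word)
  word.chars.group_by { |c| c }.values.map(&:size).max
end

words = "aabbcc aaab".split
words.max_by { |w| repeated_letter_total(w) }  # => "aabbcc" (3 extra copies vs 2)
words.max_by { |w| top_letter_count(w) }       # => "aaab"   (three a's vs two of any letter)
```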

Answer 3: (score: 2)

"aabcc ddeeteefef iijjfff"
.split.max_by{|w| w.chars.sort.chunk{|e| e}.map{|e| e.last.length}.max}
# => "ddeeteefef"
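This answer comes without an explanation, so here is one reading of it: sorting puts identical characters next to each other, and chunk then groups each run of equal neighbours. A sketch of the intermediate values for one word (the variable names are illustrative):

```ruby
word = "ddeeteefef"

sorted = word.chars.sort
# => ["d", "d", "e", "e", "e", "e", "e", "f", "f", "t"]

runs = sorted.chunk { |e| e }.map { |_char, group| group.length }
# => [2, 5, 2, 1]

runs.max
# => 5
```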