Ruby: How to find the most frequent substring of length n?

Date: 2016-11-05 09:30:41

Tags: ruby set substring find-occurrences

I'm trying to figure out whether there is a shorter, more Ruby-like way to find the most frequent substrings of length n.

I wrote the following code:

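require 'set'

# Assumed: DNA wraps a sequence string and exposes it as #text
# (the class definition is not shown in the original question).
class DNA
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def most_frequent_kmers(length)
    # Split the sequence into characters and take every window of `length`
    dna = text.each_char.to_a
    array_dna_substrings = dna.each_cons(length).to_a

    # Count each window, joined back into a string
    counts = Hash.new 0
    array_dna_substrings.each do |elem|
      counts[elem.join] += 1
    end

    # Sort by count, highest first
    counts = counts.sort_by { |substring, count| count }.reverse
    res = []

    # Keep every entry that ties with the top count
    for i in 0..counts.length-1
      res << counts[i] if counts[i][1] >= counts[0][1]
    end

    res = Hash[res.map { |key, value| [key, value] }]
    s = Set.new(res.keys)
    # p prints the pair and also returns it
    p [s, res.values.first]
  end
end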

dna1 = DNA.new('ATTGATTCCG')
dna1.most_frequent_kmers(1)
dna1.most_frequent_kmers(2)
dna1.most_frequent_kmers(3)
dna1.most_frequent_kmers(4)

Sample output:

>> dna1 = DNA.new('ATTGATTCCG') => ATTGATTCCG 
>> dna1.most_frequent_kmers(1) => [#<Set: {"T"}>, 4] 
>> dna1.most_frequent_kmers(2) => [#<Set: {"AT", "TT"}>, 2] 
>> dna1.most_frequent_kmers(3) => [#<Set: {"ATT"}>, 2] 
>> dna1.most_frequent_kmers(4) => [#<Set: {"ATTG", "TTGA", "TGAT", "GATT", "ATTC", "TTCC", "TCCG"}>, 1]
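To see where the counts in the sample output come from: a string of length L has L - n + 1 windows of length n, so 'ATTGATTCCG' (length 10) has nine 2-mers and seven 4-mers, which is why the n = 4 line lists seven k-mers, each seen once. The sketch below is not from the question. It lists the windows by slicing the string with `String#[](start, length)` instead of going through `each_char`/`each_cons`, and the helper name `kmers` is hypothetical.

```ruby
# Hypothetical helper: every substring of length n, left to right,
# taken with String#[](start, length) rather than a character array.
def kmers(text, n)
  return [] if n <= 0 || n > text.length

  (0..text.length - n).map { |i| text[i, n] }
end

p kmers('ATTGATTCCG', 2)
# prints ["AT", "TT", "TG", "GA", "AT", "TT", "TC", "CC", "CG"]
```

"AT" and "TT" each appear twice in that list, which matches the n = 2 line of the sample output.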

The code I wrote works perfectly, but there must be a shorter, cleaner way to find the substrings of a given length in a string.
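A more compact sketch of the same computation, assuming Ruby 2.7 or later (for `Enumerable#tally`): build the windows with `each_cons`, count them with `tally`, then keep every k-mer whose count equals the maximum. It returns the same `[Set, count]` pair as the method above, but as a standalone function that takes the sequence as an argument instead of reading `DNA#text`; that signature is an assumption made for illustration.

```ruby
require 'set'

# Sketch: the most frequent substrings of the given length, plus their count.
# Requires Ruby 2.7+ for Enumerable#tally.
def most_frequent_kmers(text, length)
  counts = text.each_char.each_cons(length).map(&:join).tally
  return [Set.new, 0] if counts.empty? # sequence shorter than length

  best = counts.values.max
  [counts.select { |_, count| count == best }.keys.to_set, best]
end

set, count = most_frequent_kmers('ATTGATTCCG', 2)
p set.to_a.sort, count # prints ["AT", "TT"] then 2
```

Taking the maximum once and filtering for ties replaces the sort, reverse, and `for` loop; as in the original, a non-positive length makes `each_cons` raise `ArgumentError`.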

I believe this can be done with a set using `divide`, but I can't figure it out.
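One way `Set#divide` could fit (a sketch, not confirmed to be what was intended): count first, then divide the set of distinct k-mers into subsets that share a count, and pick the subset with the highest count. With a one-argument block, `Set#divide` partitions the set by the block's return value. Counting still has to happen separately (here with `tally`, Ruby 2.7+), because a set drops duplicates. The method name `most_frequent_kmers_by_divide` is hypothetical.

```ruby
require 'set'

# Sketch: group distinct k-mers by their count with Set#divide,
# then return the group with the highest count and that count.
def most_frequent_kmers_by_divide(text, length)
  counts = text.each_char.each_cons(length).map(&:join).tally # Ruby 2.7+
  return [Set.new, 0] if counts.empty?

  # One-argument block: k-mers with equal counts land in the same subset
  groups = counts.keys.to_set.divide { |kmer| counts[kmer] }
  best = groups.max_by { |group| counts[group.first] }
  [best, counts[best.first]]
end

set, count = most_frequent_kmers_by_divide('ATTGATTCCG', 4)
p set.size, count # prints 7 then 1
```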

Any help would be great!

Cheers

0 answers