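I am trying to figure out whether there is a shorter, more Ruby-like way to find the most frequent substrings of length n.
I wrote the following code: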
require 'set'

# Instance method of the DNA class; `text` is presumably the sequence string passed to DNA.new.
def most_frequent_kmers(length)
  dna = text.each_char.to_a
  array_dna_substrings = dna.each_cons(length).to_a
  # Count how often each substring occurs.
  counts = Hash.new 0
  array_dna_substrings.each do |elem|
    counts[elem.join] += 1
  end
  # Sort by count, highest first.
  counts = counts.sort_by { |substring, count| count }.reverse
  # Keep every substring tied for the highest count.
  res = []
  for i in 0..counts.length-1
    res << counts[i] if counts[i][1] >= counts[0][1]
  end
  res = Hash[res.map { |key, value| [key, value] }]
  s = Set.new(res.keys)
  p [s, res.values.first]
end
dna1 = DNA.new('ATTGATTCCG')
dna1.most_frequent_kmers(1)
dna1.most_frequent_kmers(2)
dna1.most_frequent_kmers(3)
dna1.most_frequent_kmers(4)
Sample output:
>> dna1 = DNA.new('ATTGATTCCG') => ATTGATTCCG
>> dna1.most_frequent_kmers(1) => [#<Set: {"T"}>, 4]
>> dna1.most_frequent_kmers(2) => [#<Set: {"AT", "TT"}>, 2]
>> dna1.most_frequent_kmers(3) => [#<Set: {"ATT"}>, 2]
>> dna1.most_frequent_kmers(4) => [#<Set: {"ATTG", "TTGA", "TGAT", "GATT", "ATTC", "TTCC", "TCCG"}>, 1]
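The code above works fine, but there must be a neater, more concise way to search a string for substrings of a set length.
I believe some kind of grouping or partitioning could be used, but I can't figure it out.
Any help would be great!
Cheers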
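
For reference, one possible shorter version is sketched below. It is only a sketch under some assumptions: `most_frequent_kmers` here is a hypothetical stand-alone helper that takes the sequence as an argument (the DNA class itself is not shown in the question), and it returns the result instead of printing it. It counts each overlapping substring once, takes the highest count, and keeps every substring that reaches it, which avoids the sort and the index loop.

```ruby
require 'set'

# Hypothetical stand-alone helper, not the DNA class from the question:
# the sequence is passed in as `text` rather than read from an attribute.
def most_frequent_kmers(text, length)
  # Count every overlapping substring of the given length.
  counts = Hash.new(0)
  text.each_char.each_cons(length) { |chars| counts[chars.join] += 1 }

  # Find the highest count, then keep every substring that reaches it.
  max = counts.values.max
  winners = counts.select { |_, count| count == max }.keys
  [Set.new(winners), max]
end

p most_frequent_kmers('ATTGATTCCG', 2)
```

If `length` is larger than the string, no substrings are counted, so this sketch returns an empty set with `nil` as the count; whether that is the right behavior depends on how the class is meant to be used.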