Creating a hash of matched words and their occurrence counts

Time: 2015-12-18 17:34:54

Tags: ruby hash

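I'm working on a Ruby program that takes a string, compares it against a "dictionary" of words, and returns a hash of the matched words and the number of times each one matched. So far I can iterate over the string and the array, and it prints a string whenever it finds a match, but I don't know how to build a hash from the matched words and their counts. Here is the code: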

dictionary = ["below","down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

def substrings (string, dictionary)
  dictionary = dictionary
  words = string.split(/\s+/)
  puts words
  x = 0
  while x < words.length do
    y = 0
    while y < dictionary.length do
      if words[x] == dictionary[y]
        puts "it's working"
      end
      y += 1
    end
    x += 1
  end
end

substrings("let's go down below", dictionary)

Any ideas on how to build the hash would be greatly appreciated. Thanks!

4 answers:

Answer 0 (score: 3)

Here is another approach:

def substrings (string, dictionary)
  dictionary.each.with_object({}){|w, h| h[w] = string.scan(/\b#{w}\b/).length}
end

substrings("let's go down below", dictionary)

Output:

{
  "below"   => 1,
  "down"    => 1,
  "go"      => 1,
  "going"   => 0,
  "horn"    => 0,
  "how"     => 0,
  "howdy"   => 0,
  "it"      => 0,
  "i"       => 0,
  "low"     => 0,
  "own"     => 0,
  "part"    => 0,
  "partner" => 0,
  "sit"     => 0
}
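
A small variation on the approach above, offered as a sketch rather than as the answerer's code: Regexp.escape guards against dictionary entries that contain regex metacharacters, and words with a zero count are left out of the result. The method name count_matches is hypothetical.

```ruby
# Hypothetical helper; not part of the original answer.
def count_matches(string, dictionary)
  dictionary.each_with_object({}) do |word, counts|
    # Regexp.escape keeps characters such as "." or "+" literal.
    n = string.scan(/\b#{Regexp.escape(word)}\b/).length
    counts[word] = n if n > 0
  end
end

dictionary = ["below", "down", "go", "going", "horn", "how", "howdy",
              "it", "i", "low", "own", "part", "partner", "sit"]
p count_matches("let's go down below", dictionary)
# => {"below"=>1, "down"=>1, "go"=>1}
```

The word boundaries are what stop "low" and "own" from matching inside "below" and "down".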

Answer 1 (score: 3)

Meditate on this:

'b c c d'.split # => ["b", "c", "c", "d"]
'b c c d'.split.group_by{ |w| w } # => {"b"=>["b"], "c"=>["c", "c"], "d"=>["d"]}
'b c c d'.split.group_by{ |w| w }.map{ |k, v| [k, v.count] } # => [["b", 1], ["c", 2], ["d", 1]]
'b c c d'.split.group_by{ |w| w }.map{ |k, v| [k, v.count] }.to_h # => {"b"=>1, "c"=>2, "d"=>1}

From that we can build:

dictionary = ['b', 'c']
word_count = 'b c c d'.split.group_by{ |w| w }.map{ |k, v| [k, v.count] }.to_h
word_count.values_at(*dictionary) # => [1, 2]

If you only want the key/value pairs that are in the dictionary, that's easy:

require 'active_support/core_ext/hash/slice'
word_count.slice(*dictionary) # => {"b"=>1, "c"=>2}

group_by is a very useful method that groups elements by whatever criterion you pass to it. values_at takes a list of keys and returns their corresponding values.
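
On newer Rubies the same result can be had without group_by or ActiveSupport. This is a sketch, assuming Ruby 2.7 or later (Enumerable#tally arrived in 2.7, and Hash#slice has been in core since 2.5); the method name dictionary_counts is made up for illustration.

```ruby
# Hypothetical helper name; assumes Ruby 2.7+ for Enumerable#tally.
def dictionary_counts(text, dictionary)
  word_count = text.split.tally   # {"b"=>1, "c"=>2, "d"=>1} for 'b c c d'
  word_count.slice(*dictionary)   # core Hash#slice; keys not present are skipped
end

dictionary_counts('b c c d', ['b', 'c']) # => {"b"=>1, "c"=>2}
```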

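There is a potential problem when counting "words": not all text yields what we would consider words once it has been split into its component substrings. For example: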

'how now brown cow.'.split # => ["how", "now", "brown", "cow."]

Notice that the string for the last word includes the punctuation. Similarly, compound words and other punctuation can cause problems:

'how-now brown, cow.'.split # => ["how-now", "brown,", "cow."]

The task then becomes deciding how to treat the characters that are attached to words. The simple thing is to strip them:

'how-now brown, cow.'.gsub(/[^a-z]+/, ' ').split # => ["how", "now", "brown", "cow"]

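In today's crazy times we also see words that contain digits, especially in company and program names. You can modify the pattern in the gsub above to handle that, but how to do it is left for you to figure out.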
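
One possible answer to that exercise, as a sketch only: widen the character class so that digits survive the gsub. The tokenize name and the sample string are made up for illustration, and the input is assumed to be lower case already.

```ruby
# Hypothetical helper: keeps runs of lower-case letters and digits,
# and treats everything else as a separator.
def tokenize(text)
  text.gsub(/[^a-z0-9]+/, ' ').split
end

tokenize('how-now 2nd brown, cow3.') # => ["how", "now", "2nd", "brown", "cow3"]
```

Whether "2nd" should count as a word at all is exactly the kind of rule that has to be decided per source.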

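We also see mixed case, so your dictionary needs to be folded to upper or lower case, and the string being examined needs to be folded the same way, unless you want separate counts that respect character case: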

word_count = 'b C c d'.downcase.split.group_by{ |w| w }.map{ |k, v| [k, v.count] }.to_h # => {"b"=>1, "c"=>2, "d"=>1}
word_count = 'b C c d'.split.group_by{ |w| w }.map{ |k, v| [k, v.count] }.to_h # => {"b"=>1, "C"=>1, "c"=>1, "d"=>1}

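Analyzing page content often starts with code like this, but many rules have to be written to specify what is a useful word and what is garbage. And the rules often change from one source to another, because differences in how sources use words and digits can quickly undermine the usefulness of your code: "second" versus "2nd", for instance. It gets "interesting".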

Answer 2 (score: 2)

One way of doing this is to create what is sometimes called a "counting hash":

h = Hash.new(0)

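Here zero is the "default value". All that means is that if h does not have a key k, h[k] returns zero (but the hash is not changed). You will then have: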

h[k] += 1

which expands to:

h[k] = h[k] + 1

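If h has a key k, h[k] on the right side will have a value, so Bob's your uncle. If, however, h does not have a key k, h[k] on the right side evaluates to the default value, so the expression becomes: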

h[k] = 0 + 1

Cool, eh?
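
The behaviour described above can be checked with a few lines. This is a minimal sketch; the method name default_demo and the key :missing are made up for illustration.

```ruby
# Hypothetical demo of how a default value behaves on reads and on +=.
def default_demo
  h = Hash.new(0)
  before = h[:missing]       # reads the default value, 0
  stored = h.key?(:missing)  # false: the read did not add the key
  h[:missing] += 1           # expands to h[:missing] = 0 + 1
  [before, stored, h]
end

default_demo # => [0, false, {:missing=>1}]
```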

So for your problem you could write:

dictionary = %w| below down go going horn how howdy it i low own part partner sit |
  #=> ["below", "down", "go", "going", "horn", "how", "howdy", "it", "i",
  #    "low", "own", "part", "partner", "sit"] 
string = "Periscope down, so we can go down, way down, below the surface."

string.delete(',.').downcase.split.each_with_object(Hash.new(0)) { |word,h|
  (h[word] += 1) if dictionary.include?(word) }
  #=> {"down"=>3, "go"=>1, "below"=>1}

You may also see the following:

string.delete(',.').downcase.split.each_with_object({}) { |word,h|
  h[word] = (h[word] || 0) + 1 if dictionary.include?(word) }

Here, if h does not have a key word, h[word] will be nil, so the expression becomes:

h[word] = (h[word] || 0) + 1
  #=>   = (nil     || 0) + 1
  #=>   = 0 + 1  

Another way is to first count the number of instances of each word in string, and then see which of those words are in the dictionary:

h = string.delete(',.').downcase.split.group_by(&:itself)
  #=> {"periscope"=>["periscope"], "down"=>["down", "down", "down"], "so"=>["so"],
  #    "we"=>["we"], "can"=>["can"], "go"=>["go"], "way"=>["way"], "below"=>["below"],
  #    "the"=>["the"], "surface"=>["surface"]}
h.each_with_object({}) { |(k,v),g| g[k] = v.size if dictionary.include?(k) }
  #=> {"down"=>3, "go"=>1, "below"=>1}

(Edit: see @theTinMan's answer for a better way to use Enumerable#group_by.)

Answer 3 (score: 2)

Building on the description of the counting hash given by Cary, your code can be modified slightly as follows.

dictionary = ["below","down","go","going","horn","how","howdy","it","i","low","own","part","partner","sit"]

def substrings (string, dictionary)

  words = string.split(/\s+/)

  count_hash = Hash.new(0)

  words.each do |sentence_word|
    dictionary.each do |dictionary_word|
      if sentence_word == dictionary_word
        count_hash[sentence_word] += 1
      end
    end
  end

  return count_hash
end

p substrings("let's go down below", dictionary)

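However, given the method Array#count, we can take advantage of it and reduce the code above to something like the following. In this version we don't need a counting hash.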

def substrings (string, dictionary)
  words = string.split(/\s+/)
  count_hash = Hash.new

  dictionary.each do |dictionary_word|
    if (count = words.count(dictionary_word)) > 0
      count_hash[dictionary_word] = count
    end
  end

  return count_hash
end

You can refer to the other answers for more idiomatic Ruby solutions. If I had to take a stab at it, below would be my version:

def substrings (string, dictionary)
  words = string.split(/\s+/)
  dictionary.map { |d| [d, words.count(d)] }.to_h.reject  {|_, v| v == 0}
end
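
Because the versions above split on whitespace only, a word followed by punctuation or written in upper case will not match. One way to combine the Array#count idea with the normalization discussed in the other answers is sketched below; the method name word_hits and the choice to keep apostrophes are assumptions, not part of any answer.

```ruby
# Hypothetical combination of case folding, punctuation stripping and Array#count.
def word_hits(string, dictionary)
  # Fold case, then turn anything that is not a letter or apostrophe into a space.
  words = string.downcase.gsub(/[^a-z']+/, ' ').split
  dictionary.map { |d| [d, words.count(d)] }.to_h.reject { |_, v| v == 0 }
end

word_hits("Go down, down below.", %w[below down go going])
# => {"below"=>1, "down"=>2, "go"=>1}
```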