Question

I have a word-count hash like the following:

words = {
  "love"   => 10,
  "hate"   => 12,
  "lovely" => 3,
  "loving" => 2,
  "loved"  => 1, 
  "peace"  => 14,
  "thanks" => 3,
  "wonderful" => 10,
  "grateful" => 10
  # there are more but you get the idea
}

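I want to make sure that "love", "loved" & "loving" are all counted as "love". So I add all of their counts together as the count for "love" and remove the remaining variations of "love". At the same time, however, I don't want "lovely" to be counted as "love", so I keep it as it is.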

So I would end up with something like this:

words = {
  "love"   => 13,
  "hate"   => 12,
  "lovely" => 3,
  "peace"  => 14,
  "thanks" => 3,
  "wonderful" => 10,
  "grateful" => 10
  # there are more but you get the idea
}

I have some code that works, but I think the logic in the last line is really wrong. Could you help me fix it or suggest a better way of doing this?

words.select { |k| /\Alov[a-z]*/.match(k) }
words["love"] = words.select { |k| /\Alov[a-z]*/.match(k) }.map(&:last).reduce(:+) - 1 # that 1 is for the 1 for "lovely"; I tried not to hard code it by using words["lovely"], but it messed things up completely, so I had to do this.
words.delete_if { |k| /\Alov[a-z]*/.match(k) && k != "love" && k != "lovely" }

Thanks!

Answer 1

I suggest the following:

r = /
    \A      # anchor at the start, so 'glove' is not matched
    lov     # match 'lov'
    (?!ely) # negative lookahead: fail if 'ely' comes next
    [a-z]+  # match one or more letters
    /xi     # x is for 'extended', i makes it case-independent

words.each_with_object(Hash.new(0)) { |(k,v),h| (k=~r) ? h["love"]+=v : h[k]=v }
  #=> {"love"=>13, "hate"=>12, "lovely"=>3, "peace"=>14, "thanks"=>3,
  #    "wonderful"=>10, "grateful"=>10}
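
To see which keys a pattern of this shape folds into "love", it can help to run it against a few sample words first. This is only a quick sketch: the one-line pattern is anchored with \A, and the sample list (including "Lover" and "glove") is my own assumption, not data from the question.

```ruby
# Same shape as the pattern above, written on one line and anchored with \A.
pattern = /\Alov(?!ely)[a-z]+/i

# Hypothetical sample words, chosen to exercise the edge cases.
samples = %w[love loved loving lovely Lover glove hate]

# Split the samples into words that would be merged and words left alone.
matched, unmatched = samples.partition { |w| w =~ pattern }

puts "merged into love: #{matched.join(', ')}"
puts "left as-is:       #{unmatched.join(', ')}"
```

"lovely" is rejected by the lookahead and "glove" by the anchor, so both keep their own entries.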

Answer 2

Here is a functional, non-destructive version:

words = {
  "love"   => 10,
  "hate"   => 12,
  "lovely" => 3,
  "loving" => 2,
  "loved"  => 1, 
  "peace"  => 14,
  "thanks" => 3,
  "wonderful" => 10,
  "grateful" => 10
}

to_love_or_not_to_love = words.partition {|w| w.first =~ /^lov/ && w.first != "lovely"}

{"love" => to_love_or_not_to_love.first.map(&:last).sum}.merge(to_love_or_not_to_love.last.reduce({}) {|m, e| m[e.first] = e.last; m})

=> {"love"=>13, "hate"=>12, "lovely"=>3, "peace"=>14, "thanks"=>3, "wonderful"=>10, "grateful"=>10}

Answer 3

words = {
  "love"   => 10,
  "hate"   => 12,
  "lovely" => 3,
  "loving" => 2,
  "loved"  => 1,
  "peace"  => 14,
  "thanks" => 3,
  "wonderful" => 10,
  "grateful" => 10
  # there are more but you get the idea
}

aggregated_words = words.inject({}) do |memo, (word, count)|
  key = word =~ /\Alov.+/ && word != "lovely" ? "love" : word
  memo[key] = memo[key].to_i + count
  memo
end

> {"love"=>13, "hate"=>12, "lovely"=>3, "peace"=>14, "thanks"=>3, "wonderful"=>10, "grateful"=>10}
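
If more word families need merging later, an explicit lookup table may be easier to maintain than a growing list of regex exceptions. The following is a sketch under that assumption; CANONICAL and merge_counts are hypothetical names of mine, and the table entries are illustrative only.

```ruby
# Hypothetical table mapping each variant to the word it should count as.
CANONICAL = {
  "loved"  => "love",
  "loving" => "love",
  "hated"  => "hate"
}.freeze

# Returns a new hash of summed counts; the input hash is left unchanged.
def merge_counts(words, canonical = CANONICAL)
  words.each_with_object(Hash.new(0)) do |(word, count), totals|
    # Words missing from the table (such as "lovely") keep their own key.
    totals[canonical.fetch(word, word)] += count
  end
end

words = { "love" => 10, "lovely" => 3, "loving" => 2, "loved" => 1, "hate" => 12 }
merged = merge_counts(words)
p merged
```

With this approach "lovely" needs no special case: it simply has no entry in the table.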

Answer 4

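I think that if you are dealing with a large enough vocabulary, what you really need is a stemmer rather than just a regex. Building a hash keyed by stems would be a simple and elegant solution.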

There is a simple English one here, but there are plenty of gems for this purpose and for different languages.
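
A hash of stems could be built roughly as follows. This is a minimal sketch: toy_stem is a hypothetical stand-in that only strips a few suffixes and is not a real stemming algorithm; in practice a stemmer gem would take its place, and its results may differ from this toy (for example in how it treats "lovely").

```ruby
# Hypothetical toy stemmer: strips a trailing "ing", "ed" or "e".
# NOT a real stemming algorithm; swap in a stemmer gem for real use.
def toy_stem(word)
  word.sub(/(?:ing|ed|e)\z/, "")
end

# Sum the counts of all words that share a stem.
def counts_by_stem(words)
  words.each_with_object(Hash.new(0)) do |(word, count), totals|
    totals[toy_stem(word)] += count
  end
end

words = { "love" => 10, "lovely" => 3, "loving" => 2, "loved" => 1, "hate" => 12 }
stems = counts_by_stem(words)
p stems
```

Note that the keys are now stems ("lov", "hat") rather than dictionary words, so a final step mapping each stem back to a display form may be needed.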
