Question

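I am trying to store the highest count in a variable.

When I loop through my array it shows the correct counts, but the value assigned to the highest-count variable seems to be the count of the last item checked in the array.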

def calculate_word_frequency(content, line_number)
  looper = 0
  wordCounter = ""
  #CREATE AN ARRAY FROM EACH LINE
  myArray = content.split
  #LOOP THROUGH ARRAY COUNTING INSTANCES OF WORDS
  while looper < myArray.length 
    p myArray[looper]
    wordCounter = myArray[looper]
    puts myArray.count(wordCounter)
    if highest_wf_count  < myArray.count
      highest_wf_count = myArray.count
    end
    looper +=1
  end
  puts highest_wf_count
end

Answer 1

How to count the frequency of things and get the maximum is covered all over Stack Overflow.

I'd do it like this:

def word_frequency(content)
  content 
  .split 
  .each_with_object(
    Hash.new { |h, k| h[k] = 0 }
  ) { |w, h|
    h[w] += 1 
  }
end

def max_frequency(content)
  word_frequency(content)
  .max_by{ |k, v| v }
end

word_frequency('a') # => {"a"=>1}
word_frequency('a b') # => {"a"=>1, "b"=>1}
word_frequency('a b a') # => {"a"=>2, "b"=>1}
word_frequency('a b a c a b') # => {"a"=>3, "b"=>2, "c"=>1}

max_frequency('a b a c a b') # => ["a", 3]

I'm using a basic split, which only splits on whitespace.

'a b'.split # => ["a", "b"]
'a. b'.split # => ["a.", "b"]

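That's naive: it only breaks on whitespace, so what it returns aren't necessarily real words. There are lots of questions on SO about how to improve the results.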
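One way to get closer to real words is to use String#scan with a regular expression instead of split. This is only a sketch: the helper name extract_words and the definition of a "word" as a run of letters, digits, or apostrophes are my own assumptions, not something from the question.

```ruby
# Hypothetical helper: treats a "word" as a run of letters, digits, or apostrophes.
def extract_words(content)
  content.scan(/[[:alnum:]']+/)
end

extract_words('a. b')        # => ["a", "b"]
extract_words("don't stop!") # => ["don't", "stop"]
```

Whether apostrophes or hyphens belong in a word depends on your input, so adjust the pattern to suit.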

each_with_object is similar to inject, only more convenient. It'll become your friend.
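To illustrate the difference, here is the same count written both ways. The variable names are just for this example.

```ruby
words = %w[a b a]

# With inject, the block must return the accumulator on every iteration.
with_inject = words.inject(Hash.new(0)) do |counts, word|
  counts[word] += 1
  counts
end

# With each_with_object, the accumulator is passed along automatically.
# Note the block arguments come in the opposite order: (element, accumulator).
with_each_with_object = words.each_with_object(Hash.new(0)) do |word, counts|
  counts[word] += 1
end

with_inject           # => {"a"=>2, "b"=>1}
with_each_with_object # => {"a"=>2, "b"=>1}
```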

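max_by is similar to max, but it's more convenient/faster when you're dealing with complex objects that you have to dig into to get the value you're comparing.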
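A small comparison on a hash of counts shows what that convenience looks like. The counts hash is made up for the example.

```ruby
counts = { "a" => 3, "b" => 2, "c" => 1 }

# max needs a block that compares two [key, value] pairs.
via_max = counts.max { |(_, v1), (_, v2)| v1 <=> v2 }

# max_by only needs the value to compare on.
via_max_by = counts.max_by { |_, v| v }

via_max    # => ["a", 3]
via_max_by # => ["a", 3]
```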

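Things to do:

Break your code down into smaller chunks. That's important for debugging and for testing/maintenance.
Get to know the core library well, especially Enumerable, String, IO and File. If you do general programming, you'll use them more than any other classes/modules in Ruby.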

Answer 2

Look closely at these two lines:

puts myArray.count(wordCounter)

highest_wf_count = myArray.count

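myArray.count(...) calls the method count(something), which counts the items that are equal to the given 'something'.
myArray.count, with no argument, returns the number of items in myArray.

Most likely you want to call the first one, then take its result, compare it, and collect the maximum of those values, like: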

countingresult = myArray.count(wordCounter)
puts countingresult

if highest_wf_count  < countingresult
  highest_wf_count = countingresult
end

As you have it now, the compare-and-gather-max step looks at the constant length of the array.
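Putting that together, a corrected version of the whole method might look like the sketch below. A few assumptions on my part: the question doesn't show where highest_wf_count comes from, so I start it at 0 as a local variable; I dropped the unused line_number parameter; and I renamed the variables for readability.

```ruby
def calculate_word_frequency(content)
  highest_wf_count = 0 # assumed starting value; must be set before the first comparison
  my_array = content.split

  my_array.each do |current_word|
    # count(arg) counts the items equal to arg, not the array length
    counting_result = my_array.count(current_word)
    highest_wf_count = counting_result if highest_wf_count < counting_result
  end

  highest_wf_count
end

calculate_word_frequency('a b a c a b') # => 3
```

This still rescans the array for every word, so it is fine for a single line of text but gets slow on large inputs.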

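I haven't analyzed your algorithm any further. Please fix this, and if you need more help, try to follow https://stackoverflow.com/help/mcve - in particular, describe the expected input/output.

By the way, I just noticed what wordCounter really is. Believe me, I had to re-read it three times to understand. The name of that variable is really misleading. When you do some cleanup, please change it to "currentWord" or "nextWordToCheck" or similar.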

Answer 3

If you want to find the highest number of occurrences of a word in a string, you could try something like this:

def calculate_word_frequency(content)
  frequencies = content.split(/\s/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
  sorted = frequencies.to_a.sort do |(_, count_a), (_, count_b)|
    count_b <=> count_a
  end
  max_word_and_count = sorted.first
  max_word_and_count.last
end

Or a shorter version that doesn't need sorting (if you're really only interested in the maximum count):

def calculate_word_frequency(content)
  max = 0
  frequencies = content.split(/\s/).each_with_object(Hash.new(0)) do |word, counts|
    count = counts[word] += 1
    max = count > max ? count : max
  end
  max
end
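If you are on Ruby 2.7 or later, Enumerable#tally builds the same frequency hash in one call, so the maximum count can be sketched as below. The method name max_word_count is only illustrative, and I use split without an argument here.

```ruby
# Assumes Ruby 2.7+, where Enumerable#tally is available.
def max_word_count(content)
  content.split.tally.values.max
end

max_word_count('a b a c a b') # => 3
```

Note that this returns nil rather than 0 for an empty string, because there are no counts to take the maximum of.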

Answer 4

Your question has been answered, so I'd like to suggest some alternatives that use Enumerable#group_by, depending on the information you need.

str = "Bill thought the other Bill should pay the bill or Sue should pay the bill"

Just the highest frequency

If you only want the frequency of the word that occurs the most times, you can write the following.

def calculate_word_frequency(content)
  content.split.
          group_by(&:itself).
          map { |_, arr| arr.size }.
          max
end

calculate_word_frequency str
  #=> 3

Object#itself was introduced in Ruby v2.2. For earlier versions, replace group_by(&:itself) with group_by { |e| e }.

Note that content.split has the same effect as content.split(/\s+/), except that the no-argument form also ignores leading whitespace.
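A quick check of the two forms. As far as I know the only case where they differ is a string with leading whitespace. The variable names are just for this example.

```ruby
inner_plain = "a  b".split          # => ["a", "b"]
inner_regex = "a  b".split(/\s+/)   # => ["a", "b"]

# With leading whitespace, the regex form yields a leading empty string.
lead_plain = " a b".split           # => ["a", "b"]
lead_regex = " a b".split(/\s+/)    # => ["", "a", "b"]
```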

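The word with the highest frequency, and its frequency

If you also want to know which word has the highest frequency, modify the above as follows.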

def calculate_word_frequency(content)
  content.split.
          group_by(&:itself).
          map { |word, arr| [word, arr.size] }.
          max_by(&:last)
end

calculate_word_frequency str
  # => ["the", 3]

Case insensitivity

If you want "Bill" and "bill" to be treated as the same word, change content.split to content.downcase.split, or modify the above as follows.

def calculate_word_frequency(content)
  content.split.
          group_by { |word| word.downcase }.
          map { |word, arr| [word, arr.size] }.
          max_by(&:last)
end

calculate_word_frequency str
  #=> ["bill", 4]

Ignoring punctuation

If you want to ignore punctuation, remove it first, as follows.

def calculate_word_frequency(content)
  content.delete(".,:;'\"?!").
          downcase.
          split.
          group_by(&:itself).
          map { |word, arr| [word, arr.size] }.
          max_by(&:last)
end

str = "Bill said \"Bill, pay the bill!\" Bif said 'Sue' should've payed the bill."  
calculate_word_frequency str
  #=> ["bill", 4]

Why is the wrong count being assigned to my variable?
