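I'm new to Ruby and trying to write a method that returns an array of the most common word(s) in a string. If one word has the highest count, that word should be returned. If two words are tied for the highest count, both should be returned in an array.
The problem is that when I pass in the second string, the code counts "words" only twice instead of three times. When the third string is passed in, it reports "it" with a count of 2, which makes no sense, since "it" should have a count of 1.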
def most_common(string)
counts = {}
words = string.downcase.tr(",.?!",'').split(' ')
words.uniq.each do |word|
counts[word] = 0
end
words.each do |word|
counts[word] = string.scan(word).count
end
max_quantity = counts.values.max
max_words = counts.select { |k, v| v == max_quantity }.keys
puts max_words
end
most_common('a short list of words with some words') #['words']
most_common('Words in a short, short words, lists of words!') #['words']
most_common('a short list of words with some short words in it') #['words', 'short']
Answer 0 (score: 5)
The way you count word instances is the problem: it occurs inside with, so it gets counted twice.
[1] pry(main)> 'with some words in it'.scan('it')
=> ["it", "it"]
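To make the double counting concrete, here is a small sketch comparing substring matching with whole-word counting. The variable names are only for illustration. It also suggests why "words" came out as 2 for the second string: scan runs on the original string, which is case-sensitive.

```ruby
# String#scan with a string argument matches substrings, not whole words.
sentence = 'with some words in it'

substring_hits  = sentence.scan('it').count       # the "it" inside "with" matches too
whole_word_hits = sentence.scan(/\bit\b/).count   # \b limits matches to word boundaries
array_hits      = sentence.split.count('it')      # counting array elements avoids the issue

p substring_hits   #=> 2
p whole_word_hits  #=> 1
p array_hits       #=> 1

# scan is also case-sensitive, so the capitalized "Words" is missed:
original = 'Words in a short, short words, lists of words!'
p original.scan('words').count           #=> 2
p original.downcase.scan('words').count  #=> 3
```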
Although it can be done more simply, you can use an each_with_object call to count how many times each value occurs in the array, like so:
counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
This goes through each entry in the array and adds 1 to the value stored for that word in the hash.
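As a rough illustration of why this works, the sketch below uses a sample word list (my own, not from the question). The key point is that Hash.new(0) returns 0 for a missing key, so += 1 is safe the first time a word is seen. On Ruby 2.7 or later, Enumerable#tally should give the same counts.

```ruby
words = %w[a short list of words with some short words in it]

# Hash.new(0) supplies 0 for unseen keys, so h[e] += 1 works on a word's first occurrence.
counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }

p counts['short']      #=> 2
p counts['it']         #=> 1
p counts['absent']     #=> 0 (default value; reading does not add the key)
p counts.key?('absent') #=> false

# Enumerable#tally exists only on Ruby 2.7+, hence the guard.
p words.tally == counts if words.respond_to?(:tally)
```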
So the following should work for you:
def most_common(string)
words = string.downcase.tr(",.?!",'').split(' ')
counts = words.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
max_quantity = counts.values.max
counts.select { |k, v| v == max_quantity }.keys
end
p most_common('a short list of words with some words') #['words']
p most_common('Words in a short, short words, lists of words!') #['words']
p most_common('a short list of words with some short words in it') #['short', 'words']
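Answer 1 (score: 3)
As Nick has answered your question, I will suggest another approach. Since "high count" is vague, I suggest returning a hash of downcased words and their respective counts. Since Ruby 1.9, hashes preserve the order in which key-value pairs were inserted, so we can take advantage of that and return a hash whose key-value pairs are sorted by decreasing value.
Code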
def words_by_count(str)
str.gsub(/./) do |c|
case c
when /\w/ then c.downcase
when /\s/ then c
else ''
end
end.split
.group_by {|w| w}
.map {|k,v| [k,v.size]}
.sort_by(&:last)
.reverse
.to_h
end
words_by_count('Words in a short, short words, lists of words!')
The method Array#to_h was introduced in Ruby 2.1. For earlier Ruby versions, one must use:
Hash[str.gsub(/./)... .reverse]
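A minimal sketch of the two conversions, using a short sample array of [word, count] pairs rather than the full method chain. It only shows that both forms build the same hash and that the pair order carries over to the keys.

```ruby
pairs = [['words', 3], ['short', 2], ['lists', 1]]

from_to_h     = pairs.to_h    # Ruby 2.1 and later
from_brackets = Hash[pairs]   # also available on older versions

p from_to_h == from_brackets  #=> true
p from_to_h.keys              #=> ["words", "short", "lists"]
```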
Examples
words_by_count('a short list of words with some words')
#=> {"words"=>2, "of"=>1, "some"=>1, "with"=>1,
# "list"=>1, "short"=>1, "a"=>1}
words_by_count('Words in a short, short words, lists of words!')
#=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
words_by_count('a short list of words with some short words in it')
#=> {"words"=>2, "short"=>2, "it"=>1, "with"=>1,
# "some"=>1, "of"=>1, "list"=>1, "in"=>1, "a"=>1}
Explanation
Here is what happens in the second example, where:
str = 'Words in a short, short words, lists of words!'
str.gsub(/./) do |c|...
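matches each character in the string and sends it to the block, which decides what to do with it. As you can see, word characters are downcased, whitespace is left alone, and everything else is removed.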
s = str.gsub(/./) do |c|
case c
when /\w/ then c.downcase
when /\s/ then c
else ''
end
end
#=> "words in a short short words lists of words"
Next:
a = s.split
#=> ["words", "in", "a", "short", "short", "words", "lists", "of", "words"]
h = a.group_by {|w| w}
#=> {"words"=>["words", "words", "words"], "in"=>["in"], "a"=>["a"],
# "short"=>["short", "short"], "lists"=>["lists"], "of"=>["of"]}
b = h.map {|k,v| [k,v.size]}
#=> [["words", 3], ["in", 1], ["a", 1], ["short", 2], ["lists", 1], ["of", 1]]
c = b.sort_by(&:last)
#=> [["of", 1], ["in", 1], ["a", 1], ["lists", 1], ["short", 2], ["words", 3]]
d = c.reverse
#=> [["words", 3], ["short", 2], ["lists", 1], ["a", 1], ["in", 1], ["of", 1]]
d.to_h # or Hash[d]
#=> {"words"=>3, "short"=>2, "lists"=>1, "a"=>1, "in"=>1, "of"=>1}
Note that c = b.sort_by(&:last) and d = c.reverse can be replaced with:
d = b.sort_by { |_,k| -k }
#=> [["words", 3], ["short", 2], ["a", 1], ["in", 1], ["lists", 1], ["of", 1]]
but sort_by followed by reverse is generally faster.
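If you want to check the speed claim on your own machine, a rough timing sketch could look like the following. The sample data and the elapsed helper are made up for illustration, and the timings will vary by Ruby version and hardware, so no numbers are shown.

```ruby
# Hypothetical sample data: 50_000 [word, count] pairs with random counts.
pairs = Array.new(50_000) { |i| ["w#{i}", rand(1_000)] }

# Illustrative helper: seconds taken by the given block, using a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

reversed = nil
negated  = nil
t_reverse = elapsed { reversed = pairs.sort_by(&:last).reverse }
t_negate  = elapsed { negated  = pairs.sort_by { |_, k| -k } }

# Both list counts from highest to lowest; pairs with equal counts may be ordered differently.
p reversed.map(&:last) == negated.map(&:last)  #=> true
puts format('sort_by + reverse: %.4fs, negated sort_by: %.4fs', t_reverse, t_negate)
```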
Answer 2 (score: 1)
def count_words string
word_list = Hash.new(0)
words = string.downcase.delete(',.?!').split
words.each { |word| word_list[word] += 1 }
word_list
end
def most_common_words string
hash = count_words string
max_value = hash.values.max
hash.select { |k, v| v == max_value }.keys
end
most_common_words 'a short list of words with some words'
#=> ["words"]
most_common_words 'Words in a short, short words, lists of words!'
#=> ["words"]
most_common_words 'a short list of words with some short words in it'
#=> ["short", "words"]
Answer 3 (score: 1)
Suppose string is a string containing multiple words.
words = string.split(/[.!?,\s]+/) # + treats a run of separators such as ", " as one
words.sort_by{|x|words.count(x)}
Here we split the string into an array of words. Then we sort the array by how many times each word occurs. The most common words end up at the end.
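The sorted array still contains duplicates, so one more step is needed to get the list of most common words. A possible sketch follows; the downcase call and the names top_count and most_common are my additions. Note that calling count inside sort_by rescans the array for every element, which is fine for short strings but slow for long ones.

```ruby
string = 'a short list of words with some short words in it'

# The + keeps a run of separators between words from producing empty strings.
words  = string.downcase.split(/[.!?,\s]+/)
sorted = words.sort_by { |x| words.count(x) }

top_count   = words.count(sorted.last)   # the last element has the highest count
most_common = sorted.select { |w| words.count(w) == top_count }.uniq

p most_common.sort  #=> ["short", "words"]
```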
Answer 4 (score: 0)
The same thing can also be done this way:
def most_common(string)
counts = Hash.new 0
string.downcase.tr(",.?!",'').split(' ').each{|word| counts[word] += 1}
# For "Words in a short, short words, lists of words!"
# counts ---> {"words"=>3, "in"=>1, "a"=>1, "short"=>2, "lists"=>1, "of"=>1}
max_value = counts.values.max
#max_value ---> 3
return counts.select{|key , value| value == max_value}
#returns ---> {"words"=>3}
end
This is just a shorter solution that you might want to use. Hope it helps :)
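Note that this version returns a hash of word/count pairs rather than an array. If you need the array the question asks for, calling keys on the result should do it. A sketch, with the method renamed most_common_pairs purely for illustration:

```ruby
# Same counting approach as above; returns the { word => count } pairs with the top count.
def most_common_pairs(string)
  counts = Hash.new(0)
  string.downcase.tr(',.?!', '').split(' ').each { |word| counts[word] += 1 }
  max_value = counts.values.max
  counts.select { |_word, count| count == max_value }
end

result = most_common_pairs('a short list of words with some short words in it')
p result.keys    #=> ["short", "words"]
p result.values  #=> [2, 2]
```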
Answer 5 (score: 0)
This is the kind of question programmers love, isn't it :) How about a functional approach?
# returns array of words after removing certain English punctuation
def english_words(str)
str.downcase.delete(',.?!').split
end
# returns hash mapping element to count
def element_counts(ary)
ary.group_by { |e| e }.inject({}) { |a, e| a.merge(e[0] => e[1].size) }
end
def most_common(ary)
ary.empty? ? nil :
element_counts(ary)
.group_by { |k, v| v }
.sort
.last[1]
.map(&:first)
end
most_common(english_words('a short list of words with some short words in it'))
#=> ["short", "words"]
Answer 6 (score: 0)
def firstRepeatedWord(string)
h_data = Hash.new(0)
string.split(" ").each{|x| h_data[x] +=1}
h_data.key(h_data.values.max)
end
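One thing to be aware of with this approach: Hash#key returns only the first key holding the given value, so a tie is reported as a single word. A short sketch of the difference, using the third example string:

```ruby
counts = Hash.new(0)
'a short list of words with some short words in it'.split(' ').each { |x| counts[x] += 1 }

max = counts.values.max

p counts.key(max)                          #=> "short" (first key with the max value)
p counts.select { |_, v| v == max }.keys   #=> ["short", "words"]
```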
Answer 7 (score: 0)
def common(string)
counts=Hash.new(0)
words=string.downcase.delete('.,!?').split(" ")
words.each {|k| counts[k]+=1}
p counts.sort_by { |_word, count| count }.reverse[0] # sort by count; plain sort would order by key
end
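When ranking a hash of counts, keep in mind that a plain sort compares the [key, value] pairs, so the word decides the order, not the count. A small sketch with made-up counts shows the difference:

```ruby
counts = { 'zebra' => 1, 'apple' => 3, 'mango' => 2 }

# Hash#sort orders by key first, so this picks the alphabetically last word.
p counts.sort.reverse[0]                               #=> ["zebra", 1]

# Ranking by the count instead:
p counts.sort_by { |_word, count| count }.reverse[0]   #=> ["apple", 3]
p counts.max_by { |_word, count| count }               #=> ["apple", 3]
```

As with Hash#key, both count-based forms return a single pair, so ties still need a select on the maximum count.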