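Goal:
Write a function that takes two arguments: (1) a String representing a text document and (2) an integer giving the number of items to return. Implement the function so that it returns a list of strings ordered by word frequency, with the most frequently occurring word first. Use your best judgment to decide how words are separated. Your solution should run in O(n) time, where n is the number of characters in the document.
My thinking was that, in the worst case, the input to the function could be the total number of words in the document, which reduces the problem to sorting the words by frequency. That led me to think the lower bound on the time complexity would be O(n log n) if I used a comparison sort. So I concluded that the best approach would be to implement a counting sort. Here is my code.
I would like to know whether my analysis is correct. I have annotated the code with my understanding of the time complexity, but it could certainly be incorrect. What are the actual time and space complexities of this code? I would also like to hear whether this is a good approach, or whether there are alternatives that would be used in practice.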
### n is number of characters in string, k is number of words ###
def word_frequencies(string, n)
  words = string.split(/\s/) # O(n)
  max = 0
  min = Float::INFINITY
  frequencies = words.inject(Hash.new(0)) do |hash, word| # O(k)
    occurrences = hash[word] += 1 # O(1)
    max = occurrences if occurrences > max # O(1)
    min = occurrences if occurrences < min # O(1)
    hash # O(1)
  end

  ### perform a counting sort ###
  sorted = Array.new(max + words.length)
  delta = 0
  frequencies.each do |word, frequency| # O(k)
    p word + "--" + frequency.to_s
    index = frequency
    if sorted[index]
      sorted[index] = sorted[index].push(word) # ??? I think O(1).
    else
      sorted[index] = [word] # O(1)
    end
  end
  # last(n) avoids the nil that [-n..-1] returns when n exceeds the number of distinct words
  return sorted.compact.flatten.last(n).reverse
  ### Compact is O(k). Flatten is O(k). Reverse is O(k). So O(3k)
end
### Total --- O(n + 5k) = O(n). Correct?
### And the space complexity is O(n) for the hash + O(2k) for the sorted array.
### So total O(n).

text = "hi hello hi my name is what what hi hello hi this is a test test test test hi hi hi what hello these are some words these these"
p word_frequencies(text, 4)
Answer 0 (score: 3)
Two ways:
def word_counter(string, max)
  string.split(/\s+/)
    .group_by { |x| x }
    .map { |x, y| [x, y.size] }
    .sort_by { |_, size| size } # Have to sort =/
    .last(max)
end

def word_counter(string, max)
  # Create a Hash and a List to store values in.
  word_counter, max_storage = Hash.new(0), []
  # Split the string and add each word to the hash:
  string.split(/\s+/).each { |word| word_counter[word] += 1 }
  # Take each word and add it to the list (so that the list_index = word_count)
  # I also add the count, but that is not really needed
  word_counter.each { |key, val| max_storage[val] = [*max_storage[val]] << [key, val] }
  # Higher count will always be at the end, remove nils and get the last "max" elements.
  max_storage.compact.flatten(1).last(max)
end
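Answer 1 (score: 2)
One idea is:
The order of this algorithm should be O(f), where f is the maximum frequency of any word. The maximum frequency of any word is at most n, where n is the number of characters, as required.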
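The steps behind this idea are not spelled out here, so the following is only one possible reading: a minimal sketch, assuming words are bucketed by frequency and the buckets are then walked from the highest frequency down. The name `top_words_by_bucket` and its parameters are hypothetical and do not come from the answer.

```ruby
# Hypothetical helper: name and signature are illustrative, not from the answer.
def top_words_by_bucket(text, limit)
  # Count occurrences of each whitespace-separated word.
  counts = Hash.new(0)
  text.split.each { |word| counts[word] += 1 }

  # buckets[f] holds every word that occurs exactly f times.
  buckets = []
  counts.each { |word, freq| (buckets[freq] ||= []) << word }

  # Walk from the highest frequency down, stopping once enough words are collected.
  result = []
  (buckets.length - 1).downto(1) do |freq|
    next unless buckets[freq]
    buckets[freq].each do |word|
      return result if result.length >= limit
      result << word
    end
  end
  result
end
```

Under this reading, the walk over the buckets visits at most f slots plus the words it returns, and f can never exceed the number of characters in the document. Counting the words still requires one pass over the whole text.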
Answer 2 (score: 1)
An example, the quick way :)
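# assuming you read from the file and get it to a string called str
h = {}
arr = str.split("\n")
arr.each do |i|
  i.split(" ").each do |w|
    # has_key? is a method call; the original has_key[w] raises an ArgumentError
    if h.has_key?(w)
      h[w] += 1
    else
      h[w] = 1
    end
  end
end
# Sort by count and reverse so the most frequent words come first
Hash[h.sort_by { |k, v| v }.reverse]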
This works, but it could be improved.
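One way the snippet might be tightened, offered as a sketch, not as the answerer's intended improvement: a default-valued hash removes the `has_key?` branch, and `String#split` with no argument already splits on newlines as well as spaces. The method name `word_counts` and the optional `limit` parameter are hypothetical.

```ruby
# Hypothetical helper; `word_counts` and `limit` are illustrative names.
def word_counts(str, limit = nil)
  # split with no argument breaks on any run of whitespace, newlines included,
  # so the nested per-line loop is not needed.
  counts = Hash.new(0)
  str.split.each { |word| counts[word] += 1 }

  # Sort by descending count. sort_by is a comparison sort, so this step is
  # O(k log k) in the number of distinct words, not O(n).
  sorted = counts.sort_by { |_word, count| -count }
  sorted = sorted.first(limit) if limit
  sorted.to_h
end
```

On Ruby 2.7 or later, `str.split.tally` builds the same count hash in a single call. Note that `sort_by` does not guarantee a stable order among words with equal counts.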