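I am trying to use a generator in a use case where we have to keep track of the k "largest" elements in a stream of strings. What I want to do is add elements to a list until it holds k elements, then heapify it, and then continue through the rest of the stream from there, maintaining the heap one element at a time. I am fairly new to generators, so any help is appreciated.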
import heapq

def my_generator(stream):
    for string in stream:
        yield string

def top_k(k, stream):
    count = 0
    min_heap = []
    for string in stream:
        if count >= k:
            break
        min_heap.append((len(string), string))
        count += 1
    print(min_heap)
    heapq.heapify(min_heap)
    for string in stream:
        heapq.heappushpop(min_heap, (len(string), string))
    return heapq.nsmallest(k, min_heap)

strings = ["This", "whatis", "going", "in"]
stream = my_generator(strings)
output = top_k(2, stream)
print(output)
Answer 0 (score: 2)
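Your break, followed by resuming the stream in a second loop, causes an element to be "lost" into the void. By the time the check `count >= k` succeeds, the for loop has already pulled element k+1 out of the generator; the break throws it away, so the second loop starts at element k+2.
Here is your code, changed so that it no longer loses any elements: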
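To see the effect in isolation, here is a minimal sketch. The generator `numbers` and the variable names are made up for illustration and are not part of your code; the loop has the same shape as your first loop.

```python
def numbers():
    # hypothetical stand-in for the string stream
    yield from [1, 2, 3, 4]

stream = numbers()
taken = []
for item in stream:
    if len(taken) >= 2:
        break  # item 3 has already been pulled from the generator here
    taken.append(item)

rest = list(stream)  # resumes after the discarded item
print(taken, rest)  # [1, 2] [4]
```

The value 3 shows up in neither list, which is what happens to "going" in your example.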
import heapq

def top_k(k, stream):
    min_heap = []
    # loop over k instead of stream
    for _ in range(k):
        string = next(stream)  # get the next item
        min_heap.append((len(string), string))
    print(min_heap)  # debug
    heapq.heapify(min_heap)
    # here we finish all of what's left in stream
    for string in stream:
        heapq.heappushpop(min_heap, (len(string), string))
    return heapq.nsmallest(k, min_heap)
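
One caveat with `next(stream)`: it raises `StopIteration` if the stream holds fewer than k items, and it needs a real iterator rather than a list. A possible variant, as a sketch only, uses `itertools.islice` to take at most k items without reading past them. The name `top_k_islice` is mine, and returning the heap sorted in ascending order is an assumption about what you want back.

```python
import heapq
from itertools import islice

def top_k_islice(k, stream):
    it = iter(stream)  # accepts lists as well as generators
    # islice yields at most k items and never consumes one it does not return
    min_heap = [(len(s), s) for s in islice(it, k)]
    heapq.heapify(min_heap)
    # the same iterator continues exactly where islice stopped
    for s in it:
        heapq.heappushpop(min_heap, (len(s), s))
    return sorted(min_heap)  # ascending by (length, string)

strings = ["This", "whatis", "going", "in"]
print(top_k_islice(2, iter(strings)))  # [(5, 'going'), (6, 'whatis')]

# If only the strings are needed, heapq.nlargest covers this in one call
print(heapq.nlargest(2, iter(strings), key=len))  # ['whatis', 'going']
```

With a stream shorter than k, this version simply returns everything it saw instead of raising.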