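I have been using the Python Reddit API Wrapper (PRAW) to collect specific comments from Reddit, and one function I use often is replace_more_comments(), which collects all of the comments in a thread.
Some of these threads are very large (10,000 comments, for example), and collecting every comment takes a while. Is there a way to show a progress bar for replace_more_comments()?
Here is a minimal working code example: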
import praw

r = praw.Reddit('MSU vs Nebraska game')
submission = r.get_submission(submission_id='3rxx3y')

# Replace every MoreComments placeholder with the comments it stands for
submission.replace_more_comments(limit=None, threshold=0)
all_comments = submission.comments

# Flatten the nested comment tree into a single list
flat_comments = praw.helpers.flatten_tree(submission.comments)
Answer 0 (score: 0)
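The built-in implementation of replace_more_comments does not support this, but you can write your own version. For reference, see the original implementation in the PRAW source.
I don't know how to draw the actual progress bar; you will have to write update_progress_bar yourself. I have not tested this code, and it may not work at all.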
from heapq import heappop, heappush


def replace_more_comments(post):
    """Update the comment tree by replacing instances of MoreComments."""
    if post._replaced_more:
        return

    more_comments = post._extract_more_comments(post.comments)

    # Estimate the total number of comments still to be loaded
    count = 0
    for item in more_comments:
        count += item.count
    update_progress_bar(0, count)  # update_progress_bar is yours to write

    num_loaded = 0
    while more_comments:
        item = heappop(more_comments)

        # Fetch the comments behind this MoreComments object
        new_comments = item.comments(update=False)
        if new_comments is None:
            continue

        # Re-add new MoreComments objects to the heap of more_comments
        for more in post._extract_more_comments(new_comments):
            more._update_submission(post)  # pylint: disable=W0212
            heappush(more_comments, more)

        # Advance the progress bar
        num_loaded += len(new_comments)
        update_progress_bar(num_loaded, count)

        # Insert the new comments into the tree
        for comment in new_comments:
            post._insert_comment(comment)

    post._replaced_more = True
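The answer leaves update_progress_bar unwritten. One possible version is sketched below. It is an assumption, not part of PRAW: render_progress_bar and its width parameter are hypothetical names introduced here. Because the total is only an estimate taken before any fetching starts, the fraction is clamped so the bar never overflows.

```python
import sys


def render_progress_bar(num_loaded, total, width=40):
    """Return the bar as a single line of text (hypothetical helper)."""
    # The total is only an estimate, so clamp the fraction to [0, 1].
    if total <= 0:
        fraction = 1.0
    else:
        fraction = min(max(num_loaded / float(total), 0.0), 1.0)
    filled = int(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return "[{}] {:3d}% ({}/{})".format(
        bar, int(fraction * 100), num_loaded, total)


def update_progress_bar(num_loaded, total):
    """Redraw the bar in place on stderr using a carriage return."""
    sys.stderr.write("\r" + render_progress_bar(num_loaded, total))
    sys.stderr.flush()
```

Writing to stderr keeps the bar out of any output you redirect from stdout. Once the fetch finishes, write a newline so later output starts on a fresh line.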