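So, let's say I want to stream posts from the subreddit "news". The posts there are very frequent, though, and we can't say every post is worthwhile. So I'd like to filter for the good posts by trying to stream the "hot" list. But I'm not sure whether that is possible, or whether anything like it is.
Normally, this is how I stream posts: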
for submission in subreddit.stream.submissions():
    if not submission.stickied:
        print(str(submission.title) + " " + str(submission.url) + "\n")
This filters posts, but it doesn't stream:
for submission in subreddit.hot(limit=10):
    print(str(submission.title) + " " + str(submission.url) + "\n")
So, any ideas on how to stream and filter posts at the same time?
Thanks
Answer 0 (score: 2)
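Streaming hot posts is an incoherent idea.
The purpose of a stream in PRAW is to get every post or comment (almost) immediately after it is submitted to Reddit. The hot listing, on the other hand, contains the items deemed currently interesting, ranked by a combination of score and age.
"The posts there are very frequent, though, and we can't say every post is worthwhile."
Because Reddit users need time to see a post and vote on it, it makes little sense to evaluate whether a post is worthwhile (as measured by score) right after it has been posted.
If your goal is to perform some action on every post that makes it into the top n of a subreddit, you can check the front page at a regular interval and act on any post you haven't seen yet. For example: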
import praw
import time

reddit = praw.Reddit()  # must be edited to properly authenticate
subreddit = reddit.subreddit('news')
seen_submissions = set()  # fullnames of posts already handled

while True:
    for submission in subreddit.hot(limit=10):
        if submission.fullname not in seen_submissions:
            seen_submissions.add(submission.fullname)
            print('{} {}\n'.format(submission.title, submission.url))
    time.sleep(60)  # sleep for a minute (60 seconds)
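The "act only on posts not seen before" step can also be pulled out of the polling loop, which makes it checkable without talking to Reddit. What follows is a minimal sketch under stated assumptions: `new_items` and `FakeSubmission` are hypothetical names made up for this illustration (they are not part of PRAW), and the fake objects carry only the `fullname` and `title` attributes used here.

```python
from collections import namedtuple


def new_items(listing, seen, key=lambda item: item.fullname):
    """Yield items from `listing` whose key is not yet in `seen`.

    Each yielded item's key is added to `seen`, so a later poll skips it.
    (Hypothetical helper, not a PRAW function.)
    """
    for item in listing:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Stand-in for a PRAW submission (assumption: only these two fields matter here).
FakeSubmission = namedtuple("FakeSubmission", ["fullname", "title"])

seen = set()
first_poll = [FakeSubmission("t3_a", "A"), FakeSubmission("t3_b", "B")]
second_poll = [FakeSubmission("t3_b", "B"), FakeSubmission("t3_c", "C")]

# The second poll overlaps the first; only the unseen post comes through.
first_new = [s.title for s in new_items(first_poll, seen)]
second_new = [s.title for s in new_items(second_poll, seen)]
print(first_new, second_new)
```

With PRAW, the same helper would presumably be called as `new_items(subreddit.hot(limit=10), seen_submissions)` inside the `while True:` loop, followed by the `time.sleep(60)`.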
Answer 1 (score: 1)
To add to jarhill0's answer, you can also page through the listing by specifying "after" in the params.
import praw
import time

reddit = praw.Reddit()  # must be edited to properly authenticate
subreddit = reddit.subreddit('news')
seen_submissions = set()

while True:
    params = None
    for _ in range(10):  # get the first 10 pages of 'hot'
        last_fullname = None
        for submission in subreddit.hot(limit=10, params=params):
            last_fullname = submission.fullname
            if submission.fullname not in seen_submissions:
                seen_submissions.add(submission.fullname)
                print('{} {}\n'.format(submission.title, submission.url))
        if last_fullname is None:
            break  # empty page: nothing more to fetch
        # request the page that starts after the last post of this one
        params = {"after": last_fullname}
    time.sleep(60)  # sleep for a minute (60 seconds)
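The cursor logic behind "after" can be looked at in isolation. The sketch below is only an illustration under assumptions: `paginate`, `fake_fetch`, and `POSTS` are hypothetical names, and `fake_fetch` stands in for a call such as `subreddit.hot(limit=10, params={"after": ...})`, using plain dicts instead of real submissions.

```python
def paginate(fetch_page, max_pages):
    """Yield items page by page, using the last item's fullname as the cursor.

    `fetch_page(after)` is a hypothetical callable returning one page of items;
    `after` is None for the first page.
    """
    after = None
    for _ in range(max_pages):
        page = list(fetch_page(after))
        if not page:
            break  # an empty page means the listing is exhausted
        yield from page
        after = page[-1]["fullname"]


# Fake listing of seven posts, served three per page (illustrative data only).
POSTS = [{"fullname": "t3_{}".format(i)} for i in range(7)]


def fake_fetch(after, page_size=3):
    """Return the page that starts right after the post named `after`."""
    start = 0
    if after is not None:
        names = [p["fullname"] for p in POSTS]
        start = names.index(after) + 1
    return POSTS[start:start + page_size]


fetched = [p["fullname"] for p in paginate(fake_fetch, max_pages=10)]
print(fetched)
```

As far as I know, PRAW's listing generators already follow the "after" cursor on their own when `limit` is larger than one page, so `subreddit.hot(limit=100)` should cover the same ground as ten pages of ten without manual paging.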