PRAW / Tweepy: filtering by keyword

Time: 2019-11-11 10:10:52

Tags: python python-3.x ubuntu tweepy praw

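I'm having some trouble filtering PRAW's results. I want to exclude posts whose titles contain keywords such as [request], [off topic], or [nsfw], so that they are never tweeted through Tweepy. I looked for documentation but couldn't find anything on the PRAW site.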

Here is my code:

def poster():
    conn = sqlite3.connect('jb_id.db')
    c = conn.cursor()
    toTweet = []
    for submission in reddit.subreddit(SUB).hot(limit=POST_LIMIT):
        if not submission.stickied and len(submission.title) < 255:
            url = submission.shortlink
            title = submission.title
            udate = time.strftime("%Y-%m-%d %X", time.gmtime(submission.created_utc))

            try:
                # This keeps a record of the posts in the database
                c.execute("INSERT INTO posts (id, title, udate) VALUES (?, ?, ?)",
                          (url, title, udate))
                conn.commit()

                message = title + " " + url
                print(message)
                toTweet.append(message)

            except sqlite3.IntegrityError:
                # This means the post was already tweeted and is ignored
                print("Duplicate", url)

    c.close()
    conn.close()
    tweeter(toTweet)

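As you can see, I already exclude stickied posts and titles of 255 characters or more. Is there a way to also filter out Reddit posts whose titles contain the keywords mentioned above from PRAW's results? Thanks!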

1 answer:

Answer 0: (score: 0)

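Make a list of keywords that should not appear in a submission title. Write them in lowercase, since the title is lowercased before comparison.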

bad_keywords = "[request]", "[off topic]", "[nsfw]"

If the submission title contains any item from the list, skip to the next iteration of the loop

title_lowercase = submission.title.lower()
if any(x in title_lowercase for x in bad_keywords):
    continue
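The check above can be tried without a Reddit connection. The following is a minimal sketch, assuming a hypothetical helper named `has_bad_keyword` and a few made-up titles; neither is part of the original code.

```python
# Keywords must be lowercase because the title is lowercased before matching.
BAD_KEYWORDS = ("[request]", "[off topic]", "[nsfw]")


def has_bad_keyword(title, keywords=BAD_KEYWORDS):
    """Return True if any keyword occurs in the title, ignoring case.

    Hypothetical helper for illustration; not part of PRAW.
    """
    title_lowercase = title.lower()
    return any(keyword in title_lowercase for keyword in keywords)


# Made-up titles standing in for submission.title values.
titles = [
    "[Request] Looking for a song",
    "New episode is out",
    "Cool pic [NSFW]",
]

kept = [t for t in titles if not has_bad_keyword(t)]
print(kept)  # ['New episode is out']
```

This is a plain substring match, so a keyword is caught wherever it appears in the title, not only at the start.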

I would combine it with your other exclusions to reduce indentation and make the code more readable

bad_title = any(x in title_lowercase for x in bad_keywords)
# Any single condition is enough to skip, so the tests are joined with "or"
skip_submission = submission.stickied or len(submission.title) >= 255 or bad_title
if skip_submission:
    continue
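To see how the combined test behaves, here is a rough sketch that runs it against stand-in objects instead of real PRAW submissions. The `should_skip` helper, the `MAX_TITLE_LENGTH` name, and the sample data are assumptions made for illustration. The conditions are joined with `or`, because a post should be skipped as soon as any one of them holds.

```python
from types import SimpleNamespace

BAD_KEYWORDS = ("[request]", "[off topic]", "[nsfw]")
MAX_TITLE_LENGTH = 255  # the question keeps only titles shorter than this


def should_skip(submission):
    """Return True if the submission is stickied, too long, or has a bad keyword.

    Hypothetical helper; it only reads the .title and .stickied attributes.
    """
    title_lowercase = submission.title.lower()
    bad_title = any(x in title_lowercase for x in BAD_KEYWORDS)
    too_long = len(submission.title) >= MAX_TITLE_LENGTH
    return submission.stickied or too_long or bad_title


# Stand-ins for PRAW submissions (made-up data).
samples = [
    SimpleNamespace(title="Weekly thread", stickied=True),
    SimpleNamespace(title="x" * 300, stickied=False),
    SimpleNamespace(title="[NSFW] something", stickied=False),
    SimpleNamespace(title="A normal post", stickied=False),
]

results = [should_skip(s) for s in samples]
print(results)  # [True, True, True, False]
```

With `and` in place of `or`, only a post that is stickied, too long, and badly titled all at once would be skipped, so almost nothing would be filtered.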

Full solution

import sqlite3
import time

# reddit, SUB, POST_LIMIT and tweeter() are defined elsewhere in your script


def poster():
    conn = sqlite3.connect('jb_id.db')
    c = conn.cursor()
    toTweet = []

    bad_keywords = "[request]", "[off topic]", "[nsfw]"

    for submission in reddit.subreddit(SUB).hot(limit=POST_LIMIT):
        title = submission.title
        title_lowercase = title.lower()

        bad_title = any(x in title_lowercase for x in bad_keywords)
        # Any single condition is enough to skip, so the tests are joined with "or"
        skip_submission = submission.stickied or len(title) >= 255 or bad_title

        if skip_submission:
            continue

        url = submission.shortlink
        udate = time.strftime("%Y-%m-%d %X", time.gmtime(submission.created_utc))

        try:
            # This keeps a record of the posts in the database
            c.execute("INSERT INTO posts (id, title, udate) VALUES (?, ?, ?)",
                      (url, title, udate))
            conn.commit()

            message = title + " " + url
            print(message)
            toTweet.append(message)

        except sqlite3.IntegrityError:
            # This means the post was already tweeted and is ignored
            print("Duplicate", url)

    c.close()
    conn.close()
    tweeter(toTweet)
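The duplicate check in this code relies on the database rejecting a second insert of the same id. The schema of `jb_id.db` is not shown, so the sketch below assumes that `posts.id` is declared as a primary key; it uses an in-memory database, a hypothetical `record_post` helper, and a made-up short link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Assumed schema: the original table definition is not shown in the question.
c.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT, udate TEXT)")


def record_post(cursor, url, title, udate):
    """Insert a post; return False if its id is already stored.

    Hypothetical helper wrapping the same INSERT used in poster().
    """
    try:
        cursor.execute(
            "INSERT INTO posts (id, title, udate) VALUES (?, ?, ?)",
            (url, title, udate),
        )
        return True
    except sqlite3.IntegrityError:
        # The primary key constraint rejected a repeated id.
        return False


# Made-up link used only for this demonstration.
first = record_post(c, "https://redd.it/abc123", "A post", "2019-11-11 10:10:52")
second = record_post(c, "https://redd.it/abc123", "A post", "2019-11-11 10:10:52")
conn.commit()

row_count = c.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(first, second, row_count)  # True False 1

conn.close()
```

If the `id` column had no primary key or unique constraint, the second insert would succeed and the same post would be tweeted again.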