Get a PRAW subreddit object from a subreddit link in a post's selftext

Date: 2016-06-07 09:12:09

Tags: python regex reddit praw

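I'm using PRAW with Python, and I'd like to be able to:

  1. Go through the "new" posts on a subreddit
  2. Detect whether a post's selftext contains a link to a subreddit
  3. If a subreddit is linked, get that subreddit as a PRAW object to use later

I can do step 1, but finding whether a subreddit is linked and then getting that subreddit is proving difficult. Here is what I have so far: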

    #! python3
    # Reply with subreddit info from subreddit in text body
    
    import praw
    
    # Bot login details
    USERNAME = "AutoMobBot"
    PASSWORD = "<redacted>"
    
    UA = "[Subreddit Info Provider (Update 0) by /u/MatthewMob]"
    r = praw.Reddit(UA)
    r.login(USERNAME, PASSWORD, disable_warning=True)
    
    submissions = r.get_subreddit("matthewmob_csstesting").get_new(limit=10)
    
    for submission in submissions:
        # Flag any whitespace-separated word that begins with "/r/"
        for word in submission.selftext.lower().split():
            if word.startswith("/r/"):
                print("Found subreddit in:", submission.title)
                print(submission.selftext_html)
    
    print("Done...")
    input()
    

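This just gets the submissions, splits the selftext into words, and prints something if one of the words starts with /r/. Obviously this won't always work if the user, for example, links the subreddit only as r/askreddit or www.reddit.com/r/askreddit. Even then, if they link /r/askreddit/top (with extra content at the end), how can I get that subreddit as a PRAW object? I've been trying to find some kind of regex to help me do this, but haven't found one yet.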
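
To make the limitation concrete, here is a small sketch that runs the same split-and-`startswith` check over a few made-up selftext strings. The sample strings and the `naive_matches` helper are hypothetical and not part of the bot:

```python
def naive_matches(selftext):
    """Return the words that the startswith("/r/") check would flag."""
    return [word for word in selftext.lower().split() if word.startswith("/r/")]


# Made-up sample texts covering the link forms mentioned above
samples = [
    "Check out /r/askreddit today",      # caught
    "Check out r/askreddit today",       # missed: no leading slash
    "See www.reddit.com/r/askreddit",    # missed: "/r/" sits in the middle of the word
    "Sorted by top: /r/askreddit/top",   # caught, but with the extra path attached
]

for text in samples:
    print(repr(text), "->", naive_matches(text))
```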

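My main question is: what is the best way to get a subreddit from a link in a user's selftext, and how would I do it?

If you need any further clarification, I'm happy to provide more information.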

1 Answer:

Answer 0: (score: 0)

I have now found my own answer. Here is the code that works for me:

    #! python3
    # Reply with subreddit info from subreddit in text body

    import re

    import bs4
    import praw

    # Bot login details
    USERNAME = "AutoMobBot"
    PASSWORD = "<Password>"

    UA = "[Subreddit Info Provider (Update 4) by /u/MatthewMob]"
    r = praw.Reddit(UA)
    r.login(USERNAME, PASSWORD, disable_warning=True)

    submissions = r.get_subreddit("matthewmob_csstesting").get_new(limit=3)

    for submission in submissions:
        subs = []  # unique subreddit names found in this submission
        # selftext_html can be None (e.g. for link posts), so fall back to ""
        soup = bs4.BeautifulSoup(submission.selftext_html or "", "lxml")
        for a in soup.find_all("a", href=True):
            # Capture the name after /r/, e.g. "askreddit" from "/r/askreddit/top"
            match = re.search(r"/r/(\w+)", a["href"])
            if match and match.group(1) not in subs:
                subs.append(match.group(1))
                print("\nTitle:", submission.title)
                print("\nSubreddits Found:", len(subs))
                print("\nSubreddit Found:", subs[-1] + "\n")

    print("Done...")
    input()
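
As an alternative sketch that works on the raw `selftext` instead of the HTML, a single regular expression can cover the link forms mentioned in the question (`r/askreddit`, `www.reddit.com/r/askreddit`, `/r/askreddit/top`). The pattern, the assumed character set for subreddit names, and the `extract_subreddits` helper are illustrative assumptions, not part of the bot above:

```python
import re

# Matches "r/name" or "/r/name", including when it follows a reddit.com URL.
# The lookbehind rejects matches inside ordinary words such as "user/name".
# Assumption: subreddit names consist of letters, digits, and underscores.
SUBREDDIT_RE = re.compile(r"(?<![A-Za-z0-9_])/?r/([A-Za-z0-9_]+)")


def extract_subreddits(text):
    """Return unique subreddit names in order of first appearance.

    Names are compared case-insensitively, so "AskReddit" and "askreddit"
    count as the same subreddit.
    """
    seen = set()
    names = []
    for name in SUBREDDIT_RE.findall(text):
        key = name.lower()
        if key not in seen:
            seen.add(key)
            names.append(name)
    return names


sample = "See r/askreddit, www.reddit.com/r/Python and /r/AskReddit/top"
print(extract_subreddits(sample))  # ['askreddit', 'Python']
```

With the PRAW version used above, each returned name could then be passed to `r.get_subreddit(name)`, the same call the bot already uses for its own subreddit, to obtain the subreddit as a PRAW object.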