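I'm using PRAW with Python, and I'd like to be able to: get the submissions from a subreddit, check whether a submission's self text links to a subreddit, and then get that subreddit.
I can do step 1, but finding out whether a subreddit is linked and then getting that subreddit is proving difficult. Here is what I have so far: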
#! python3
# Reply with subreddit info from subreddit in text body
import praw, time

# Bot login details
USERNAME = "AutoMobBot"
PASSWORD = "<redacted>"
UA = "[Subreddit Info Provider (Update 0) by /u/MatthewMob]"

r = praw.Reddit(UA)
r.login(USERNAME, PASSWORD, disable_warning=True)

submissions = r.get_subreddit("matthewmob_csstesting").get_new(limit=10)
for submission in submissions:
    for word in submission.selftext.lower().split():
        if word.startswith("/r/"):
            print("Found subreddit in:", submission.title)
            print(submission.selftext_html)

print("Done...")
input()
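This just gets the submissions, splits the self text into words, and prints something if one of those words starts with /r/. Obviously this won't always work, for example when the user links the subreddit only as r/askreddit or www.reddit.com/r/askreddit. Even then, if they link /r/askreddit/top (with extra content at the end), how can I get that subreddit as a PRAW object? I've been trying to find a regular expression that would help me do this, but haven't found one yet.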
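For illustration, here is a minimal sketch of the kind of pattern being asked about. It is not from the original post: the names `SUBREDDIT_RE` and `find_subreddits` are hypothetical, and the name rule (2 to 21 letters, digits, or underscores) is an assumption rather than something taken from Reddit's documentation. It looks for `r/` that is not preceded by a word character, which covers r/askreddit, www.reddit.com/r/askreddit, and /r/askreddit/top.

```python
import re

# ASSUMPTION: a subreddit name is 2-21 letters, digits, or underscores.
# The lookbehind rejects matches inside words such as "other/askreddit".
SUBREDDIT_RE = re.compile(
    r"(?<![A-Za-z0-9_])r/([A-Za-z0-9_]{2,21})(?![A-Za-z0-9_])"
)


def find_subreddits(text):
    """Return the unique subreddit names mentioned in text, lowercased, in order."""
    names = []
    for name in SUBREDDIT_RE.findall(text):
        name = name.lower()
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    sample = "See r/askreddit, www.reddit.com/r/AskReddit and /r/python/top"
    print(find_subreddits(sample))
```

Because the capture group stops at the first character that cannot be part of a name, anything after the name (such as `/top`) is ignored.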
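My main question is: what is the best way to get a subreddit from a link in a user's self text, and how do I do it?
If you need any further clarification, I'm happy to provide more information.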
Answer 0 (score: 0)
I have now found my own answer. Here is the code that works for me:
#! python3
# Reply with subreddit info from subreddit in text body
import re

import bs4
import praw

# Bot login details
USERNAME = "AutoMobBot"
PASSWORD = "<Password>"
UA = "[Subreddit Info Provider (Update 4) by /u/MatthewMob]"

r = praw.Reddit(UA)
r.login(USERNAME, PASSWORD, disable_warning=True)

submissions = r.get_subreddit("matthewmob_csstesting").get_new(limit=3)
for submission in submissions:
    subs = []  # unique subreddit names linked in this submission
    # selftext_html can be None (e.g. link posts), so fall back to an empty string
    soup = bs4.BeautifulSoup(submission.selftext_html or "", "lxml")
    for a in soup.find_all("a", href=True):
        # Append "/" so a link that ends with the subreddit name still matches
        href = a["href"] + "/"
        # re.findall returns a list (possibly empty), never None
        for name in re.findall(r"/r/(.*?)/", href):
            if name not in subs:
                subs.append(name)
                print("\nTitle:", submission.title)
                print("Subreddits Found:", len(subs))
                print("Subreddit Found:", name + "\n")

print("Done...")
input()
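As an alternative to the regular expression on each href, the link can be parsed with the standard library and its path inspected segment by segment. This is only a sketch of a different technique, not what the code above does, and `subreddit_from_href` is a hypothetical helper name introduced for illustration.

```python
from urllib.parse import urlparse


def subreddit_from_href(href):
    """Return the subreddit name found in a link target, or None if there is none."""
    # Split the URL path into non-empty segments, e.g. ["r", "askreddit", "top"].
    # The query string and fragment are not part of the path, so they are ignored.
    parts = [p for p in urlparse(href).path.split("/") if p]
    # Look for an "r" segment that is followed by another segment (the name).
    for i, part in enumerate(parts[:-1]):
        if part.lower() == "r":
            return parts[i + 1]
    return None


if __name__ == "__main__":
    for link in ("/r/askreddit",
                 "https://www.reddit.com/r/askreddit/top?t=day",
                 "https://example.com/about"):
        print(link, "->", subreddit_from_href(link))
```

The returned name could then be passed to `r.get_subreddit(name)`, the same call the code above already uses, to obtain the subreddit as a PRAW object in that PRAW version.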