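So far, my code successfully extracts the HTML of the 5 results it fetches for a given subreddit name. Now I want to search for imgur links, both albums (URLs containing /a/) and single images. I want to pull out each such link and pass it to another class (imgurdl).
Given my current code, what is the best way to do this?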
from bs4 import BeautifulSoup
import praw
from urllib2 import urlopen
import urllib2
import sys
from urlparse import urljoin
import config
import imgurdl
import requests
cache = []
soup = BeautifulSoup

def reddit_login():
    # Authenticate with the credentials stored in config.py.
    # Assumption: config also defines username; the original passed an undefined name USER.
    r = praw.Reddit(username=config.username,
                    password=config.password,
                    client_id=config.client_id,
                    client_secret=config.client_secret,
                    user_agent=" v0.3")
    print("***********logged in successfully***********")
    return r

def get_category_links(subredditName, r):
    print("Grabbing subreddit...")
    # Fetch the top 5 "hot" submissions of the subreddit
    submissions = r.subreddit(subredditName).hot(limit=5)
    print("Grabbing comments...")
    #comments = subred.comments(limit = 200)
    for submission in submissions:
        # Download the page each submission links to and print its HTML
        htmlSource = requests.get(submission.url).text
        print(htmlSource)

r = reddit_login()
get_category_links(sys.argv[1], r)
Answer (score: 0)
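You can get the URL directly from PRAW, check inside the loop whether it points to imgur, and then pass it to the appropriate function. That way you don't need to go through the HTML source at all.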
for submission in submissions:
    link = submission.url
    if "imgur.com/a/" in link:
        # Send to imgur album downloader
        pass
    elif link.endswith((".jpg", ".png")):
        # Send to image downloader
        pass
    elif "imgur.com/" in link:
        # Send to single-image imgur downloader
        pass
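The substring checks above can misfire: a direct image URL with a query string (e.g. ending in ".jpg?1") fails the endswith test, and a look-alike domain containing "imgur.com/" would pass. A minimal sketch, assuming you only care about the three cases above, parses the URL first and checks the hostname and path separately. The names classify_link, dispatch and the download_* functions are hypothetical; the download_* functions are placeholders for whatever imgurdl actually exposes, since its API isn't shown in the question. The extension list is also an assumption. Other imgur URL shapes (such as /gallery/ links) fall through to "imgur_single", which may or may not be what you want.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:  # Python 2, which the question's code targets
    from urlparse import urlparse

# Assumption: the direct-image extensions worth downloading.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def classify_link(url):
    """Return 'album', 'image', 'imgur_single', or None for a submission URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is already lowercased, port stripped
    path = parsed.path.lower()
    # Match imgur.com and its subdomains (e.g. i.imgur.com), not look-alikes
    is_imgur = host == "imgur.com" or host.endswith(".imgur.com")
    if is_imgur and path.startswith("/a/"):
        return "album"
    # Test the path rather than the full URL so "x.jpg?1" still counts
    if path.endswith(IMAGE_EXTENSIONS):
        return "image"
    if is_imgur:
        return "imgur_single"
    return None


# Hypothetical stand-ins: replace these with calls into your imgurdl module.
def download_album(url):
    print("album: " + url)


def download_image(url):
    print("image: " + url)


def download_imgur_single(url):
    print("imgur page: " + url)


HANDLERS = {
    "album": download_album,
    "image": download_image,
    "imgur_single": download_imgur_single,
}


def dispatch(url):
    """Route url to the matching handler; silently skip non-image links."""
    handler = HANDLERS.get(classify_link(url))
    if handler is not None:
        handler(url)
```

Inside the PRAW loop you would then just call dispatch(submission.url) for each submission. Keeping the classification in its own function makes it easy to test without logging in to Reddit.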