Getting Imgur links from subreddits in Python

Posted: 2017-06-21 16:40:21

Tags: python web-scraping href reddit imgur

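So far, given a subreddit name, my code successfully fetches the HTML of the 5 top "hot" results. Now I want to find imgur links in them, whether they point to albums (URLs containing /a/) or to single images. I want to extract each such link and pass it to another module (imgurdl).

Given my current code, what is the best way to do this?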

import sys

import praw
import requests

import config   # local module holding the Reddit API credentials
import imgurdl  # the downloader module the imgur links should be passed to


def reddit_login():
    # USER was undefined in the original; this assumes config also defines `username`.
    r = praw.Reddit(username=config.username,
                    password=config.password,
                    client_id=config.client_id,
                    client_secret=config.client_secret,
                    user_agent=" v0.3")
    print("***********logged in successfully***********")
    return r


def get_category_links(subredditName, r):
    print("Grabbing subreddit...")
    # Top 5 submissions from the subreddit's "hot" listing.
    submissions = r.subreddit(subredditName).hot(limit=5)
    print("Grabbing submissions...")
    for submission in submissions:
        # Download and print the raw HTML of the page each submission links to.
        htmlSource = requests.get(submission.url).text
        print(htmlSource)


if __name__ == "__main__":
    r = reddit_login()
    get_category_links(sys.argv[1], r)
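If you do want to pull imgur links out of the fetched HTML, as the question's code is set up to do, a minimal sketch using the standard library's html.parser (instead of BeautifulSoup) could look like this. The names ImgurLinkParser and find_imgur_links are hypothetical; the sketch assumes that any host equal to imgur.com or ending in .imgur.com counts as an imgur link, and it resolves relative hrefs against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgurLinkParser(HTMLParser):
    """Collect href values of <a> tags whose host is imgur.com or a subdomain of it."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links (e.g. "/r/pics") against the page URL.
                absolute = urljoin(self.base_url, value)
                host = urlparse(absolute).hostname or ""
                is_imgur = host == "imgur.com" or host.endswith(".imgur.com")
                # Keep first-seen order and skip duplicates.
                if is_imgur and absolute not in self.links:
                    self.links.append(absolute)


def find_imgur_links(html, base_url=""):
    """Return the de-duplicated imgur links found in an HTML string."""
    parser = ImgurLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Inside the loop, find_imgur_links(htmlSource, submission.url) would return the imgur URLs found on that page, ready to be handed to imgurdl.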

1 Answer:

Answer 0 (score: 0)

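PRAW already gives you each submission's link as submission.url, so inside the loop you can check whether it points to imgur and send it to the appropriate function. That way there is no need to download and search the HTML source at all.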

for submission in submissions:
    link = submission.url
    if "imgur.com/a/" in link:
        pass  # send to the imgur album downloader
    elif link.endswith((".jpg", ".png")):
        pass  # send to the direct-image downloader
    elif "imgur.com/" in link:
        pass  # send to the single-image imgur downloader
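A slightly more robust variant of these checks parses the URL first, so the host and the path are tested separately. This is a sketch, not the answer author's code: the function name classify_link and its return labels are hypothetical, and like the answer it treats only .jpg and .png (on any host) as direct images.

```python
from urllib.parse import urlparse

# Same extensions as the answer's endswith() checks.
IMAGE_EXTENSIONS = (".jpg", ".png")


def classify_link(url):
    """Return "album", "image", "imgur_page", or None for a submission URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse already lowercases the hostname
    path = parsed.path.lower()    # path only: query strings like "?1" are ignored
    is_imgur = host == "imgur.com" or host.endswith(".imgur.com")

    if is_imgur and path.startswith("/a/"):
        return "album"       # imgur album, e.g. https://imgur.com/a/<id>
    if path.endswith(IMAGE_EXTENSIONS):
        return "image"       # direct image link on any host
    if is_imgur and path.strip("/"):
        return "imgur_page"  # single-image imgur page, e.g. https://imgur.com/<id>
    return None              # not something to download
```

Because it tests the parsed path, a link such as https://i.imgur.com/abc.jpg?1 is still recognised as an image, and a host like notimgur.com is not mistaken for imgur; plain substring and endswith checks on the full URL get both of these wrong. In the loop, the returned label can decide which downloader to call.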