I've been trying to learn some Python, and I'm trying to create a small program that asks the user for a subreddit and then prints all of the front-page headlines and article links. Here is the code:
import requests
from bs4 import BeautifulSoup
subreddit = input('Type the subreddit you want to see: ')
link_visit = f'https://www.reddit.com/r/{subreddit}/'
print(link_visit)
r = requests.get(link_visit)
soup = BeautifulSoup(r.text, 'html.parser')
# Each front-page post on old Reddit's markup sits in a div with class 'top-matter'
for article in soup.find_all('div', class_='top-matter'):
    headline = article.find('p', class_='title')
    print('Headline:', headline.text)
    a = headline.find('a', href=True)
    link = a['href'].split('/domain')
    print('Link:', link[0])
My problem is that sometimes it prints the desired results, and sometimes it does nothing: it just asks the user for the subreddit and prints the link to that subreddit.
Can someone explain why this happens?
Answer 0 (score: 0)
Your request is being rejected by Reddit to conserve resources.
When you detect a failing case, print out the HTML. I think you will see something like this:
<h1>whoa there, pardner!</h1>
<p>we're sorry, but you appear to be a bot and we've seen too many requests
from you lately. we enforce a hard speed limit on requests that appear to come
from bots to prevent abuse.</p>
<p>if you are not a bot but are spoofing one via your browser's user agent
string: please change your user agent string to avoid seeing this message
again.</p>
<p>please wait 3 second(s) and try again.</p>
<p>as a reminder to developers, we recommend that clients make no
more than <a href="http://github.com/reddit/reddit/wiki/API">one
request every two seconds</a> to avoid seeing this message.</p>
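A minimal sketch of that fix: send a descriptive User-Agent and print the HTML when the rejection page comes back. The header string below is an arbitrary placeholder, not anything Reddit requires:

import requests
from bs4 import BeautifulSoup

# Any descriptive, non-default User-Agent avoids the bot heuristic;
# this exact string is an arbitrary example.
headers = {'User-Agent': 'my-headline-scraper/0.1 (learning project)'}

subreddit = input('Type the subreddit you want to see: ')
r = requests.get(f'https://www.reddit.com/r/{subreddit}/', headers=headers)

if 'whoa there, pardner!' in r.text:
    # Still rate-limited: dump the HTML so you can see the rejection page.
    print(r.text)
else:
    soup = BeautifulSoup(r.text, 'html.parser')
    for article in soup.find_all('div', class_='top-matter'):
        headline = article.find('p', class_='title')
        print('Headline:', headline.text)
        print('Link:', headline.find('a', href=True)['href'].split('/domain')[0])

# If you fetch repeatedly, wait at least two seconds between requests
# (time.sleep(2)) to stay under the advised rate.

The check against the rejection banner just automates the print-the-HTML advice above; once the request goes through, your original parsing loop works unchanged.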