Inconsistent results with Python BeautifulSoup

Asked: 2017-11-21 21:23:01

Tags: python

I've been trying to learn some Python, and I'm writing a small program that asks the user for a subreddit and then prints all of the front-page headlines and article links. Here is the code:

    import requests
    from bs4 import BeautifulSoup

    # Build the front-page URL for the requested subreddit.
    subreddit = input('Type the subreddit you want to see : ')
    link_visit = f'https://www.reddit.com/r/{subreddit}/'
    print(link_visit)

    base_url = link_visit
    r = requests.get(base_url)
    soup = BeautifulSoup(r.text, 'html.parser')

    # Each front-page entry sits in a div with class "top-matter".
    for article in soup.find_all('div', class_='top-matter'):
        headline = article.find('p', class_='title')
        print('HeadLine :', headline.text)

        # The first <a> in the title paragraph holds the article link;
        # keep only the part of the href before any '/domain' segment.
        a = headline.find('a', href=True)
        link = a['href'].split('/domain')
        print('Link :', link[0])

My problem is that sometimes it prints the desired results, and sometimes it does nothing at all: it just asks the user for the subreddit and prints the link to that subreddit.

Can someone explain why this happens?

1 Answer:

Answer 0 (score: 0)

Your request is being rejected by reddit in order to conserve its resources.

When you hit the failing case, print out the HTML. I suspect you will see something like this:

    <h1>whoa there, pardner!</h1>

    <p>we're sorry, but you appear to be a bot and we've seen too many requests
    from you lately. we enforce a hard speed limit on requests that appear to come
    from bots to prevent abuse.</p>

    <p>if you are not a bot but are spoofing one via your browser's user agent
    string: please change your user agent string to avoid seeing this message
    again.</p>

    <p>please wait 3 second(s) and try again.</p>

    <p>as a reminder to developers, we recommend that clients make no
    more than <a href="http://github.com/reddit/reddit/wiki/API">one
    request every two seconds</a> to avoid seeing this message.</p>
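
That block page points to the fix: send a descriptive User-Agent header and keep requests at least two seconds apart. Below is a minimal sketch of that approach; the User-Agent value and the detect-and-retry logic are illustrative assumptions, not anything reddit prescribes:

    import time

    import requests
    from bs4 import BeautifulSoup

    subreddit = input('Type the subreddit you want to see : ')
    base_url = f'https://www.reddit.com/r/{subreddit}/'

    # Identify the client instead of using requests' default User-Agent,
    # which reddit treats as a bot. The value here is an arbitrary example.
    headers = {'User-Agent': 'learning-project-headline-scraper/0.1'}

    r = requests.get(base_url, headers=headers)
    if 'too many requests' in r.text:
        # We got the block page quoted above; honor the suggested limit of
        # one request every two seconds, then retry once.
        time.sleep(2)
        r = requests.get(base_url, headers=headers)

    soup = BeautifulSoup(r.text, 'html.parser')
    for article in soup.find_all('div', class_='top-matter'):
        headline = article.find('p', class_='title')
        print('HeadLine :', headline.text)
        a = headline.find('a', href=True)
        print('Link :', a['href'].split('/domain')[0])

If you later fetch several subreddits in a loop, put the two-second pause before every request rather than only on the retry.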