I'm trying to scrape Reddit using PRAW, and it keeps throwing a prawcore.exceptions.BadRequest: received 400 HTTP response error.
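I built the whole pipeline while experimenting in a Jupyter Notebook and had absolutely no problems retrieving data from Reddit. The problem only appears when I run the same code as a script from the terminal. Initially, I thought it was caused by the different Python versions in the notebook (v3.6.5) and in the virtual environment (v3.7.1). However, the error persists even after switching the environment to 3.6.5.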
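There is no problem instantiating the Reddit object, Subreddit objects, Submission objects, and Comment objects when I test their output using nested for loops. My data pipeline consists of a bunch of functions that call each other in a similar nested pattern. Nonetheless, the script keeps failing inside the listing generator, even though the way I call the functions is structurally similar to the nested loops.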
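The nested-loop test described above can be sketched roughly as follows. This is an illustrative sketch, not the actual script: the function name walk_reddit and the submission_limit parameter are hypothetical, and it assumes the submissions come from subreddit.top(), which the variable name top_submissions in the traceback suggests but does not confirm.

```python
def walk_reddit(reddit, subreddit_names, submission_limit=5):
    """Yield (subreddit name, submission id, comment id) for each comment.

    `reddit` is expected to behave like a praw.Reddit instance.
    The function name and `submission_limit` are illustrative only.
    """
    for name in subreddit_names:
        subreddit = reddit.subreddit(name)
        # top() returns a lazy listing: PRAW sends the HTTP request only
        # when iteration starts, so a bad request surfaces in the for loop.
        for submission in subreddit.top(limit=submission_limit):
            # Drop the "load more comments" placeholders so that only
            # Comment objects remain in the flattened list.
            submission.comments.replace_more(limit=0)
            for comment in submission.comments.list():
                yield name, submission.id, comment.id
```

With a real instance created via praw.Reddit(client_id=..., client_secret=..., user_agent=...), list(walk_reddit(reddit, SUBREDDIT_NAMES)) would exercise the same Reddit, Subreddit, Submission, and Comment objects as the pipeline. Because the listing is lazy, an error in the request parameters would only show up once the inner loop starts pulling submissions.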
Here is the terminal output:
Traceback (most recent call last):
  File "run_reddit_scraper.py", line 295, in <module>
    reddit_id = process_reddit(reddit, SUBREDDIT_NAMES)
  File "run_reddit_scraper.py", line 200, in process_reddit
    subreddits_pk, subreddit_count = process_subreddits(reddit, subreddit_names)
  File "run_reddit_scraper.py", line 165, in process_subreddits
    submissions_pk, submission_count = process_submissions(subreddit)
  File "run_reddit_scraper.py", line 119, in process_submissions
    for submission in top_submissions:
  File "/Users/nicktheodore/reddit-scraper/env/lib/python3.6/site-packages/praw/models/listing/generator.py", line 52, in __next__
    self._next_batch()
  File "/Users/nicktheodore/reddit-scraper/env/lib/python3.6/site-packages/praw/models/listing/generator.py", line 62, in _next_batch
    self._listing = self._reddit.get(self.url, params=self.params)
  File "/Users/nicktheodore/reddit-scraper/env/lib/python3.6/site-packages/praw/reddit.py", line 391, in get
    data = self.request('GET', path, params=params)
  File "/Users/nicktheodore/reddit-scraper/env/lib/python3.6/site-packages/praw/reddit.py", line 506, in request
    params=params)
  File "/Users/nicktheodore/reddit-scraper/env/lib/python3.6/site-packages/prawcore/sessions.py", line 185, in request
    params=params, url=url)
  File "/Users/nicktheodore/reddit-scraper/env/lib/python3.6/site-packages/prawcore/sessions.py", line 130, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.BadRequest: received 400 HTTP response
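The traceback shows the exception being constructed from the HTTP response, so the rejected request could be inspected through the exception's response attribute. The helper below is a hypothetical diagnostic sketch, not part of the script: the name describe_http_failure is made up, and it assumes the attached object exposes status_code, url, and text the way a requests response does.

```python
def describe_http_failure(exc):
    """Summarise the HTTP response attached to a prawcore response error.

    Hypothetical helper: it only reads `exc.response`, which prawcore
    sets on exceptions such as BadRequest.
    """
    response = getattr(exc, "response", None)
    if response is None:
        return "no HTTP response attached to {!r}".format(exc)
    # Truncate the body so a large error page does not flood the terminal.
    body = (response.text or "")[:500]
    return "status={} url={} body={}".format(
        response.status_code, response.url, body
    )


# Possible usage around the failing loop (assumes `import prawcore`):
#
#     try:
#         for submission in top_submissions:
#             ...
#     except prawcore.exceptions.BadRequest as exc:
#         print(describe_http_failure(exc))
```

Printing the URL with its query string might reveal which parameter differs between the notebook run and the script run.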
I have no idea what is going on at this point, so I'm completely blocked. Any feedback is much appreciated!