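I'm trying to learn to code in Python, and I picked a Reddit image scraper as my first learning project. I've got everything set up and working, except that my for loop over a list of subreddits doesn't run to completion.
The list of subreddits looks like this: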
sub_list = ['pics', 'mapporn', 'wallpapers']
And here is the nested loop that goes through each subreddit and downloads all the images (by checking for URLs that end in .jpg or .png):
for sub in sub_list:
    posts = u.get_subreddit(sub).get_hot(limit=post_limit)
    for image in posts:
        file_name = image.url
        extension = image.url[-4:]
        print(extension)
        if extension == '.jpg' or extension == '.png':
            try:
                wget.download(file_name, path)
            except urllib.error.HTTPError as err:
                if err.code == 404:
                    pass
                else:
                    wget.download(file_name, path)
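To make the extension check easier to test on its own, here is a minimal sketch of it pulled out into a helper. The name `is_image_url` and the example URLs are hypothetical, not part of my script. It also behaves slightly differently from the `image.url[-4:]` slice: it ignores any query string and is case-insensitive, which may or may not be what you want.

```python
import os
from urllib.parse import urlparse

# Extensions treated as images; same two as in the loop above.
IMAGE_EXTENSIONS = {'.jpg', '.png'}


def is_image_url(url):
    """Return True if the path part of the URL ends in .jpg or .png."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(url).path
    # splitext returns (root, extension); lower() makes the check case-insensitive.
    extension = os.path.splitext(path)[1].lower()
    return extension in IMAGE_EXTENSIONS


if __name__ == '__main__':
    for url in ('http://example.com/a.jpg', 'http://example.com/gallery/abc'):
        print(url, is_image_url(url))
```

With something like this, the `if extension == ...` line in the loop could become `if is_image_url(image.url):`.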
My problem is that after the first dozen or so images the script stops with this error:
IndexError: list index out of range
If it helps, here is the stack trace:
  File "redditscrape.py", line 52, in <module>
    main()
  File "redditscrape.py", line 27, in main
    create_folder()
  File "redditscrape.py", line 31, in create_folder
    download_images()
  File "redditscrape.py", line 44, in download_images
    wget.download(file_name, path)
  File "C:\ProgLangs\Python35\lib\site-packages\wget.py", line 527, in download
    filename = detect_filename(url, out, headers)
  File "C:\ProgLangs\Python35\lib\site-packages\wget.py", line 486, in detect_filename
    names["headers"] = filename_from_headers(headers) or ''
  File "C:\ProgLangs\Python35\lib\site-packages\wget.py", line 258, in filename_from_headers
    name = fnames[0].split('=')[1].strip(' \t"')
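I tried adding print statements at various points to find out what breaks it. My assumption is that, for some reason, the script only iterates over the first item in the list and then stops, but I don't know why.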
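For what it's worth, the last frame of the trace is inside wget.py, not in my loop, and the expression there indexes `fnames[0]`. Below is a minimal sketch that applies that same expression to made-up input. The function name `first_filename` and the sample values are hypothetical, and I have not confirmed that an empty `fnames` is what actually happens inside wget; it only shows that the expression raises this error type when the list is empty or when the first entry contains no `=`.

```python
def first_filename(fnames):
    """Apply the expression from the last line of the trace to a list of strings."""
    return fnames[0].split('=')[1].strip(' \t"')


if __name__ == '__main__':
    # A well-formed entry parses fine.
    print(first_filename(['filename="cat.jpg"']))

    # An empty list, or an entry without '=', raises IndexError.
    for bad_input in ([], ['filename']):
        try:
            first_filename(bad_input)
        except IndexError as err:
            print('IndexError:', err)
```

If that is the cause, the failure would depend on what a particular server sends back for one image rather than on how the loop iterates over `sub_list`.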