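This is the code I use to fetch all the images from r/pics on reddit and save them into a directory. I want to be able to put the actual files in that directory into a list, but I am stuck on how to do it.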
import requests
from bs4 import BeautifulSoup as bs
import os

url = "https://www.reddit.com/r/pics/"
r = requests.get(url)
data = r.text
soup = bs(data, 'lxml')
image_tags = soup.findAll('img')

# Create the target directory if needed, then work inside it
if not os.path.exists('direct'):
    os.makedirs('direct')
os.chdir('direct')

x = 0
for image in image_tags:
    try:
        url = image['src']
        source = requests.get(url)
        if source.status_code == 200:
            img_path = 'direct-' + str(x) + '.jpg'
            with open(img_path, 'wb') as f:
                # Reuse the response already fetched instead of downloading twice
                f.write(source.content)
            x += 1
    except Exception:
        # Skip tags without a usable src or with an unreachable URL
        pass
Edit: Here is the updated code, but I am still running into the problem
import requests
from bs4 import BeautifulSoup as bs
import os

url = "https://www.reddit.com/r/drawing"
r = requests.get(url)
data = r.text
soup = bs(data, 'lxml')
image_tags = soup.findAll('img')

# Create the target directory if needed, then work inside it
if not os.path.exists('directory'):
    os.makedirs('directory')
os.chdir('directory')

x = 0
mylist = []
for image in image_tags:
    url = image['src']
    source = requests.get(url)
    if source.status_code == 200:
        img_path = 'direct-' + str(x) + '.jpg'
        with open(img_path, 'wb') as f:
            # Reuse the response already fetched instead of downloading twice
            f.write(source.content)
        # Record the saved file name once the file has been written
        mylist.append(img_path)
        x += 1
print(mylist)
Answer 0 (score: 1)
Create a list at the beginning of your code:
...
mylist = []
...
Then, after fetching each image, append its path to the list:
...
img_path = 'direct-' + str(x) +'.jpg'
mylist.append(img_path)
....
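If the goal is a list of the files that actually exist in the directory, rather than the names the loop intended to write, one option is to read the directory back with os.listdir. The following is only a minimal sketch: the helper name list_image_files and the set of extensions are assumptions for illustration, not something from the question.

```python
import os


def list_image_files(directory, extensions=(".jpg", ".jpeg", ".png", ".gif")):
    """Return the paths of regular files in `directory` that look like images.

    `list_image_files` and the default `extensions` are hypothetical choices
    made for this sketch.
    """
    paths = []
    # sorted() gives a stable order; os.listdir itself makes no ordering promise
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        # Keep regular files only, matched case-insensitively on the extension
        if os.path.isfile(full_path) and name.lower().endswith(extensions):
            paths.append(full_path)
    return paths
```

With the question's layout this could be called as list_image_files('directory') before the os.chdir call, or as list_image_files('.') after it.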
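Edit:
I ran the updated code, and image_tags comes back empty. In fact, the page fetched by
url = "https://www.reddit.com/r/drawing"
r = requests.get(url)
data = r.text
does not contain any images. I guess reddit has some kind of protection that prevents you from fetching images this way.
Try adding print(data) and you will see what I mean.
You should use the reddit API so that reddit does not throttle your requests.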
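As a rough illustration of the API route, here is a sketch that reads a subreddit's JSON listing instead of scraping the HTML. It rests on several assumptions: that the public listing at https://www.reddit.com/r/<subreddit>/.json is reachable without authentication, that it has the shape data -> children -> data -> url, and that a descriptive User-Agent is enough to avoid being throttled. The function names and the User-Agent string are hypothetical, and the standard library is used here in place of requests only so the sketch has no extra dependencies.

```python
import json
import urllib.request

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def fetch_listing(subreddit, user_agent="example-script/0.1"):
    """Download and parse the JSON listing for a subreddit.

    Assumption: the unauthenticated .json listing endpoint is available.
    The User-Agent value is a placeholder; pick one that identifies your script.
    """
    url = "https://www.reddit.com/r/{}/.json".format(subreddit)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_image_urls(listing):
    """Collect post URLs that end in a known image extension.

    Assumption: each post sits under listing["data"]["children"][i]["data"]
    and carries a "url" field.
    """
    urls = []
    for child in listing.get("data", {}).get("children", []):
        url = child.get("data", {}).get("url", "")
        if url.lower().endswith(IMAGE_EXTENSIONS):
            urls.append(url)
    return urls
```

Something like extract_image_urls(fetch_listing("pics")) would then yield the URLs to feed into the download loop from the question, with each saved path appended to mylist as before.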