How do I put image files scraped with Beautiful Soup into a list?

Asked: 2018-07-04 18:31:51

Tags: python web-scraping beautifulsoup python-requests reddit

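This is the code I use to fetch all the pictures from r/pics on reddit and save them to a directory. I want to be able to put the actual files from that directory into a list, but I'm stuck on how to do it.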

import requests
from bs4 import BeautifulSoup as bs
import os

url = "https://www.reddit.com/r/pics/"
r = requests.get(url)
data = r.text
soup = bs(data,'lxml')

image_tags = soup.findAll('img')

if not os.path.exists('direct'):
    os.makedirs('direct')

os.chdir('direct')
x = 0

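for image in image_tags:
    try:
        url = image['src']
        source = requests.get(url)
        if source.status_code == 200:
            img_path = 'direct-' + str(x) + '.jpg'
            with open(img_path, 'wb') as f:
                # Write the response already fetched instead of downloading twice
                f.write(source.content)
            x += 1
    except (KeyError, requests.RequestException):
        # Skip <img> tags without a src and URLs that cannot be fetched
        continue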

Edit: Here is the updated code, but I'm still running into the problem.

import requests
from bs4 import BeautifulSoup as bs
import os


url = "https://www.reddit.com/r/drawing"
r = requests.get(url)
data = r.text
soup = bs(data,'lxml')

image_tags = soup.findAll('img')

if not os.path.exists('directory'):
    os.makedirs('directory')

os.chdir('directory')
x = 0
mylist = []
for image in image_tags:
    url = image['src']
    source = requests.get(url)
    if source.status_code == 200:
        img_path = 'direct-' + str(x) + '.jpg'
        with open(img_path, 'wb') as f:
            # Write the response already fetched instead of downloading twice
            f.write(source.content)
        # The with block closes the file, so no explicit f.close() is needed
        mylist.append(img_path)
        x += 1


print(mylist)

1 Answer:

Answer 0 (score: 1)

Create a list at the beginning of your code:

...
mylist = []
...

Then, after fetching each image, add its path to the list:

...
img_path = 'direct-' + str(x) +'.jpg'
mylist.append(img_path)
....
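
The snippets above record each file name as it is generated. If the goal is instead to build the list from the files that actually exist in the directory, a minimal sketch with pathlib could look like the following. The function name `list_image_files` and the extension filter are assumptions made for illustration; they are not part of the original code.

```python
from pathlib import Path


def list_image_files(directory, extensions=(".jpg", ".jpeg", ".png", ".gif")):
    """Return a sorted list of image file paths found directly in `directory`.

    `list_image_files` and the default extensions are hypothetical choices
    for this sketch; adjust them to match how the files were saved.
    """
    folder = Path(directory)
    return sorted(
        str(path)
        for path in folder.iterdir()
        # Keep regular files only, matched case-insensitively by extension
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Because the question's code calls os.chdir into the download directory, calling `list_image_files('.')` after the loop should pick up the saved files; from outside that directory, pass its path instead.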

Edit:

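I ran the updated code, and image_tags comes back empty. In fact, the page returned by

url = "https://www.reddit.com/r/drawing"
r = requests.get(url)
data = r.text

does not contain any images. I guess reddit has some kind of protection that prevents you from fetching images this way.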

Try adding print(data) and you'll see what I mean.
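
Reading the whole print(data) output by eye can be tedious, so here is a rough sketch that lists the src of every <img> tag in an HTML string. It uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra packages; `ImgSrcCollector` and `find_img_sources` are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_img_sources(html_text):
    """Return the src values of all <img> tags in `html_text`."""
    parser = ImgSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources
```

Calling `find_img_sources(data)` on the response text and getting an empty list back would be consistent with the observation that the returned page contains no images.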

You should use the reddit API so that reddit doesn't throttle your requests.
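
As one possible starting point, the sketch below reads the public JSON listing that reddit has served when `.json` is appended to a subreddit URL. This is an assumption about the endpoint and its response shape (a "Listing" whose posts sit under data.children[].data.url), not something confirmed in this thread; the official OAuth API, or a wrapper library, is likely the more robust route. The User-Agent value and the function names are hypothetical, and urllib is used only so the sketch has no third-party dependencies.

```python
import json
import urllib.request

# Assumption: reddit returns a JSON "Listing" for this URL and expects a
# descriptive User-Agent header on each request.
LISTING_URL = "https://www.reddit.com/r/pics/.json"
USER_AGENT = "image-list-sketch/0.1"  # hypothetical value; use your own

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def extract_image_urls(listing):
    """Pull direct image links out of a parsed Listing dictionary."""
    urls = []
    for child in listing.get("data", {}).get("children", []):
        url = child.get("data", {}).get("url", "")
        # Keep only links that point straight at an image file
        if url.lower().endswith(IMAGE_EXTENSIONS):
            urls.append(url)
    return urls


def fetch_listing(url=LISTING_URL):
    """Download and parse the listing (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With these in place, `mylist = extract_image_urls(fetch_listing())` would give a list of image URLs that the download loop from the question could iterate over in place of image_tags.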