Downloading images with urllib.request.urlretrieve

Date: 2019-08-09 18:45:50

Tags: python-3.x urllib

I am trying to download images from a website, but I keep getting this error message:

HTTP Error 403: Forbidden

Here is the function I wrote to do this:

    import urllib.request

    import requests
    from bs4 import BeautifulSoup

    def download_images(url, knife):
      '''
      download_images extracts pictures of the knives in csgo
      url is the URL of the page the images will be extracted from
      images whose alt text contains 'knife' will be downloaded
      '''

      page = requests.get(url)

      # Use BeautifulSoup to parse the page and find the image URLs
      soup = BeautifulSoup(page.content, 'html.parser')

      # Loop over every <img> tag that has an alt attribute
      for img in soup.find_all('img', alt=True):
        # Keep only the images whose alt text mentions the knife
        if knife in img['alt']:
          # Download the image and name the file after its alt text
          urllib.request.urlretrieve(img['src'], '{}.png'.format(img['alt']))

1 answer:

Answer 0 (score: 0)

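You should change the user agent. The 403 most likely occurs because the server rejects the default user agent that urllib sends (Python-urllib/3.x). There are many user-agent strings to choose from; a list is available here. To make urllib send a different user agent, add this code. Alternatively, you can use wget with the -U option followed by the user-agent string (for example, 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4').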
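
The linked snippet is not reproduced in this answer, so here is a minimal sketch of one way to send a custom user agent with urllib. It is not necessarily the code the link pointed to. The helper names `build_request` and `download_file` are hypothetical, and the user-agent string is just the example quoted above.

```python
import shutil
import urllib.request

# Assumption: any browser-like string may work; this is the example from the answer.
USER_AGENT = ('Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) '
              'Gecko/20070802 SeaMonkey/1.1.4')


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download_file(url, filename, user_agent=USER_AGENT):
    """Fetch url with the custom user agent and write the body to filename."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, 'wb') as out:
        shutil.copyfileobj(response, out)
```

In the question's loop, `download_file(img['src'], '{}.png'.format(img['alt']))` would take the place of the `urlretrieve` call. Whether the server then accepts the request depends on the site, since some sites check more than the user agent.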


Implementation with wget

    import subprocess

    import requests
    from bs4 import BeautifulSoup

    USER_AGENT = ('Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) '
                  'Gecko/20070802 SeaMonkey/1.1.4')

    def download_images(url, knife):
      '''
      download_images extracts pictures of the knives in csgo
      url is the URL of the page the images will be extracted from
      images whose alt text contains 'knife' will be downloaded
      '''

      page = requests.get(url)

      # Use BeautifulSoup to parse the page and find the image URLs
      soup = BeautifulSoup(page.content, 'html.parser')

      # Loop over every <img> tag that has an alt attribute
      for img in soup.find_all('img', alt=True):
        # Keep only the images whose alt text mentions the knife
        if knife in img['alt']:
          # Pass the image URL (not the knife name) to wget, with a browser-like user agent
          subprocess.run(['wget', '--convert-links', '-U', USER_AGENT, img['src']])
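
One further point applies to both the urllib and the wget versions: `img['src']` may be a relative or protocol-relative path, and the alt text may contain characters that are awkward in file names. The sketch below shows one way to handle both, using only the standard library. The helper names `absolute_image_url` and `safe_filename` are hypothetical, and the set of characters replaced is an assumption rather than a requirement of any particular site.

```python
import re
from urllib.parse import urljoin


def absolute_image_url(page_url, src):
    """Resolve an <img> src attribute against the URL of the page it came from."""
    return urljoin(page_url, src)


def safe_filename(alt, extension='.png'):
    """Replace characters other than word characters, '-', '.' and spaces with '_'."""
    cleaned = re.sub(r'[^\w\-. ]', '_', alt).strip()
    return cleaned + extension
```

Inside the loop, the download call would then receive `absolute_image_url(url, img['src'])` as the source and `safe_filename(img['alt'])` as the target file name.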