I am trying to download images from a website, but I keep getting this error message:
HTTP Error 403: Forbidden
This is the function I wrote to do it:
import urllib.request

import requests
from bs4 import BeautifulSoup

def download_images(url, knife):
    '''
    download_images extracts pictures of the knives in csgo
    url is the page URL the images will be extracted from
    images of 'knife' will be downloaded
    '''
    page = requests.get(url)
    # Use BeautifulSoup to extract the image URLs
    soup = BeautifulSoup(page.content, 'html.parser')
    # Pull all img tags that have an alt attribute
    for img in soup.find_all('img', alt=True):
        # Find the URL and label of each matching knife
        if knife in img['alt']:
            # Download the images with the correct labels
            urllib.request.urlretrieve(img['src'], '{}.png'.format(img['alt']))
Answer 0: (score: 0)
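You should change the user agent. There are many user-agent strings to choose from, and lists of them are available online. To make urllib send a different user agent, set a custom User-Agent header on the request. Alternatively, you can use wget with the -U option followed by the user-agent string (for example, 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4').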
import os
import shlex

import requests
from bs4 import BeautifulSoup

USER_AGENT = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4'

def download_images(url, knife):
    '''
    download_images extracts pictures of the knives in csgo
    url is the page URL the images will be extracted from
    images of 'knife' will be downloaded
    '''
    page = requests.get(url)
    # Use BeautifulSoup to extract the image URLs
    soup = BeautifulSoup(page.content, 'html.parser')
    # Pull all img tags that have an alt attribute
    for img in soup.find_all('img', alt=True):
        # Find the URL and label of each matching knife
        if knife in img['alt']:
            # Pass the image URL (not the knife name) to wget, quoted for the shell
            os.system("wget --convert-links -U " + shlex.quote(USER_AGENT) + " " + shlex.quote(img['src']))
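
For the urllib route mentioned in this answer, here is a minimal sketch using only the standard library. It assumes the server rejects the default urllib user agent and accepts a browser-like one, which may not hold for every site. The helper names build_request and fetch_image are hypothetical, chosen for illustration; the user-agent string is the one quoted above.

```python
import shutil
import urllib.request

# Assumption: a browser-like string is enough to get past the 403.
# This value is the example string from the answer; any other may work too.
USER_AGENT = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) "
              "Gecko/20070802 SeaMonkey/1.1.4")


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_image(url, filename, user_agent=USER_AGENT):
    """Download url to filename, sending the custom User-Agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        # Stream the response body straight into the output file
        shutil.copyfileobj(response, out)
```

Inside the loop, the urlretrieve call could then become fetch_image(img['src'], '{}.png'.format(img['alt'])). If requests.get(url) is itself what returns the 403, passing headers={'User-Agent': USER_AGENT} to that call may be needed as well.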