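I have built a scraper to download images from a website. When I run it, however, it throws this error: [raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 403]. I have used the same approach to scrape images from other websites without any problems. I can't figure out why this error occurs or how to fix it. I hope someone can look into it.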
for (int i = 1; i <= string1.Length; i++)
{
    for (int j = 1; j <= string2.Length; j++)
    {
        if (string1[i-1] != string2[j-1]) // find characters in the strings that are distinct
            SUS[i][j] = SUS[i-1][j-1] + 1; // SUS: Shortest Unique Substring
        else
            SUS[i][j] = Math.Min(SUS[i-1][j], SUS[i][j-1]); // find minimum size of distinct strings
    }
}
Answer 0 (score: 2)
You need to download the images with the same session you used to fetch the initial page. Working code:
import requests
from lxml import html


def PictureScraping():
    url = "https://www.yify-torrent.org/search/1080p/"
    with requests.Session() as session:
        response = session.get(url)
        tree = html.fromstring(response.text)
        titles = tree.xpath('//div[@class="movie-image"]')
        for title in titles:
            image_url = title.xpath('.//img/@src')[0]
            image_name = image_url.split('/')[-1]
            print(image_name)
            image_url = "https:" + image_url
            # download image
            response = session.get(image_url, stream=True)
            if response.status_code == 200:
                with open(image_name, 'wb') as f:
                    for chunk in response.iter_content(1024):
                        f.write(chunk)


PictureScraping()
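
The line `image_url = "https:" + image_url` in the code above only works when every `src` value is protocol-relative (starts with `//`). Here is a minimal sketch of a more tolerant way to build the image URL and file name, using only the standard library. The helper names `resolve_image_url` and `image_filename` and the sample `src` values are hypothetical, made up for illustration; they are not part of the original answer.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

PAGE_URL = "https://www.yify-torrent.org/search/1080p/"


def resolve_image_url(page_url, src):
    """Turn a src attribute into an absolute URL, relative to the page it came from."""
    return urljoin(page_url, src)


def image_filename(image_url):
    """Use the last path segment as the local file name, ignoring any query string."""
    return posixpath.basename(urlsplit(image_url).path)


# Hypothetical src values: protocol-relative, root-relative, and already absolute.
for src in (
    "//img.example.com/covers/a.jpg",
    "/covers/b.jpg",
    "https://img.example.com/c.jpg?x=1",
):
    absolute = resolve_image_url(PAGE_URL, src)
    print(absolute, image_filename(absolute))
```

Inside the loop, `resolve_image_url(url, title.xpath('.//img/@src')[0])` could stand in for the string concatenation, with the result still passed to `session.get` so the download reuses the same session.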