My scraper throws an error instead of downloading images

Posted: 2017-05-02 13:17:40

Tags: python web-crawler

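I have built a scraper to download images from a website. However, when I run it, it throws this error: [raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 403]. I have used the same approach to scrape images from other sites without any problems, so I can't figure out why this error occurs or how to fix it. I hope someone can take a look.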
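One way to narrow this down is to catch the exception and look at what the server actually sent back: urllib.error.HTTPError carries the status code, the reason phrase, and the response headers. The sketch below assumes only Python 3's standard library. The local server is hypothetical: it stands in for the image host and refuses every request with 403, so the snippet runs offline.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_status(url):
    """Return (status_code, reason) for url, including 4xx/5xx answers."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.reason
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx, but the error still
        # carries the server's status, reason phrase and headers.
        print("Server headers:", dict(err.headers))
        return err.code, err.reason


class AlwaysForbidden(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the image host: rejects every GET."""

    def do_GET(self):
        self.send_response(403)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port for the throwaway server.
server = HTTPServer(("127.0.0.1", 0), AlwaysForbidden)
threading.Thread(target=server.serve_forever, daemon=True).start()
status = fetch_status(f"http://127.0.0.1:{server.server_port}/image.jpg")
print(status)
server.shutdown()
server.server_close()
```

Against the real site you would call fetch_status with the failing image URL; comparing its headers with those from a site that works can hint at what the server is checking.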

1 answer:

Answer 0 (score: 2)

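You need to download the images with the same web-scraping session (the same requests.Session object) that you used to fetch the initial page. Working code: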

import requests
from lxml import html


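def PictureScraping():
    url = "https://www.yify-torrent.org/search/1080p/"
    # One Session for everything: cookies and connection state from the
    # search page are reused for the image requests below.
    with requests.Session() as session:
        response = session.get(url)

        # Parse the search page and pick out each movie's image container.
        tree = html.fromstring(response.text)
        titles = tree.xpath('//div[@class="movie-image"]')
        for title in titles:
            image_url = title.xpath('.//img/@src')[0]
            # Use the last path segment of the URL as the local file name.
            image_name = image_url.split('/')[-1]
            print(image_name)
            # Assumes src is protocol-relative ("//host/path"), so add a scheme.
            image_url = "https:" + image_url

            # Download the image through the same session, streaming it to disk.
            response = session.get(image_url, stream=True)
            if response.status_code == 200:
                with open(image_name, 'wb') as f:
                    # Write the body in 1 KiB chunks instead of loading it all at once.
                    for chunk in response.iter_content(1024):
                        f.write(chunk)


PictureScraping()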
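The answer's fix relies on requests.Session keeping cookies between requests. Since the question used urllib, here is a rough standard-library equivalent, assuming the site's 403 is triggered by a missing cookie (the answer does not say exactly what the site checks). A single opener with an http.cookiejar.CookieJar plays the role of the session. The local server is hypothetical: it sets a cookie on the page and refuses the image request unless that cookie comes back.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGate(BaseHTTPRequestHandler):
    """Hypothetical site: "/" sets a cookie, other paths require it."""

    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header("Set-Cookie", "visited=1; Path=/")
            self.end_headers()
            self.wfile.write(b"<html>search page</html>")
        elif "visited=1" in (self.headers.get("Cookie") or ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"IMAGEDATA")
        else:
            self.send_response(403)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def get_status(opener, url):
    """Return the HTTP status code, treating 4xx/5xx as data, not errors."""
    try:
        with opener.open(url) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


server = HTTPServer(("127.0.0.1", 0), CookieGate)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Opener without a cookie jar: like requesting the image URL on its own.
plain = urllib.request.build_opener()
direct_status = get_status(plain, base + "/image.jpg")

# One opener sharing a CookieJar: like reusing a requests.Session.
session = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(CookieJar()))
get_status(session, base + "/")  # visit the page first to receive the cookie
session_status = get_status(session, base + "/image.jpg")

print(direct_status, session_status)
server.shutdown()
server.server_close()
```

The design point is the same as in the answer: the cookie jar lives on the opener, so every request made through that one opener sends back whatever cookies earlier responses set, while a fresh opener (or a bare urlopen call) starts with none.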