Question

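I am trying to download an online e-newspaper (its pages are served as images). I am using Selenium to log in and get each image's src, and the requests module to download the image.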

Here is the code I use (the requests part):

def download(driver,pageNumber):
    page,filename = pageNumber,""
    if page in range(1,10):
        filename = str(currentDT) + "_kompas_{}"+str(page)+".jpg"
        filename = filename.format(0)
    else: filename = str(currentDT) + "_kompas_"+str(page)+".jpg"
    print("Downloading Page " + str(pageNumber) + " ...")
    div = driver.find_element_by_xpath("//div[@class='page-wrapper' and  @page='" + str(pageNumber) + "']")
    img = div.find_element_by_tag_name("img")
    imgsrc = img.get_attribute("src")
    imgsrc2 = imgsrc.replace("getmedium","getpreview")
    img.click()
    WebDriverWait(driver,200).until(EC.visibility_of_element_located((By.XPATH,"//img[@src = '"+imgsrc2+"']")))
    div2 = driver.find_element_by_xpath("//div[@class='page-wrapper' and @page='" + str(pageNumber) + "']")
    img2 = div2.find_element_by_tag_name("img")
    url = img2.get_attribute("src")
    url = url.replace("https","http")
    print(url)
    url = img2.get_attribute("src")
    r = requests.get(url)
    if r.status_code == 200:
        with open(download_path + "1.jpg", 'wb') as f:
            f.write(r.content)

After running the code, the downloaded image is 0 bytes in size. When I inspect the response headers with print(r.headers), it prints something like this:

{'Date': 'Fri, 28 Sep 2018 06:14:29 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d2770acf5454bb72630a1936eda1930561538115268; expires=Sat, 28-Sep-19 GMT; path=/; domain=.epaper.id; HttpOnly, ci_session=db77e070cbe346e0ac183d686efae9989e8f2096; path=/; HttpOnly', 'X-Powered-By': 'PHP/5.6.37', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Pragma': 'no-cache', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '461411eeef1c31aa-SIN', 'Content-Encoding': 'gzip'}
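The headers above report a Content-Type of text/html rather than an image type, which suggests the server answered with an HTML page instead of the JPEG. A minimal sketch of a check that could run before writing the file follows; the function name is_image_response is hypothetical and not part of the original code.

```python
def is_image_response(status_code, headers, body):
    """Return True only if the response looks like a non-empty image.

    `headers` is any mapping with a "Content-Type" key, e.g. r.headers
    from requests; `body` is the raw bytes, e.g. r.content.
    """
    content_type = headers.get("Content-Type", "")
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    return status_code == 200 and media_type.startswith("image/") and len(body) > 0


# Hypothetical usage inside download():
#     if is_image_response(r.status_code, r.headers, r.content):
#         with open(download_path + "1.jpg", "wb") as f:
#             f.write(r.content)
#     else:
#         print("Not an image:", r.headers.get("Content-Type"))
```

With this check in place, a text/html reply is reported instead of being saved as an empty or invalid .jpg file.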

What should I do to fix this? Please help me...
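One possible cause, offered as an assumption rather than a confirmed diagnosis: requests.get(url) runs outside the logged-in Selenium session, so the server does not see the login cookies (the response even sets a fresh ci_session cookie). A sketch of passing the browser's cookies and user agent to requests might look like this; selenium_cookies_to_dict and fetch_with_browser_cookies are hypothetical helper names.

```python
def selenium_cookies_to_dict(selenium_cookies):
    """Convert the list of cookie dicts returned by driver.get_cookies()
    into the plain {name: value} mapping that requests accepts."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def fetch_with_browser_cookies(driver, url):
    """Fetch `url` with requests, reusing the Selenium session's cookies."""
    import requests  # imported lazily so the helper above has no dependencies

    cookies = selenium_cookies_to_dict(driver.get_cookies())
    # Reuse the browser's user agent in case the server checks it.
    user_agent = driver.execute_script("return navigator.userAgent")
    return requests.get(
        url,
        cookies=cookies,
        headers={"User-Agent": user_agent},
        timeout=30,
    )
```

In download(), r = requests.get(url) could then become r = fetch_with_browser_cookies(driver, url). Whether this is sufficient depends on how the site checks the session, which the question does not show.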

python-requests cannot download the image; the downloaded image is 0 bytes

0 Answers: