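I am trying to download a newspaper e-paper (each page of the e-paper is an image). I am using Selenium to log in and get the image src, and the requests module to download the image.
Here is the code I am using (the requests part):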
import requests
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# currentDT and download_path are defined elsewhere in the script.
def download(driver, pageNumber):
    page, filename = pageNumber, ""
    # Zero-pad single-digit page numbers, e.g. "_kompas_01.jpg".
    if page in range(1, 10):
        filename = str(currentDT) + "_kompas_{}" + str(page) + ".jpg"
        filename = filename.format(0)
    else:
        filename = str(currentDT) + "_kompas_" + str(page) + ".jpg"
    print("Downloading Page " + str(pageNumber) + " ...")
    div = driver.find_element_by_xpath("//div[@class='page-wrapper' and @page='" + str(pageNumber) + "']")
    img = div.find_element_by_tag_name("img")
    imgsrc = img.get_attribute("src")
    imgsrc2 = imgsrc.replace("getmedium", "getpreview")
    img.click()
    # Wait until the preview version of the image is visible.
    WebDriverWait(driver, 200).until(EC.visibility_of_element_located((By.XPATH, "//img[@src = '" + imgsrc2 + "']")))
    div2 = driver.find_element_by_xpath("//div[@class='page-wrapper' and @page='" + str(pageNumber) + "']")
    img2 = div2.find_element_by_tag_name("img")
    url = img2.get_attribute("src")
    url = url.replace("https", "http")
    print(url)
    # Note: this re-read discards the https -> http replacement above.
    url = img2.get_attribute("src")
    r = requests.get(url)
    if r.status_code == 200:
        with open(download_path + "1.jpg", 'wb') as f:
            f.write(r.content)
After running the code, the downloaded image is 0 bytes. When I check the headers with print(r.headers), it prints something like this:
{'Date': 'Fri, 28 Sep 2018 06:14:29 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d2770acf5454bb72630a1936eda1930561538115268; expires=Sat, 28-Sep-19 GMT; path=/; domain=.epaper.id; HttpOnly, ci_session=db77e070cbe346e0ac183d686efae9989e8f2096; path=/; HttpOnly', 'X-Powered-By': 'PHP/5.6.37', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Pragma': 'no-cache', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '461411eeef1c31aa-SIN', 'Content-Encoding': 'gzip'}
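The `Content-Type` of `text/html` in these headers suggests the server sent back an HTML page (perhaps a login or error page) rather than the image itself. A minimal sketch of a check that could run before writing the file follows; the helper name `looks_like_jpeg` is hypothetical, and it assumes the pages are JPEG images, as the `.jpg` filenames imply.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # leading bytes of a JPEG file


def looks_like_jpeg(headers, body):
    """Return True if the headers and the body both look like a JPEG.

    `headers` is any mapping of header names to values (for example
    r.headers); `body` is the raw response bytes (for example r.content).
    looks_like_jpeg is a hypothetical helper, not part of requests.
    """
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {str(k).lower(): str(v) for k, v in dict(headers).items()}
    # Drop parameters such as "; charset=UTF-8" before comparing.
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    return content_type in ("image/jpeg", "image/jpg") and body.startswith(JPEG_MAGIC)
```

In the function above this could guard the write, e.g. `if r.status_code == 200 and looks_like_jpeg(r.headers, r.content):`, with `print(r.text[:200])` in the other branch to see which page actually came back.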
What should I do to fix this? Please help me...
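One possible cause, judging by the fresh `ci_session` cookie in the response, is that `requests.get` does not share the logged-in Selenium session, so the site treats the request as anonymous. A rough sketch of building a `Cookie` header from the list of dicts that `driver.get_cookies()` returns is below; the helper name `cookie_header` is hypothetical, the example cookie values are placeholders, and whether this site accepts the cookies this way is an assumption.

```python
def cookie_header(selenium_cookies):
    """Build a Cookie header value from Selenium-style cookie dicts.

    Each dict is expected to have at least 'name' and 'value' keys,
    which is the shape driver.get_cookies() returns.
    cookie_header is a hypothetical helper, not part of Selenium or requests.
    """
    return "; ".join(
        "{}={}".format(c["name"], c["value"]) for c in selenium_cookies
    )


# Made-up cookies for illustration only (not real session data).
example = [
    {"name": "ci_session", "value": "abc123", "domain": ".epaper.id"},
    {"name": "lang", "value": "id"},
]
print(cookie_header(example))
```

With the real browser session, the download call might then become `r = requests.get(url, headers={"Cookie": cookie_header(driver.get_cookies())})`. If the server also checks other request headers, this alone may not be enough.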