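I would like to know how to download an image from a page using Selenium/WebDriver. Assume that a user session is required to download the image, so having only the bare URL is not enough. Any sample code is very welcome.
Answer 0 (score: 17)
I prefer to do something like this: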
1. Get the SRC attribute of the image.
2. Use ImageIO.read to read the image onto a BufferedImage
3. Save the BufferedImage using ImageIO.write function
Answer 1 (score: 10)
Answer 2 (score: 4)
I prefer it this way:
WebElement logo = driver.findElement(By.cssSelector(".image-logo"));
String logoSRC = logo.getAttribute("src");
URL imageURL = new URL(logoSRC);
BufferedImage saveImage = ImageIO.read(imageURL);
ImageIO.write(saveImage, "png", new File("logo-image.png"));
Answer 3 (score: 3)
The only way I found to avoid downloading the image twice is to use the Chrome DevTools Protocol.
In Python, this gives:
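import base64
import pychrome

# Assumption: Chrome was started by Selenium with --remote-debugging-port=9222,
# `driver` is that WebDriver instance, and `tab` is the matching pychrome tab,
# obtained for example with:
#   browser = pychrome.Browser(url="http://127.0.0.1:9222")
#   tab = browser.list_tab()[0]

def save_image(file_content, file_name):
    try:
        file_content = base64.b64decode(file_content)
        with open("C:\\Crawler\\temp\\" + file_name, "wb") as f:
            f.write(file_content)
    except Exception as e:
        print(str(e))

# pychrome passes the event parameters as keyword arguments; **kwargs absorbs
# any fields that newer Chrome versions add to Network.responseReceived.
def response_received(requestId, loaderId, timestamp, type, response, frameId=None, **kwargs):
    if type == 'Image':
        url = response.get('url')
        print(f"Image loaded: {url}")
        response_body = tab.Network.getResponseBody(requestId=requestId)
        file_name = url.split('/')[-1].split('?')[0]
        if file_name:
            save_image(response_body['body'], file_name)

tab.Network.responseReceived = response_received
# start the tab
tab.start()
# call method
tab.Network.enable()
# get request to target the site selenium
driver.get("https://www.realtor.com/ads/forsale/TMAI112283AAAA")
# wait for loading
tab.wait(50)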
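One detail that may be worth guarding against: the result of `Network.getResponseBody` carries a `base64Encoded` flag next to `body`, and text-based images such as SVG can come back unencoded. Below is a minimal sketch, assuming the result dict has that shape; `response_body_to_bytes` and `file_name_from_url` are hypothetical helper names, not part of pychrome.

```python
import base64
import posixpath
from urllib.parse import unquote, urlsplit


def response_body_to_bytes(response_body):
    """Convert a Network.getResponseBody result dict into raw bytes."""
    body = response_body.get("body", "")
    if response_body.get("base64Encoded", False):
        return base64.b64decode(body)
    # Text payloads (for example SVG) arrive as plain strings.
    return body.encode("utf-8")


def file_name_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))
```

With these, the handler could write `response_body_to_bytes(response_body)` to disk instead of always calling `base64.b64decode`. A URL that ends in `/` yields an empty file name, which the existing `if file_name:` check already skips.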
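Answer 4 (score: 2)
Another solution, arguably the most correct one, is to download the image directly with a plain HTTP request. You can reuse the WebDriver user session, because it stores the cookies. In my example I only check the status code that is returned. If it is 200, the image exists and can be displayed or downloaded. If you really need to download the file itself, you can read all the image data from the httpResponse entity (use it as a plain input stream).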
// just look at your cookie's content (e.g. using browser)
// and import these settings from it
private static final String SESSION_COOKIE_NAME = "JSESSIONID";
private static final String DOMAIN = "domain.here.com";
private static final String COOKIE_PATH = "/cookie/path/here";

protected boolean isResourceAvailableByUrl(String resourceUrl) {
    HttpClient httpClient = new DefaultHttpClient();
    HttpContext localContext = new BasicHttpContext();
    BasicCookieStore cookieStore = new BasicCookieStore();
    // apply jsessionid cookie if it exists
    cookieStore.addCookie(getSessionCookie());
    localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
    // resourceUrl - is url which leads to image
    HttpGet httpGet = new HttpGet(resourceUrl);
    try {
        HttpResponse httpResponse = httpClient.execute(httpGet, localContext);
        return httpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
    } catch (IOException e) {
        return false;
    }
}

protected BasicClientCookie getSessionCookie() {
    Cookie originalCookie = webDriver.manage().getCookieNamed(SESSION_COOKIE_NAME);
    if (originalCookie == null) {
        return null;
    }
    // just build new apache-like cookie based on webDriver's one
    String cookieName = originalCookie.getName();
    String cookieValue = originalCookie.getValue();
    BasicClientCookie resultCookie = new BasicClientCookie(cookieName, cookieValue);
    resultCookie.setDomain(DOMAIN);
    resultCookie.setExpiryDate(originalCookie.getExpiry());
    resultCookie.setPath(COOKIE_PATH);
    return resultCookie;
}
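For Python users, the same idea (reuse the browser's cookies in a plain HTTP request) can be sketched with only the standard library. This is a rough sketch, not a drop-in implementation: it assumes the cookies come from Selenium's `driver.get_cookies()`, which returns a list of dicts with `name` and `value` keys, and the helper names `cookie_header_from_selenium`, `build_request` and `download` are made up for illustration.

```python
import urllib.request


def cookie_header_from_selenium(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_request(url, cookies, user_agent=None):
    """Create a urllib Request that carries the browser session's cookies."""
    headers = {}
    if cookies:
        headers["Cookie"] = cookie_header_from_selenium(cookies)
    if user_agent:
        # Some sites tie a session to the browser's User-Agent string.
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)


def download(url, cookies, path, user_agent=None):
    """Fetch url with the session cookies and write the body to path."""
    request = build_request(url, cookies, user_agent)
    with urllib.request.urlopen(request) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Typical use would be something like `download(img.get_attribute("src"), driver.get_cookies(), "logo.png")`. Unlike the Java version above, this sends every cookie the driver reports for the current page rather than only the session cookie.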
Answer 5 (score: 2)
Try the following:
// Draw the first <img> on the page onto a canvas and read it back as a data URL.
JavascriptExecutor js = (JavascriptExecutor) driver;
String base64string = (String) js.executeScript("var c = document.createElement('canvas');"
        + " var ctx = c.getContext('2d');"
        + "var img = document.getElementsByTagName('img')[0];"
        + "c.height=img.naturalHeight;"
        + "c.width=img.naturalWidth;"
        + "ctx.drawImage(img, 0, 0,img.naturalWidth, img.naturalHeight);"
        + "var base64String = c.toDataURL();"
        + "return base64String;");
// The payload is everything after the last comma of "data:image/png;base64,...".
String[] base64Array = base64string.split(",");
String base64 = base64Array[base64Array.length - 1];
// java.util.Base64 (Java 8+) is used here; the original did not say which Base64 class it meant.
byte[] data = java.util.Base64.getDecoder().decode(base64);
ByteArrayInputStream memstream = new ByteArrayInputStream(data);
BufferedImage saveImage = ImageIO.read(memstream);
ImageIO.write(saveImage, "png", new File("path"));
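The string that `toDataURL()` returns is a data URL, so whichever language drives the browser has to split off the header and decode the payload. Here is a small Python sketch of that step; `decode_data_url` is a hypothetical helper name, and it assumes the input follows the usual `data:<mime>[;base64],<payload>` layout.

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_url(data_url):
    """Split a data URL into (mime_type, raw_bytes)."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    header, _, payload = data_url[len("data:"):].partition(",")
    parts = header.split(";")
    mime_type = parts[0] or "text/plain"
    if "base64" in parts[1:]:
        return mime_type, base64.b64decode(payload)
    # Without the base64 marker the payload is percent-encoded text.
    return mime_type, unquote_to_bytes(payload)
```

The returned bytes can be written straight to a file opened in `"wb"` mode. Keep in mind that a canvas re-encodes the picture (PNG by default), so the bytes are not necessarily identical to the file the server sent.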
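Answer 6 (score: 1)
The other solutions here do not work in all browsers, do not work on all websites, or both.
This solution should be more robust. It uses the browser to view the image, resizes the browser window to fit the image, takes a screenshot, and finally restores the window to its original size.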
import logging

def get_image(driver, img_url):
    '''Given an image's URL, return a binary screenshot of it in PNG format.'''
    # Selenium's Python bindings expose driver.get(), not driver.get_url().
    driver.get(img_url)
    # Get the dimensions of the browser and image.
    orig_h = driver.execute_script("return window.outerHeight")
    orig_w = driver.execute_script("return window.outerWidth")
    margin_h = orig_h - driver.execute_script("return window.innerHeight")
    margin_w = orig_w - driver.execute_script("return window.innerWidth")
    new_h = driver.execute_script('return document.getElementsByTagName("img")[0].height')
    new_w = driver.execute_script('return document.getElementsByTagName("img")[0].width')
    # Resize the browser window.
    logging.info("Getting Image: orig %sX%s, marg %sX%s, img %sX%s - %s" % (
        orig_w, orig_h, margin_w, margin_h, new_w, new_h, img_url))
    driver.set_window_size(new_w + margin_w, new_h + margin_h)
    # Get the image by taking a screenshot of the page.
    img_val = driver.get_screenshot_as_png()
    # Set the window size back to what it was.
    driver.set_window_size(orig_w, orig_h)
    # Go back to where we started.
    driver.back()
    return img_val
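One drawback of this solution is that if the image is very small, the browser window will not shrink that far, and you may end up with a black border around the image.
Answer 7 (score: 1)
This worked for me: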
from time import sleep

# `driver`, `wanted_url` and `photo_name` are assumed to be defined by the caller.
# open the image in a new tab
driver.execute_script('''window.open("''' + wanted_url + '''","_blank");''')
sleep(2)
driver.switch_to.window(driver.window_handles[1])
sleep(2)
# make screenshot (Selenium writes PNG data, whatever the file extension says)
driver.save_screenshot("C://Folder/" + photo_name + ".jpeg")
sleep(2)
# close the new tab
driver.execute_script('''window.close();''')
sleep(2)
# back to original tab
driver.switch_to.window(driver.window_handles[0])
Answer 8 (score: 0)
If you only need to test whether the image is available and exists, you can do this:
protected boolean isResourceAvailableByUrl(String resourceUrl) {
    // backup current url, to come back to it in future
    String currentUrl = webDriver.getCurrentUrl();
    try {
        // try to get image by url
        webDriver.get(resourceUrl);
        // if "resource not found" message was not appeared - image exists
        return webDriver.findElements(RESOURCE_NOT_FOUND).isEmpty();
    } finally {
        // back to page
        webDriver.get(currentUrl);
    }
}
But before running this method you need to be sure that navigating to currentUrl really brings you back to the page. In my case it does. If it does not, you can try:
webDriver.navigate().back()
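Unfortunately, there also seems to be no way to inspect the response status code. That is why you need to find some element that is specific to the NOT_FOUND page and check whether it appears in order to decide that the image does not exist.
This is only a workaround, because I could not find any official way to solve it.
Note: this solution is useful when the resource has to be fetched with an authorized session and cannot be downloaded with ImageIO or HttpClient alone.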
Answer 9 (score: 0)
Use Selenium to get the image src:
elemImg.get_attribute('src')
If Python is your programming language, see this answer: How to save an image locally using Python whose URL address I already know?
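Answer 10 (score: 0)
Here is a JavaScript solution. It is a bit silly, and I am wary of hitting the source image's server with too many requests. Can anyone tell me whether fetch() uses the browser's cache? I do not want to spam the source server.
It attaches a FileReader() to the window, fetches the image and converts it to base64, then stores that string on the window.
The driver can then return that window variable.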
import { By, until } from 'selenium-webdriver';

export async function scrapePic(driver) {
  // Declared outside the try block so it is still in scope at the return.
  let img64;
  try {
    console.log("waiting for that profile piccah")
    console.log(driver)
    let rootEl = await driver.findElement(By.css('.your-root-element'));
    let imgEl = await rootEl.findElement(By.css('img'))
    // The timeout belongs to driver.wait, not to the condition.
    await driver.wait(until.elementIsVisible(imgEl), 10000);
    console.log('profile piccah found')
    let img = await imgEl.getAttribute('src')
    // attach reader to driver window
    await driver.executeScript(`window.myFileReader = new FileReader();`)
    // The reader name must match the one attached above (myFileReader).
    await driver.executeScript(`
      window.myFileReader.onloadend = function() {
        window['profileImage'] = this.result
      }
      fetch( arguments[0] ).then( res => res.blob() ).then( blob => window.myFileReader.readAsDataURL(blob) )
    `, img)
    await driver.sleep(5000)
    img64 = await driver.executeScript(`return window.profileImage`)
    console.log(img64)
  } catch (e) {
    console.log(e)
  }
  return img64
}
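Answer 11 (score: 0)
In my use case there were cookies and other issues that made the other approaches here unsuitable.
I ended up using an XMLHttpRequest to populate a FileReader (from How to convert image into base64 string using javascript), then invoked it with Selenium's ExecuteAsyncScript (as shown in Selenium and asynchronous JavaScript calls), which gave me a Data URL that can be parsed directly.
Here is my C# code for getting the Data URL: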
public string ImageUrlToDataUrl(IWebDriver driver, string imageUrl)
{
    var js = new StringBuilder();
    js.AppendLine("var done = arguments[0];"); // The callback from ExecuteAsyncScript
    js.AppendLine(@"
        function toDataURL(url, callback) {
            var xhr = new XMLHttpRequest();
            xhr.onload = function() {
                var reader = new FileReader();
                reader.onloadend = function() {
                    callback(reader.result);
                }
                reader.readAsDataURL(xhr.response);
            };
            xhr.open('GET', url);
            xhr.responseType = 'blob';
            xhr.send();
        }"); // XMLHttpRequest -> FileReader -> DataURL conversion
    js.AppendLine("toDataURL('" + imageUrl + "', done);"); // Invoke the function
    var executor = (IJavaScriptExecutor) driver;
    var dataUrl = executor.ExecuteAsyncScript(js.ToString()) as string;
    return dataUrl;
}
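The same in-page download can be driven from Python through `execute_async_script`. The sketch below is only an illustration under a few assumptions: it swaps XMLHttpRequest for `fetch`, passes the URL as a script argument instead of concatenating it into the source, and relies on Selenium appending its completion callback as the last entry of `arguments`. `FETCH_AS_DATA_URL` and `image_bytes_via_browser` are hypothetical names.

```python
import base64

# JavaScript run inside the page: fetch the image with the page's own
# cookies, convert it to a data URL, and hand it to Selenium's callback.
FETCH_AS_DATA_URL = """
var url = arguments[0];
var done = arguments[arguments.length - 1];
fetch(url, {credentials: 'include'})
  .then(function (res) { return res.blob(); })
  .then(function (blob) {
    var reader = new FileReader();
    reader.onloadend = function () { done(reader.result); };
    reader.readAsDataURL(blob);
  })
  .catch(function (err) { done('error:' + err); });
"""


def image_bytes_via_browser(driver, image_url):
    """Download image_url inside the browser session and return its bytes."""
    result = driver.execute_async_script(FETCH_AS_DATA_URL, image_url)
    if not isinstance(result, str) or not result.startswith("data:"):
        raise RuntimeError(f"could not fetch {image_url}: {result}")
    header, _, payload = result.partition(",")
    if not header.endswith(";base64"):
        raise RuntimeError("expected a base64 data URL")
    return base64.b64decode(payload)
```

Because the request is made by the page itself, it carries the session's cookies, but it is also subject to the page's same-origin and CORS rules. For large images it may be necessary to raise the limit with `driver.set_script_timeout(...)` before calling the helper.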
Answer 12 (score: 0)
Although @aboy021's JS code is syntactically correct, I could not get it to run (using Chrome V83.xx).
But this code works (Java):
String url = "/your-url-goes.here.jpg";
String imageData = (String) ((JavascriptExecutor) driver).executeAsyncScript(
        "var callback = arguments[0];" + // The callback from ExecuteAsyncScript
        "var reader;" +
        "var xhr = new XMLHttpRequest();" +
        "xhr.onreadystatechange = function() {" +
        "  if (xhr.readyState == 4) {" +
        "    var reader = new FileReader();" +
        "    reader.readAsDataURL(xhr.response);" +
        "    reader.onloadend = function() {" +
        "      callback(reader.result);" +
        "    }" +
        "  }" +
        "};" +
        "xhr.open('GET', '" + url + "', true);" +
        "xhr.responseType = 'blob';" +
        "xhr.send();");
// Strip the "data:...;base64," prefix and decode the payload.
String base64Data = imageData.split(",")[1];
byte[] decodedBytes = Base64.getDecoder().decode(base64Data);
try (OutputStream stream = new FileOutputStream("c:\\dev\\tmp\\output.jpg")) {
    stream.write(decodedBytes);
} catch (IOException e) {
    e.printStackTrace();
}
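Answer 13 (score: 0)
How to download a resource to a file when its URL is taken from an element's text or attribute
The full extension code can be found here:
If you want to use this method without writing the code yourself, use the NuGet package https://www.nuget.org/packages/Gravity.Core/
Install-Package Gravity.Core -Version 2020.7.5.3
Usage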
using OpenQA.Selenium.Extensions;
...
var driver = new ChromeDriver();
// from element attribute
var element = driver.FindElement(By.XPath("//img[@id='my_img']")).DownloadResource(path: @"C:\images\cap_image_01.png", attribute: "src");
// from element text
var element = driver.FindElement(By.XPath("//div[1]")).DownloadResource(path: @"C:\images\cap_image_01.png");
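Using the NuGet package is recommended, since it contains many tools and extensions for Selenium
For use without NuGet (implement it yourself)
Extension class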
using System.IO;
using System.Net.Http;
using System.Text.RegularExpressions;
using OpenQA.Selenium;

namespace Extensions
{
    public static class WebElementExtensions
    {
        public static IWebElement DownloadResource(this IWebElement element, string path)
        {
            return DoDownloadResource(element, path, "");
        }

        public static IWebElement DownloadResource(this IWebElement element, string path, string attribute)
        {
            return DoDownloadResource(element, path, attribute);
        }

        private static IWebElement DoDownloadResource(this IWebElement element, string path, string attribute)
        {
            // get resource address
            var resource = (string.IsNullOrEmpty(attribute))
                ? element.Text
                : element.GetAttribute(attribute);
            // download resource
            using (var client = new HttpClient())
            {
                // get response for the current resource
                var httpResponseMessage = client.GetAsync(resource).GetAwaiter().GetResult();
                // exit condition
                if (!httpResponseMessage.IsSuccessStatusCode) return element;
                // create directories path
                Directory.CreateDirectory(path);
                // get absolute file name
                var fileName = Regex.Match(resource, @"[^/\\&\?]+\.\w{3,4}(?=([\?&].*$|$))").Value;
                path = (path.LastIndexOf(@"\") == path.Length - 1)
                    ? path + fileName
                    : path + $@"\{fileName}";
                // write the file
                File.WriteAllBytes(path, httpResponseMessage.Content.ReadAsByteArrayAsync().GetAwaiter().GetResult());
            }
            // keep the fluent
            return element;
        }
    }
}
Usage
using Extensions;
...
var driver = new ChromeDriver();
// from element attribute
var element = driver.FindElement(By.XPath("//img[@id='my_img']")).DownloadResource(path: @"C:\images\cap_image_01.png", attribute: "src");
// from element text
var element = driver.FindElement(By.XPath("//div[1]")).DownloadResource(path: @"C:\images\cap_image_01.png");