I am trying to download an xls file from a website. When I click the link to download the file, I get a JavaScript confirm box. I handle it as follows:
ConfirmHandler okHandler = new ConfirmHandler() {
    public boolean handleConfirm(Page page, String message) {
        return true;
    }
};
webClient.setConfirmHandler(okHandler);
There is a link for downloading the file:
<a href="./my_file.php?mode=xls&w=d2hlcmUgc2VsbElkPSd3b3JsZGNvbScgYW5kIHN0YXR1cz0nV0FJVERFTEknIGFuZCBkYXRlIDw9IC0xMzQ4MTUzMjAwICBhbmQgZGF0ZSA%2BPSAtMTM1MDgzMTU5OSA%3D" target="actionFrame" onclick="return confirm('Do you want do download XLS file?')"><u>Download</u></a>
I click the link:
HtmlPage x = webClient.getPage("http://working.com/download");
HtmlAnchor anchor = (HtmlAnchor) x.getFirstByXPath("//a[@target='actionFrame']");
anchor.click();
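The handleConfirm() method is executed, but I don't know how to save the file stream that comes back from the server. I tried to inspect the stream with the code below:
anchor.click().getWebResponse().getContentAsString();
However, the result is the same as page x. Does anyone know how to capture the stream from the server? Thanks.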
Answer 0 (score: 9)
I found a way to get the InputStream using a WebWindowListener. Inside webWindowContentChanged(WebWindowEvent event), I put the code below:
InputStream xls = event.getWebWindow().getEnclosedPage().getWebResponse().getContentAsStream();
Once I have xls, I can save the file to disk.
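As a minimal sketch of that last step, assuming Java 7 or later: the helper below copies any InputStream, such as the xls stream above, to a file using only the standard library. The names StreamSaver and save are hypothetical and not part of HtmlUnit or of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of HtmlUnit): writes a stream to disk.
final class StreamSaver {

    // Copies everything from 'in' to 'target', overwriting an existing file,
    // closes the stream afterwards and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Inside the listener this could be called as StreamSaver.save(xls, Paths.get("report.xls")), where the file name is only an example.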
Answer 1 (score: 9)
I built this based on your post. Note: you can change the content-type condition to download only specific types of files (e.g. application/octet-stream, application/pdf, etc.).
package net.s4bdigital.export.main;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import com.gargoylesoftware.htmlunit.ConfirmHandler;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebWindowEvent;
import com.gargoylesoftware.htmlunit.WebWindowListener;
import com.gargoylesoftware.htmlunit.util.NameValuePair;
public class HtmlUnitDownloadFile {

    protected String baseUrl;
    protected static WebDriver driver;

    @Before
    public void openBrowser() {
        baseUrl = "http://localhost/teste.html";
        driver = new CustomHtmlUnitDriver();
        ((HtmlUnitDriver) driver).setJavascriptEnabled(true);
    }

    @Test
    public void downloadAFile() throws Exception {
        driver.get(baseUrl);
        driver.findElement(By.linkText("click to Downloadfile")).click();
    }

    public class CustomHtmlUnitDriver extends HtmlUnitDriver {

        // This is the magic. Keep a reference to the client instance
        protected WebClient modifyWebClient(WebClient client) {
            // Accept every JavaScript confirm dialog.
            ConfirmHandler okHandler = new ConfirmHandler() {
                public boolean handleConfirm(Page page, String message) {
                    return true;
                }
            };
            client.setConfirmHandler(okHandler);

            client.addWebWindowListener(new WebWindowListener() {
                public void webWindowOpened(WebWindowEvent event) {
                    // Nothing to do when a window opens.
                }

                public void webWindowContentChanged(WebWindowEvent event) {
                    WebResponse response = event.getWebWindow().getEnclosedPage().getWebResponse();
                    System.out.println(response.getLoadTime());
                    System.out.println(response.getStatusCode());
                    System.out.println(response.getContentType());
                    List<NameValuePair> headers = response.getResponseHeaders();
                    for (NameValuePair header : headers) {
                        System.out.println(header.getName() + " : " + header.getValue());
                    }
                    // Change or add conditions for the content types that you
                    // would like to receive as a file.
                    if (response.getContentType().equals("text/plain")) {
                        getFileResponse(response, "target/testDownload.war");
                    }
                }

                public void webWindowClosed(WebWindowEvent event) {
                }
            });
            return client;
        }
    }

    public static void getFileResponse(WebResponse response, String fileName) {
        InputStream inputStream = null;
        OutputStream outputStream = null;
        try {
            inputStream = response.getContentAsStream();
            // write the inputStream to a FileOutputStream
            outputStream = new FileOutputStream(new File(fileName));
            int read = 0;
            byte[] bytes = new byte[1024];
            while ((read = inputStream.read(bytes)) != -1) {
                outputStream.write(bytes, 0, read);
            }
            System.out.println("Done!");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (outputStream != null) {
                try {
                    // outputStream.flush();
                    outputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
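The content-type condition in webWindowContentChanged can be pulled into a small helper so that several file types are accepted at once. This is only a sketch: the class name DownloadTypes, the method isDownload and the set of accepted types are illustrative assumptions, not something HtmlUnit provides. It also tolerates a parameter suffix such as "; charset=UTF-8" and differences in letter case, in case the value comes from a raw header.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a response should be saved as a file.
final class DownloadTypes {

    // Example set of media types to treat as downloads; adjust as needed.
    private static final Set<String> ACCEPTED = new HashSet<>(Arrays.asList(
            "application/octet-stream",
            "application/pdf",
            "application/vnd.ms-excel"));

    static boolean isDownload(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return ACCEPTED.contains(mediaType.trim().toLowerCase(Locale.ROOT));
    }
}
```

The listener condition would then read if (DownloadTypes.isDownload(response.getContentType())) instead of comparing against a single string.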
Answer 2 (score: 3)
If you are not wrapping HtmlUnit with Selenium, there is a simpler way. Just register a WebWindowListener implementation on HtmlUnit's WebClient.
You can also use Apache Commons IO to copy the stream easily.
WebClient webClient = new WebClient();
webClient.addWebWindowListener(new WebWindowListener() {
    public void webWindowOpened(WebWindowEvent event) { }

    public void webWindowContentChanged(WebWindowEvent event) {
        // Response of the page that was just loaded into the window.
        WebResponse response = event.getWebWindow().getEnclosedPage().getWebResponse();
        // Change or add conditions for the content types that you would
        // like to receive as a file.
        if (response.getContentType().equals("text/plain")) {
            // try-with-resources closes both streams, even on failure.
            try (InputStream in = response.getContentAsStream();
                 OutputStream out = new FileOutputStream("downloaded_file")) {
                IOUtils.copy(in, out);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public void webWindowClosed(WebWindowEvent event) { }
});
Answer 3 (score: 1)
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setTimeout(2000);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.waitForBackgroundJavaScript(2000);
// Get the general page
final HtmlPage page = webClient.getPage("http://your");
// Get the frame
final HtmlPage frame = (HtmlPage) page.getFrameByName("Frame").getEnclosedPage();
// Accept every JavaScript confirm dialog
webClient.setConfirmHandler(new ConfirmHandler() {
    public boolean handleConfirm(Page page, String message) {
        return true;
    }
});
// Get the file element from the frame and click it to start the download
final DomElement file = frame.getElementByName("File");
final InputStream xls = file.click().getWebResponse().getContentAsStream();
assertNotNull(xls);
Answer 4 (score: 0)
Expanding on Roy's answer, here is my solution to this problem:
public static void prepareForDownloadingFile(WebClient webClient, File output) {
    webClient.addWebWindowListener(new WebWindowListener() {
        public void webWindowOpened(WebWindowEvent event) {
        }

        public void webWindowContentChanged(WebWindowEvent event) {
            Page page = event.getNewPage();
            FileOutputStream fos = null;
            InputStream is = null;
            if (page != null && page instanceof UnexpectedPage) {
                try {
                    fos = new FileOutputStream(output);
                    UnexpectedPage uPage = (UnexpectedPage) page;
                    is = uPage.getInputStream();
                    IOUtils.copy(is, fos);
                    webClient.removeWebWindowListener(this);
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    try {
                        if (fos != null)
                            fos.close();
                        if (is != null)
                            is.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        public void webWindowClosed(WebWindowEvent event) {
        }
    });
}
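I felt there were enough differences to make this a new answer:
- No magic variable (response)
- Closes the InputStream and the FileOutputStream
- Looks for an UnexpectedPage to make sure we are not on an HTML page
- Downloads the file once after the request, then removes the listener
- Does not need to know the ContentType
For example, call it once before clicking the button that starts the download, and the file will be downloaded.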
Answer 5 (score: -1)
Find the download URLs and collect them in a List. From each download URL, we can fetch the whole file with this code:
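try {
    String path = "your destination path";
    // The XPath must select <a> elements, since getHrefAttribute() is
    // defined on HtmlAnchor. This assumes that page is an HtmlPage.
    List<HtmlAnchor> downloadfiles = (List<HtmlAnchor>) page.getByXPath("the tag you want to scrape");
    if (downloadfiles.isEmpty()) {
        System.out.println("No items found!");
    } else {
        for (HtmlAnchor htmlItem : downloadfiles) {
            // Resolve the href against the page URL in case it is relative.
            URL downloadUrl = page.getFullyQualifiedUrl(htmlItem.getHrefAttribute());
            Page invoicePdf = client.getPage(downloadUrl);
            if (invoicePdf.getWebResponse().getContentType().equals("application/pdf")) {
                System.out.println("Creating PDF:");
                // try-with-resources closes both streams after the copy.
                try (InputStream in = invoicePdf.getWebResponse().getContentAsStream();
                     OutputStream out = new FileOutputStream(path + "file name")) {
                    IOUtils.copy(in, out);
                }
            }
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}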
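One thing to watch with this approach: an href scraped from a page is often relative, like the ./my_file.php?... link in the question, so it has to be resolved against the page URL before it can be requested. A minimal sketch of that step using only java.net.URI follows. LinkResolver and resolve are hypothetical names, and the shortened query string in the usage note is just for illustration.

```java
import java.net.URI;

// Hypothetical helper: turns a scraped href into an absolute URI.
final class LinkResolver {

    // Resolves 'href' against the URL of the page it was found on.
    // An href that is already absolute is returned unchanged.
    static URI resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href);
    }
}
```

For instance, LinkResolver.resolve("http://working.com/download", "./my_file.php?mode=xls") yields a URI under http://working.com/ whose string form can be passed to getPage.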