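I have tried saving a .pdf file with several approaches I found on Stack Overflow, including FileUtils IO, but the file always ends up corrupted. When I open the corrupted file in Notepad, I get the following: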
<HEAD>
<TITLE>
09010b129fasdf558a-
</TITLE>
</HEAD>
<HTML>
<SCRIPT language="javascript" src="./js/windowClose.js"></SCRIPT>
<LINK href="./theme/default.css" rel="stylesheet" type="text/css">
<LINK href="./theme/additions.css" rel="stylesheet" type="text/css">
<BODY leftmargin="0" topmargin="0">
<TABLE cellpadding="0" cellspacing="0" width="100%">
<TR>
<TD class="mainSectionHeader">
<A href="javascript:windowClose()" class="allLinks">
CLOSE
</A>
</TD>
</TR>
</TABLE>
<script language='javaScript'>
alert('Session timed out. Please login again.\n');
window.close();
</script>
</BODY>
</HTML>
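Later, I tried saving the .pdf file from the browser using the answer provided by @BalusC. That solution was very helpful: I was able to get rid of the session problem. However, it also produces a corrupted .pdf, and when I open it in Notepad the content is completely different. The session timeout message is gone, though: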
<HTML>
<HEAD>
<TITLE>
Evidence System
</TITLE>
</HEAD>
<LINK href="./theme/default.css" rel="stylesheet" type="text/css">
<TABLE cellpadding="0" cellspacing="0" class="tableWidth760" align="center">
<TR>
<TD class="headerTextCtr">
Evidence System
</TD>
</TR>
<TR>
<TD colspan="2">
<HR size="1" noshade>
</TD>
</TR>
<TR>
<TD colspan="2">
<HTML>
<HEAD>
<link href="./theme/default.css" rel="stylesheet" type="text/css">
<script language="JavaScript">
function trim(str)
{
var trmd_str
if(str != "")
{
trmd_str = str.replace(/\s*/, "")
if (trmd_str != ""){
trmd_str = trmd_str.replace(/\s*$/, "")
}
}else{
trmd_str = str
}
return trmd_str
}
function validate(frm){
//check for User name
var msg="";
if(trim(frm.userName.value)==""){
msg += "Please enter your user id.\n";
frm.userName.focus();
}
if(trim(frm.password.value)==""){
msg += "Please enter your password.\n";
frm.userName.focus();
}
if (trim(msg)==""){
frm.submit();
}else{
alert(msg);
}
}
function numCheck(event,frm){
if( event.keyCode == 13){
validate(frm);
}
}
</script>
</HEAD>
<BODY onLoad="document.frmLogin.userName.focus();">
<FORM name='frmLogin' method='post' action='./ServletVerify'>
<TABLE width="100%" cellspacing="20">
<tr>
<td class="mainTextRt">
Username
<input type="text" name="userName" maxlength="32" tabindex="1" value=""
onKeyPress="numCheck(event,this.form)" class="formTextField120">
</TD>
<td class="mainTextLt">
Password
<input type="password" name="password" maxlength="32" tabindex="2" value=""
onKeyPress="numCheck(event,this.form)" class="formTextField120">
</TD>
</TR>
<tr>
<td colspan="2" class="mainTextCtr" style="color:red">
Unknown Error
</td>
</tr>
<tr>
<td colspan="2" class="mainTextCtr">
<input type="button" tabindex="3" value="Submit" onclick="validate(this.form)" >
</TD>
</TR>
</TABLE>
<INPUT TYPE="hidden" NAME="actionFlag" VALUE="inbox">
</FORM>
</BODY>
</HTML>
</TD>
</TR>
<TR>
<TD height="2"></TD>
</TR>
<TR>
<TD colspan="2">
<HR size="1" noshade>
</TD>
</TR>
<TR>
<TD colspan="2">
<LINK href="./theme/default.css" rel="stylesheet" type="text/css">
<TABLE width="80%" align="center" cellspacing="0" cellpadding="0">
<TR>
<TD class="footerSubtext">
Evidence Management System
</TD>
</TR>
<!-- For development builds, change the date accordingly when sending EAR files out to Wal-Mart -->
<TR>
<TD class="footerSubtext">
Build: v3.1
</TD>
</TR>
</TABLE>
</TD>
</TR>
</TABLE>
</HTML>
Are there any other options?
PS: When I save the file manually with CTRL+Shift+S, it is saved correctly.
Answer 0 (score: 3)
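A PDF is a binary file, and it is getting corrupted because of the way copyUrlToFile() works. By the way, this looks like a duplicate of "JAVA - Download Binary File (e.g. PDF) file from Webserver".
Try this custom binary download method: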
// Requires: java.io.*, java.net.URL, java.net.URLConnection
public void downloadBinaryFile(String path) throws IOException {
    URL u = new URL(path);
    URLConnection uc = u.openConnection();
    String contentType = uc.getContentType();
    int contentLength = uc.getContentLength();
    // Reject text responses (for example an HTML login page) and responses of unknown length.
    if (contentType == null || contentType.startsWith("text/") || contentLength == -1) {
        throw new IOException("This is not a binary file.");
    }
    byte[] data = new byte[contentLength];
    int offset = 0;
    // read() may return fewer bytes than requested, so loop until the buffer is full.
    try (InputStream in = new BufferedInputStream(uc.getInputStream())) {
        while (offset < contentLength) {
            int bytesRead = in.read(data, offset, data.length - offset);
            if (bytesRead == -1)
                break;
            offset += bytesRead;
        }
    }
    if (offset != contentLength) {
        throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes");
    }
    // Use the part of the URL after the last '/' as the local file name.
    String file = u.getFile();
    String filename = file.substring(file.lastIndexOf('/') + 1);
    try (FileOutputStream out = new FileOutputStream(filename)) {
        out.write(data);
    }
}
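Edit: It actually sounds as if you are not on the page you think you are on. Instead of calling driver.getCurrentUrl(), have your script take the URL from the link that points to the PDF. Assuming there is a link such as <a href='http://mysite.com/my.pdf' />, rather than clicking it and then reading the URL, just read the href from that link and download it.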
String pdfPath = driver.findElement(By.id("someId")).getAttribute("href");
downloadBinaryFile(pdfPath);
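Both responses shown in the question are HTML pages saved under a .pdf name, so it can help to verify the downloaded bytes before writing them to disk. The following is a minimal sketch, not part of the original answer: the class name PdfSignature and the method looksLikePdf are hypothetical, and the check assumes the file starts directly with the "%PDF-" header, which is stricter than what some PDF readers accept.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: detects whether downloaded bytes look like a PDF.
final class PdfSignature {
    // A PDF file normally begins with the ASCII header "%PDF-".
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the data starts with the "%PDF-" header. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling looksLikePdf(data) right after the read loop and throwing an IOException when it returns false would surface a login or error page immediately instead of producing a "corrupted" file.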
Answer 1 (score: 3)
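Judging from the bad response, it appears to be just an HTML error page:
alert('Session timed out. Please login again.\n');
So downloading the PDF file apparently has to take place within a valid HTTP session. An HTTP session is backed by a cookie, and on the server side it usually holds information about the currently active and/or logged-in user.
Selenium WebDriver manages cookies completely transparently. You can obtain them programmatically as follows: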
Set<Cookie> cookies = driver.manage().getCookies();
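When you fiddle with java.net.URL manually, outside of Selenium's control, you should make sure that your URL connection uses the same cookies (and thereby stays in the same HTTP session). You can set cookies on a URL connection as follows: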
URLConnection connection = new URL(driver.getCurrentUrl()).openConnection();
for (Cookie cookie : driver.manage().getCookies()) {
    String cookieHeader = cookie.getName() + "=" + cookie.getValue();
    connection.addRequestProperty("Cookie", cookieHeader);
}
InputStream input = connection.getInputStream(); // Write this to file.
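The loop above adds one Cookie request property per cookie. RFC 6265 describes a single Cookie header whose name=value pairs are separated by "; ", so some servers may handle a combined header more reliably. Here is a minimal sketch of building that combined value; the class name CookieHeader and the method build are hypothetical, and the map is assumed to be filled from each Selenium cookie's getName() and getValue().

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: joins cookies into one Cookie header value.
final class CookieHeader {
    /** Builds "name1=value1; name2=value2" in the map's iteration order. */
    static String build(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }
}
```

The result would then be passed once to connection.setRequestProperty("Cookie", ...) before the input stream is opened.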
Answer 2 (score: 2)
The server may be compressing the PDF. You can use this code, borrowed from this answer, to detect and decompress the response from the server:
InputStream is = new URL(driver.getCurrentUrl()).openStream(); // getCurrentUrl() returns a String
try {
    InputStream decoded = decompressStream(is);
    FileOutputStream output = new FileOutputStream(
            new File("C:\\Users\\myDocs\\myfolder\\myFile.pdf"));
    try {
        IOUtils.copy(decoded, output);
    }
    finally {
        output.close();
    }
} finally {
    is.close();
}

public static InputStream decompressStream(InputStream input) throws IOException {
    PushbackInputStream pb = new PushbackInputStream(input, 2); // we need a pushback stream to look ahead
    byte[] signature = new byte[2];
    int len = pb.read(signature); // read the signature
    if (len > 0) {
        pb.unread(signature, 0, len); // push back only the bytes that were actually read
    }
    if (len == 2 && signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b) // check if it matches the standard gzip magic number
        return new GZIPInputStream(pb);
    else
        return pb;
}
Answer 3 (score: 1)
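"When I try to save the file manually using CTRL + Shift + S, the file is saved fine."
Although I advocate downloading the file with Java, there is a less recommended workaround: pressing Ctrl + Shift + S programmatically with the Robot class.
Using a workaround is ugly, but as far as I know it works in the browsers and operating systems I tried. This code should not go into any serious application, but it is acceptable for testing if you cannot solve the problem the proper way.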
Robot robot = new Robot(); // java.awt.Robot; the constructor throws AWTException
// press Ctrl+Shift+S
robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_SHIFT);
robot.keyPress(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_SHIFT);
robot.keyRelease(KeyEvent.VK_CONTROL);
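In the browsers and operating systems I know of, you should now be in the Save file dialog with the focus in the file name input. You can type in an absolute path: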
robot.keyPress(KeyEvent.VK_C); // C
robot.keyRelease(KeyEvent.VK_C);
robot.keyPress(KeyEvent.VK_COLON); // : (colon)
robot.keyRelease(KeyEvent.VK_COLON);
robot.keyPress(KeyEvent.VK_SLASH); // / (slash)
robot.keyRelease(KeyEvent.VK_SLASH);
// etc. for the whole file path
robot.keyPress(KeyEvent.VK_ENTER); // confirm by pressing Enter in the end
robot.keyRelease(KeyEvent.VK_ENTER);
To get the key codes, you can use KeyEvent#getExtendedKeyCodeForChar() (Java 7+ only), or see "How can I make Robot type a `:`?" and "Convert String to KeyEvents".
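As a rough illustration of the "etc. for the whole file path" step, the sketch below maps each character to a key code with KeyEvent.getExtendedKeyCodeForChar() and presses the keys one by one. The class name RobotTyping and its methods are hypothetical. The sketch presses no modifier keys, so it assumes every character can be typed without Shift on the active keyboard layout; uppercase letters come out lowercase, and characters such as ':' may make Robot.keyPress throw IllegalArgumentException on layouts where they need Shift.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;

// Hypothetical helper: types a string through java.awt.Robot, one key at a time.
final class RobotTyping {
    /** Maps each character to a key code; fails if a character has no known key code. */
    static int[] keyCodesFor(String text) {
        int[] codes = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            int code = KeyEvent.getExtendedKeyCodeForChar(text.charAt(i));
            if (code == KeyEvent.VK_UNDEFINED) {
                throw new IllegalArgumentException("No key code for '" + text.charAt(i) + "'");
            }
            codes[i] = code;
        }
        return codes;
    }

    /** Presses and releases each key in order, without any modifier keys. */
    static void type(Robot robot, String text) {
        for (int code : keyCodesFor(text)) {
            robot.keyPress(code);
            robot.keyRelease(code);
        }
    }
}
```

For instance, RobotTyping.type(robot, "myfile.pdf") followed by an Enter key press would enter a lowercase file name into the focused dialog.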