How do I save a .pdf from the browser?

Asked: 2013-09-27 20:44:20

Tags: java selenium io fileutils

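I have tried saving the .pdf file with several approaches I found on Stack Overflow, including FileUtils IO, but the file always ends up corrupted. When I open the corrupted file in Notepad, I see the following: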

<HEAD>

    <TITLE>
        09010b129fasdf558a-
    </TITLE>

</HEAD>


<HTML>

<SCRIPT language="javascript" src="./js/windowClose.js"></SCRIPT>

<LINK href="./theme/default.css" rel="stylesheet" type="text/css">
<LINK href="./theme/additions.css" rel="stylesheet" type="text/css">

<BODY leftmargin="0" topmargin="0">

<TABLE cellpadding="0" cellspacing="0" width="100%">
    <TR>
        <TD class="mainSectionHeader">
            <A href="javascript:windowClose()" class="allLinks">
                CLOSE
            </A>
        </TD>

    </TR>

</TABLE>

                <script language='javaScript'>
                    alert('Session timed out. Please login again.\n');
                    window.close();
                </script>



</BODY>

</HTML>

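Later, I tried saving the .pdf file from the browser using the answer provided by @BalusC. That solution was very helpful: I was able to get rid of the session problem. However, it still produces a corrupted .pdf, and the content looks completely different when I open it in Notepad. The login problem is gone, though: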

<HTML>

    <HEAD>

        <TITLE>
            Evidence System
        </TITLE>

    </HEAD>

<LINK href="./theme/default.css" rel="stylesheet" type="text/css">

<TABLE cellpadding="0" cellspacing="0" class="tableWidth760" align="center">
    <TR>
        <TD class="headerTextCtr">
            Evidence System
        </TD>
    </TR>
    <TR>
        <TD colspan="2">
            <HR size="1" noshade>
        </TD>
    </TR>
    <TR>
        <TD colspan="2">



<HTML>
<HEAD>
<link href="./theme/default.css" rel="stylesheet" type="text/css">
<script language="JavaScript">

function trim(str)
{
    var trmd_str

    if(str != "")
    {
        trmd_str = str.replace(/\s*/, "")
        if (trmd_str != ""){

            trmd_str = trmd_str.replace(/\s*$/, "")
        }

    }else{
        trmd_str = str
    }
    return trmd_str
}  

function validate(frm){
    //check for User name 
    var msg="";
    if(trim(frm.userName.value)==""){
        msg += "Please enter your user id.\n";
        frm.userName.focus();
    }

    if(trim(frm.password.value)==""){
        msg += "Please enter your password.\n";
        frm.userName.focus();
    }

    if (trim(msg)==""){
        frm.submit();
    }else{
        alert(msg);
    }
}

function numCheck(event,frm){
    if( event.keyCode == 13){
            validate(frm);  
    }
}

</script>
</HEAD>

<BODY onLoad="document.frmLogin.userName.focus();">

<FORM name='frmLogin' method='post' action='./ServletVerify'>
    <TABLE width="100%" cellspacing="20">
        <tr>
            <td class="mainTextRt">
                Username
                <input type="text" name="userName" maxlength="32" tabindex="1" value="" 
                onKeyPress="numCheck(event,this.form)" class="formTextField120">
            </TD>
            <td class="mainTextLt">
                Password
                <input type="password" name="password" maxlength="32" tabindex="2" value="" 
                onKeyPress="numCheck(event,this.form)" class="formTextField120">
            </TD>
        </TR>

        <tr>                    
            <td colspan="2" class="mainTextCtr" style="color:red">
                Unknown Error
            </td>
        </tr>

        <tr>
            <td colspan="2" class="mainTextCtr">
                <input type="button" tabindex="3" value="Submit" onclick="validate(this.form)" >
            </TD>
        </TR>
    </TABLE>

    <INPUT TYPE="hidden" NAME="actionFlag" VALUE="inbox">
</FORM>

</BODY>
</HTML>

        </TD>
    </TR>
    <TR>
        <TD height="2"></TD>
    </TR>
    <TR>
        <TD colspan="2">
            <HR size="1" noshade>
        </TD>
    </TR>
    <TR>
        <TD colspan="2">
            <LINK href="./theme/default.css" rel="stylesheet" type="text/css">

<TABLE width="80%" align="center" cellspacing="0" cellpadding="0">
    <TR>
        <TD class="footerSubtext">
            Evidence Management System
        </TD>
    </TR>

    <!-- For development builds, change the date accordingly when sending EAR files out to Wal-Mart -->
    <TR>
        <TD class="footerSubtext">
            Build:&nbsp;&nbsp;v3.1
        </TD>
    </TR>

</TABLE>
        </TD>
    </TR>
</TABLE>

</HTML>

Do I have any other options?

PS: When I save the file manually with CTRL+Shift+S, it saves fine.

4 answers:

Answer 0 (score: 3)

A PDF is a binary file, and it is being corrupted because of the way copyUrlToFile() works. By the way, this looks like a duplicate of "JAVA - Download Binary File (e.g. PDF) file from Webserver".

Try this custom binary download method:

public void downloadBinaryFile(String path) throws IOException {
    URL u = new URL(path);
    URLConnection uc = u.openConnection();
    String contentType = uc.getContentType();
    int contentLength = uc.getContentLength();
    // Reject text responses (e.g. an HTML error page) and responses of unknown length.
    if (contentType == null || contentType.startsWith("text/") || contentLength == -1) {
      throw new IOException("This is not a binary file.");
    }
    InputStream raw = uc.getInputStream();
    InputStream in = new BufferedInputStream(raw);
    byte[] data = new byte[contentLength];
    int bytesRead = 0;
    int offset = 0;
    // read() may return fewer bytes than requested, so loop until the buffer is full.
    while (offset < contentLength) {
      bytesRead = in.read(data, offset, data.length - offset);
      if (bytesRead == -1)
        break;
      offset += bytesRead;
    }
    in.close();

    if (offset != contentLength) {
      throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes");
    }

    // Use the last segment of the URL's file part as the local file name.
    String file = u.getFile();
    String filename = file.substring(file.lastIndexOf('/') + 1);
    FileOutputStream out = new FileOutputStream(filename);
    out.write(data);
    out.flush();
    out.close();
}
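
The method above buffers the whole file in memory and rejects any response that does not report a Content-Length. As a minimal sketch of an alternative (not the answerer's code), the response could instead be copied to disk in fixed-size chunks. The class name `StreamingDownload`, its method names, and the 8 KB buffer size are assumptions made for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URL;

// Hypothetical helper; not part of Selenium or Commons IO.
final class StreamingDownload {

    // Copies everything from in to out in 8 KB chunks and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Returns the last path segment of a URL, ignoring any query string.
    static String fileNameFromUrl(String url) {
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Downloads url into a file in the working directory named after its last path segment.
    static long download(String url) throws IOException {
        URL u = URI.create(url).toURL();
        try (InputStream in = u.openStream();
             OutputStream out = new FileOutputStream(fileNameFromUrl(url))) {
            return copy(in, out);
        }
    }
}
```

Because nothing here inspects the content type, a caller would still need to make sure the server returned a PDF rather than an HTML page.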

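Edit: It actually sounds like you are not on the page you think you are. Instead of calling driver.getCurrentUrl(),

have your script take the URL from the link that points to the PDF. Suppose there is a link such as <a href='http://mysite.com/my.pdf' />: rather than clicking it and then reading the URL, just read the href from that link and download it.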

String pdfPath = driver.findElement(By.id("someId")).getAttribute("href");
downloadBinaryFile(pdfPath);

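Answer 1 (score: 3)

Judging by the bad response, it is just an HTML error page:

alert('Session timed out. Please login again.\n');

So downloading the PDF file apparently has to happen within a valid HTTP session. An HTTP session is backed by cookies, and on the server side it usually holds information about the currently active and/or logged-in user.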

Selenium WebDriver manages cookies completely transparently. You can obtain them programmatically as follows:

Set<Cookie> cookies = driver.manage().getCookies();

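When you fiddle with java.net.URL manually, outside Selenium's control, you should make sure that your own URL connection uses the same cookies (and thereby keeps the same HTTP session). You can set the cookies on the URL connection as follows: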

URLConnection connection = new URL(driver.getCurrentUrl()).openConnection();

for (Cookie cookie : driver.manage().getCookies()) {
    String cookieHeader = cookie.getName() + "=" + cookie.getValue();
    connection.addRequestProperty("Cookie", cookieHeader);
}

InputStream input = connection.getInputStream(); // Write this to file.
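
The loop above adds one `Cookie` header per cookie. As a minimal sketch of a variant, the cookies could be joined into a single header value, and the first bytes of the response could be checked before anything is written to disk. The sketch assumes the Selenium cookies have already been copied into a name/value map. `SessionDownload`, `cookieHeader`, and `looksLikePdf` are hypothetical names, and the check relies on PDF files normally starting with the `%PDF-` signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not part of Selenium.
final class SessionDownload {

    private static final byte[] PDF_SIGNATURE = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Joins cookies into one header value of the form "name1=value1; name2=value2".
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    // Returns true when the given leading bytes start with the PDF signature.
    static boolean looksLikePdf(byte[] head) {
        if (head.length < PDF_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < PDF_SIGNATURE.length; i++) {
            if (head[i] != PDF_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The header would then be set once with `connection.setRequestProperty("Cookie", SessionDownload.cookieHeader(cookies))`. If `looksLikePdf` returns false for the first bytes of the response, the server most likely sent a login or error page such as the ones shown in the question.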

Answer 2 (score: 2)

The server may be compressing the PDF. You can use this code, stolen from this answer, to detect and decompress the response from the server:

InputStream is = new URL(driver.getCurrentUrl()).openStream();
try {
   InputStream decoded = decompressStream(is);
   FileOutputStream output = new FileOutputStream(
       new File("C:\\Users\\myDocs\\myfolder\\myFile.pdf"));
   try {
       IOUtils.copy(decoded, output);
   }
   finally {
       output.close();
   }
} finally {
   is.close();
}

public static InputStream decompressStream(InputStream input) throws IOException {
     PushbackInputStream pb = new PushbackInputStream( input, 2 ); // we need a pushback stream to look ahead
     byte [] signature = new byte[2];
     int len = pb.read( signature ); // read the signature
     if( len > 0 )
       pb.unread( signature, 0, len ); // push back only the bytes that were actually read
     if( len == 2 && signature[ 0 ] == (byte) 0x1f && signature[ 1 ] == (byte) 0x8b ) // check for the standard gzip magic number
       return new GZIPInputStream( pb );
     else 
       return pb;
}
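
As a quick in-memory check of the sniffing approach above, the sketch below gzips a few bytes, passes both the compressed and the plain form through a copy of `decompressStream`, and reads the result back. `GzipSniffDemo`, `gzip`, and `readAll` are names made up for this example, and the signature is read in a loop on the assumption that a network stream may deliver the two bytes separately.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class for the gzip-sniffing technique.
final class GzipSniffDemo {

    // Wraps input in a GZIPInputStream only if it starts with the gzip magic number.
    static InputStream decompressStream(InputStream input) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(input, 2);
        byte[] signature = new byte[2];
        int len = 0;
        while (len < 2) { // keep reading until both bytes arrive or the stream ends
            int n = pb.read(signature, len, 2 - len);
            if (n == -1) {
                break;
            }
            len += n;
        }
        if (len > 0) {
            pb.unread(signature, 0, len); // push back exactly what was read
        }
        if (len == 2 && signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b) {
            return new GZIPInputStream(pb);
        }
        return pb;
    }

    // Compresses data with gzip, standing in for a compressing server.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    // Reads the stream to its end and returns all bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Both `readAll(decompressStream(...))` over the gzipped bytes and over the plain bytes should give back the original data, which is what makes the wrapper safe to apply whether or not the server compresses.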

Answer 3 (score: 1)

  

When I save the file manually with CTRL+Shift+S, it saves fine.

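Although I advocate downloading the file with Java, there is a less recommended workaround: pressing Ctrl + Shift + S programmatically with the Robot class.

Using a workaround is ugly, but as far as I know it works in the browsers and operating systems I have tried. This code should not go into any serious application, but it is fine for testing if you cannot solve the problem the right way.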

Robot robot = new Robot();

Press Ctrl + Shift + S:

robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_SHIFT);
robot.keyPress(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_SHIFT);
robot.keyRelease(KeyEvent.VK_CONTROL);

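In the browsers and operating systems I know of, you should now be in the Save file dialog with the focus on the file name input. You can type in an absolute path: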

robot.keyPress(KeyEvent.VK_C);        // C
robot.keyRelease(KeyEvent.VK_C);
robot.keyPress(KeyEvent.VK_COLON);    // : (colon)
robot.keyRelease(KeyEvent.VK_COLON);
robot.keyPress(KeyEvent.VK_SLASH);    // / (slash)
robot.keyRelease(KeyEvent.VK_SLASH);
// etc. for the whole file path

robot.keyPress(KeyEvent.VK_ENTER);    // confirm by pressing Enter in the end
robot.keyRelease(KeyEvent.VK_ENTER);

To get the key codes, you can use KeyEvent#getExtendedKeyCodeForChar() (Java 7+ only), or see "How can I make Robot type a `:`?" and "Convert String to KeyEvents".