Question

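Ultimately, my end goal is to:

Read from a URL (the subject of this question)
Save the retrieved [PDF] content to a BLOB field in a database (already nailed down)
Read from the BLOB field and attach that content to an email
All without going to the file system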

The goal of the following method is to obtain a byte[] that can be used downstream as an email attachment (to avoid writing to disk):

public byte[] retrievePDF() {

    HttpClient httpClient = new HttpClient();

    GetMethod httpGet = new GetMethod("http://website/document.pdf");
    httpClient.executeMethod(httpGet);
    InputStream is = httpGet.getResponseBodyAsStream();

    byte[] byteArray = new byte[(int) httpGet.getResponseContentLength()];

    is.read(byteArray, 0, byteArray.length);

    return byteArray;
}

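For one particular PDF, getResponseContentLength() returns 101,689 as the length. The odd part is that if I set a breakpoint and inspect the byteArray variable, it has 101,689 elements, yet after byte #3744 the remaining bytes of the array are all zero (0). The resulting PDF cannot be read by a PDF reader client such as Adobe Reader.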

Why would this happen?

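Retrieving the same PDF through a browser and saving it to disk, or using a method like the one below (which I patterned after the answer to this StackOverflow post), produces a readable PDF: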

public void retrievePDF() {
    FileOutputStream fos = null;
    URL url;
    ReadableByteChannel rbc = null;

    url = new URL("http://website/document.pdf");

    DataSource urlDataSource = new URLDataSource(url);

    /* Open a connection, then set appropriate time-out values */
    URLConnection conn = url.openConnection();
    conn.setConnectTimeout(120000);
    conn.setReadTimeout(120000);

    rbc = Channels.newChannel(conn.getInputStream());

    String filePath = "C:\\temp\\";
    String fileName = "testing1234.pdf";
    String tempFileName = filePath + fileName;

    fos = new FileOutputStream(tempFileName);
    fos.getChannel().transferFrom(rbc, 0, 1 << 24);
    fos.flush();

    /* Clean-up everything */
    fos.close();
    rbc.close();
}

With both approaches, the resulting PDF is 101,689 bytes in size according to right-click > Properties... in Windows.

Why would the byte array essentially "stop" partway through?

Answer 1

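InputStream.read reads up to byteArray.length bytes but may read fewer. It returns the number of bytes it actually read. You should call it repeatedly until the data has been read completely, like so: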

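int bytesRead = 0;
while (bytesRead < byteArray.length) {
    // Request only the bytes still missing so offset + length never runs past the array.
    int n = is.read(byteArray, bytesRead, byteArray.length - bytesRead);
    if (n == -1) break; // end of stream reached before the array was filled
    bytesRead += n;
}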
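
As a self-contained illustration of the loop above, here is a minimal sketch that reproduces the short read without any network access. The class name StreamBytes and the trickle helper are hypothetical names made up for this example. trickle only simulates a connection that hands out at most 3744 bytes per call, which mirrors the number reported in the question. readFully collects the data in a ByteArrayOutputStream, so it does not depend on the content length being known in advance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class StreamBytes {

    /** Reads the stream until end-of-stream and returns everything that was read. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns the number of bytes actually delivered, or -1 at end of stream.
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Hypothetical test stream: delivers at most maxPerRead bytes per call, like a slow socket. */
    static InputStream trickle(byte[] data, int maxPerRead) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxPerRead));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[101_689];
        Arrays.fill(source, (byte) 1);

        // One read() call: only the first chunk arrives, the rest of the array stays zero.
        byte[] single = new byte[source.length];
        int got = trickle(source, 3744).read(single, 0, single.length);
        System.out.println("single read delivered " + got + " bytes");

        // Looping until -1 recovers the whole payload.
        byte[] all = readFully(trickle(source, 3744));
        System.out.println("loop delivered " + all.length + " bytes");
    }
}
```

In the question's method, the same idea would mean passing the stream from getResponseBodyAsStream() to a helper like readFully instead of calling read once.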

Answer 2

Check the return value of InputStream.read. It won't read everything in one go. You have to write a loop. Or, better yet, use Apache Commons IO to copy the stream.
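
With Commons IO on the classpath, the helper usually meant here is IOUtils.toByteArray(InputStream). To keep the following sketch free of dependencies, it swaps in two JDK equivalents instead: InputStream.readAllBytes() (Java 9 or later) when the length is unknown, and DataInputStream.readFully when the length is known up front. The class name WholeStream and both method names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class WholeStream {

    /** Length unknown: the JDK loops internally until end of stream (Java 9+). */
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return stream.readAllBytes();
        }
    }

    /** Length known: readFully blocks until the array is full, or throws EOFException if the stream ends early. */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        try (DataInputStream data = new DataInputStream(in)) {
            data.readFully(buffer);
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder payload standing in for the downloaded PDF.
        byte[] pdfLike = "%PDF-1.4 placeholder bytes".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(new ByteArrayInputStream(pdfLike)).length);
        System.out.println(readExactly(new ByteArrayInputStream(pdfLike), pdfLike.length).length);
    }
}
```

readExactly fails loudly when the stream is shorter than expected, which may be preferable to silently returning a partly filled array. readAll is likely the safer default when the server might not report a content length.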

Answer 3

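101689 = 2^16 + 36153, so it looks like the buffer size is subject to a 16-bit limit. The difference between 36153 and 3744 might stem from the header part having been read into an extra-small 1K buffer that already contained some of the bytes.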

Odd byte[] behavior when reading from a URL
