Question

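For the life of me, I haven't been able to find a question that matches what I'm trying to do, so I'll explain my use case here. If you know of a topic that already covers this, feel free to point me to it. :)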

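I have a bit of code that uploads a file to Amazon S3 periodically (every 20 seconds). The file is a log file being written by another process, so this function is effectively a way of tailing the log so that someone can read its contents in semi-real-time without direct access to the machine the log lives on.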

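Until recently, I had been using the S3 PutObject method (with a File as input) to perform this upload. But in AWS SDK 1.9 this no longer works, because the S3 client rejects the request if the amount of content actually uploaded is greater than the content length promised at the start of the upload. The method reads the size of the file before it begins streaming the data, so, given the nature of this application, the file is very likely to have grown between that point and the end of the stream. This means I now need to make sure I send only N bytes of data, no matter how big the file is.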

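I have no need to interpret the bytes in the file in any way, so I'm not concerned about encoding. I can transfer it byte for byte. Basically, what I want is a simple way to read the file up to the Nth byte and then have the read terminate, even if there is more data in the file past that point. (In other words, insert an EOF into the stream at a specific point.)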

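For example, if my file is 10000 bytes long when the upload starts but grows to 12000 bytes during the upload, I want to stop uploading at 10000 bytes regardless of that size change. (On a subsequent upload, I would then upload the 12000 bytes or more.)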

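I haven't found a pre-made way to do this. The best I've found so far appears to be IOUtils.copyLarge(InputStream, OutputStream, offset, length), which can be told to copy at most "length" bytes to the provided OutputStream. However, copyLarge is a blocking method, as is PutObject (which presumably calls some form of read() on its InputStream), so it seems I couldn't get that to work at all.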

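I haven't found any method or pre-built stream that can do this, so it has me thinking I would need to write my own implementation that directly monitors how many bytes have been read. That would probably work like a BufferedInputStream in which the number of bytes read per batch is the lesser of the buffer size and the remaining bytes to be read. (For example, with a buffer size of 3000 bytes and a 10000-byte limit, I'd do three batches of 3000 bytes each, followed by a batch of 1000 bytes + EOF.)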
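The batching arithmetic described above can be sketched with a plain copy loop. This is only an illustration of the "smaller of buffer size and remaining bytes" rule, not something PutObject could consume directly: `LimitedCopy` and `copyUpTo` are hypothetical names, and for the S3 case the same bookkeeping would have to live inside an InputStream wrapper rather than a blocking copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class LimitedCopy {

    /**
     * Copies at most {@code limit} bytes from {@code in} to {@code out},
     * reading in batches no larger than {@code bufferSize}.
     * Returns the number of bytes actually copied.
     */
    public static long copyUpTo(InputStream in, OutputStream out,
                                long limit, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long remaining = Math.max(limit, 0);
        long copied = 0;
        while (remaining > 0) {
            // Each batch is the smaller of the buffer size and the bytes still owed.
            int want = (int) Math.min(buffer.length, remaining);
            int got = in.read(buffer, 0, want);
            if (got < 0) {
                break; // the source ended before the limit was reached
            }
            out.write(buffer, 0, got);
            remaining -= got;
            copied += got;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file that has already grown to 12000 bytes.
        byte[] grown = new byte[12000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyUpTo(new ByteArrayInputStream(grown), sink, 10000, 3000);
        // prints "10000 bytes copied, sink holds 10000"
        System.out.println(copied + " bytes copied, sink holds " + sink.size());
    }
}
```

With an in-memory source the batches come out as 3000, 3000, 3000 and 1000 bytes; a real file or socket may return fewer bytes per read, which the loop tolerates because it tracks the running remainder rather than counting batches.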

Does anyone know of a better way? Thanks.

EDIT: To clarify, I already know of a couple of alternatives, neither of which is ideal:

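(1) I could lock the file while uploading it. Doing this would cause lost data or operational problems in the process that's writing to the file.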

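(2) I could create a local copy of the file before uploading it. This could be very inefficient and take up a lot of unnecessary disk space (this file can grow into the several-gigabyte range, and the machine it runs on may be short of disk space).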

EDIT 2: My final solution, based on a suggestion from a coworker, looks like this:

private void uploadLogFile(final File logFile) {
    if (logFile.exists()) {
        long byteLength = logFile.length();
        try (
            FileInputStream fileStream = new FileInputStream(logFile);
            InputStream limitStream = ByteStreams.limit(fileStream, byteLength);
        ) {
            ObjectMetadata md = new ObjectMetadata();
            md.setContentLength(byteLength);
            // Set other metadata as appropriate.
            PutObjectRequest req = new PutObjectRequest(bucket, key, limitStream, md);
            s3Client.putObject(req);
        } // plus exception handling
    }
}

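LimitInputStream was what my coworker suggested, apparently unaware that it had been deprecated. ByteStreams.limit is the current Guava replacement, and it does what I want. Thanks, everyone.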

Answer 1

Complete answer rip & replace:

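It's relatively straightforward to wrap an InputStream so as to cap the number of bytes it will deliver before signaling end-of-data. FilterInputStream is targeted at this general kind of job, but since you have to override pretty much every method for this particular job, it just gets in the way.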

Here's a rough cut at a solution:

import java.io.IOException;
import java.io.InputStream;

/**
 * An {@code InputStream} wrapper that provides up to a maximum number of
 * bytes from the underlying stream.  Does not support mark/reset, even
 * when the wrapped stream does, and does not perform any buffering.
 */
public class BoundedInputStream extends InputStream {

    /** This stream's underlying {@code InputStream} */
    private final InputStream data;

    /** The maximum number of bytes still available from this stream */
    private long bytesRemaining;

    /**
     * Initializes a new {@code BoundedInputStream} with the specified
     * underlying stream and byte limit
     * @param data the {@code InputStream} serving as the source of this
     *        one's data
     * @param maxBytes the maximum number of bytes this stream will deliver
     *        before signaling end-of-data
     */
    public BoundedInputStream(InputStream data, long maxBytes) {
        this.data = data;
        bytesRemaining = Math.max(maxBytes, 0);
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(data.available(), bytesRemaining);
    }

    @Override
    public void close() throws IOException {
        data.close();
    }

    @Override
    public synchronized void mark(int limit) {
        // does nothing
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (bytesRemaining > 0) {
            int nRead = data.read(
                    buf, off, (int) Math.min(len, bytesRemaining));

            // Only count bytes actually read; nRead is -1 when the
            // underlying stream ends before the limit is reached.
            if (nRead > 0) {
                bytesRemaining -= nRead;
            }

            return nRead;
        } else {
            return -1;
        }
    }

    @Override
    public int read(byte[] buf) throws IOException {
        return this.read(buf, 0, buf.length);
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = data.skip(Math.min(n, bytesRemaining));

        bytesRemaining -= skipped;

        return skipped;
    }

    @Override
    public int read() throws IOException {
        if (bytesRemaining > 0) {
            int c = data.read();

            if (c >= 0) {
                bytesRemaining -= 1;
            }

            return c;
        } else {
            return -1;
        }
    }
}
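To see the effect on a file that grows mid-read, here is a minimal, self-contained sketch. It assumes a temporary file stands in for the log, and it inlines a compact stand-in (`Capped`, a hypothetical name) for the bounded stream so the example compiles on its own; `GrowingFileDemo` and `readSnapshot` are likewise illustrative names, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GrowingFileDemo {

    /** Compact stand-in for a bounded stream: signals EOF after maxBytes. */
    static final class Capped extends InputStream {
        private final InputStream in;
        private long remaining;

        Capped(InputStream in, long maxBytes) {
            this.in = in;
            this.remaining = Math.max(maxBytes, 0);
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int c = in.read();
            if (c >= 0) remaining--;
            return c;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (remaining <= 0) return -1;
            int n = in.read(b, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /**
     * Records the file size, lets the file grow by 2000 bytes while the
     * stream is open, and returns how many bytes the capped stream delivered.
     */
    public static long readSnapshot(Path log) throws IOException {
        long snapshot = Files.size(log);
        try (InputStream in = new Capped(Files.newInputStream(log), snapshot)) {
            // Simulate the writer process appending during the upload.
            Files.write(log, new byte[2000], StandardOpenOption.APPEND);
            long total = 0;
            byte[] buf = new byte[3000];
            int n;
            while ((n = in.read(buf)) >= 0) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("demo", ".log");
        try {
            Files.write(log, new byte[10000]);
            long delivered = readSnapshot(log);
            // prints "delivered 10000 of 12000 bytes"
            System.out.println("delivered " + delivered + " of " + Files.size(log) + " bytes");
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

The stream delivers exactly the length recorded before the read began, which is the value that would be passed as the content length, while the extra 2000 bytes are left for the next upload cycle.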

Read the first N bytes of a file as an InputStream in Java?
