java.lang.OutOfMemoryError: Java heap space from Files.readAllBytes(path)

Posted: 2018-10-10 15:55:09

Tags: java spring file

I am reading a large file (18 GB) into a byte[], but I get this error:

java.lang.OutOfMemoryError: Java heap space

This is the code responsible for the exception:

byte[] content = Files.readAllBytes(path); 

I am creating the byte array so that I can send the file over the network:

createFile(filename.toString(), content);

private ADLStoreClient client; // package com.microsoft.azure.datalake.store

public boolean createFile(String filename, byte[] content) {
    try {
        // create the file
        OutputStream stream = client.createFile(filename, IfExists.OVERWRITE);
        // set file permission
        client.setPermission(filename, "777");
        // write the whole byte array to the file
        stream.write(content);
        stream.close();
    } catch (ADLException ex) {
        printExceptionDetails(ex);
        return false;
    } catch (Exception ex) {
        log.error(" Exception: {}", ex);
        return false;
    }
    return true;
}

Obviously readAllBytes() reads all the bytes into memory and causes the OutOfMemoryError. I think this could be solved with streams, but I am not good at using them. Can anyone give a proper solution? Thanks.

4 Answers:

Answer 0 (score: 2)

As the Azure ADLStoreClient documentation states:

createFile(String path, IfExists mode)

Creates a file. Throws an exception if overwriteIfExists is false and the file already exists. The call returns an ADLFileOutputStream, which can then be written to.

So, something like this:

try (InputStream in = Files.newInputStream(path);
     OutputStream out = client.createFile(filename, IfExists.OVERWRITE)) {
    IOUtils.copyLarge(in, out);
}

You can get IOUtils from commons-io, or write your own copyLarge routine; it is simple enough:

void copyLarge(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[65536];
    int length;
    // copy in 64 KiB chunks until end of stream (read returns -1)
    while ((length = in.read(buffer)) != -1) {
        out.write(buffer, 0, length);
    }
}
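
On Java 9 or newer you can also skip the helper entirely: InputStream.transferTo(OutputStream) performs the same buffered copy internally. A minimal sketch, reusing the same path, client and filename as above:

// Java 9+: transferTo copies in chunks; the whole file is never held in memory.
try (InputStream in = Files.newInputStream(path);
     OutputStream out = client.createFile(filename, IfExists.OVERWRITE)) {
    in.transferTo(out);
}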

Answer 1 (score: 1)

Like this? (If you want to process the file line by line.)

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}
...
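
Files.lines is lazy, so this never holds the whole file in memory, but it only makes sense for text content. A hedged sketch of pushing each line to the Data Lake output stream, reusing the client and filename from the question (illustrative, not tested against the ADL SDK):

try (Stream<String> lines = Files.lines(Paths.get(fileName));
     OutputStream out = client.createFile(filename, IfExists.OVERWRITE)) {
    lines.forEach(line -> {
        try {
            // re-append the separator that Files.lines strips off
            out.write((line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
}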

Answer 2 (score: 1)

Here is the file-streaming class I use to read files that I want to stream:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/**
 * Allows a file to be read and iterated over in chunks, taking advantage of Java streams
 * @author locus2k
 *
 */
public class FileStream implements Iterator<byte[]>, Iterable<byte[]>, Spliterator<byte[]> {

  private InputStream stream;
  private int bufferSize;
  private long blockCount;


  /**
   * Create a FileStreamReader
   * @param stream the input stream containing the content to be read
   * @param fileSize the total number of bytes the stream will deliver
   * @param bufferSize size of the buffer that should be read at once from the stream
   */
  private FileStream(InputStream stream, long fileSize, int bufferSize) {
    this.bufferSize = bufferSize;
    // calculate how many blocks this stream will produce
    // (double arithmetic avoids float rounding error on very large files)
    this.blockCount = (long) Math.ceil((double) fileSize / (double) bufferSize);
    this.stream = stream;
  }

  @Override
  public boolean hasNext() {
    boolean hasNext = false;
    try {
      hasNext = stream.available() > 0;
      return hasNext;
    } catch (IOException e) {
      return false;
    } finally {
      //close the stream if there is no more to read
      if (!hasNext) {
        close();
      }
    }
  }

  @Override
  public byte[] next() {
    try {
      byte[] data = new byte[Math.min(bufferSize, stream.available())];
      int read = stream.read(data);
      // read() may return fewer bytes than requested; trim the array so no garbage is emitted
      if (read < data.length) {
        data = read <= 0 ? new byte[0] : Arrays.copyOf(data, read);
      }
      return data;
    } catch (IOException e) {
      // Close the stream if next causes an exception
      close();
      throw new RuntimeException(e);
    }
  }

  /**
   * Close the stream
   */
  public void close() {
    try {
      stream.close();
    } catch (IOException e) { }
  }

  @Override
  public boolean tryAdvance(Consumer<? super byte[]> action) {
    // per the Spliterator contract, return false without invoking the action when exhausted
    if (!hasNext()) {
      return false;
    }
    action.accept(next());
    return true;
  }

  @Override
  public Spliterator<byte[]> trySplit() {
    // a single sequential input stream cannot be partitioned, so decline to split
    return null;
  }

  @Override
  public long estimateSize() {
    return blockCount;
  }

  @Override
  public int characteristics() {
    return Spliterator.IMMUTABLE;
  }

  @Override
  public Iterator<byte[]> iterator() {
    return this;
  }

  @Override
  public void forEachRemaining(Consumer<? super byte[]> action) {
    while(hasNext())
      action.accept(next());
  }

  /**
   * Create a java stream
   * @param inParallel if true then the returned stream is a parallel stream; if false the returned stream is a sequential stream.
   * @return stream with the data
   */
  private Stream<byte[]> stream(boolean inParallel) {
    return StreamSupport.stream(this, inParallel);
  }

  /**
   * Create a File Stream reader
   * @param fileName Name of the file to stream
   * @param bufferSize size of the buffer that should be read at once from the stream
   * @return Stream representation of the file
   */
  public static Stream<byte[]> stream(String fileName, int bufferSize) {
    return stream(new File(fileName), bufferSize);
  }

  /**
   * Create a FileStream reader
   * @param file The file to read
   * @param bufferSize the size of each read
   * @return the stream
   */
  public static Stream<byte[]> stream(File file, int bufferSize) {
    try {
      return stream(new FileInputStream(file), bufferSize);
    } catch (FileNotFoundException ex) {
      throw new IllegalArgumentException(ex.getMessage());
    }
  }

  /**
   * Create a file stream reader
   * @param stream the stream to read from (note this process will close the stream)
   * @param bufferSize size of each read
   * @return the stream
   */
  public static Stream<byte[]> stream(InputStream stream, int bufferSize) {
    try {
      // note: available() reports the full remaining size for local file streams,
      // but not necessarily for network streams; prefer an explicit size in that case
      return new FileStream(stream, stream.available(), bufferSize).stream(false);
    } catch (IOException ex) {
      throw new IllegalArgumentException(ex.getMessage());
    }
  }

  /**
   * Calculate the number of segments that will be created
   * @param sourceSize the size of the file
   * @param bufferSize the buffer size (or chunk size for each segment)
   * @return the number of chunks that will be created
   */
  public static long calculateEstimatedSize(long sourceSize, Integer bufferSize) {
    return (long) Math.ceil((double) sourceSize / (double) bufferSize);
  }
}

Then, to use it, you can do something like:

FileStream.stream("myfile.text", 30000).forEach(b -> System.out.println(b.length));

This creates a stream over the file, and each invocation inside forEach receives a byte array of at most the specified buffer size, in this case 30,000 bytes.
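
To tie this back to the original problem, a sketch (assuming the client and filename from the question) that uploads the file chunk by chunk instead of as one huge array:

try (OutputStream out = client.createFile(filename, IfExists.OVERWRITE)) {
    FileStream.stream("myfile.text", 30000).forEach(chunk -> {
        try {
            out.write(chunk); // at most one 30,000-byte chunk in memory per write
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
}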

Answer 3 (score: 0)

As you say, you are trying to put 18 GB into memory (RAM). You can raise the heap limit with the -Xms/-Xmx JVM options and set it to 18 GB or more, but then the machine needs that much free memory; the options are described in the Java documentation. Note also that a single Java byte[] cannot exceed roughly Integer.MAX_VALUE elements (about 2 GB), so an 18 GB file will never fit into one array no matter how large the heap is; streaming, as in the other answers, is the real fix.
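
If you do experiment with a larger heap, you can check from inside the program what limit the JVM actually picked up. A minimal sketch (the class name HeapCheck is just for illustration):

// Prints the maximum heap the JVM will attempt to use (roughly the -Xmx value),
// e.g. after starting with: java -Xmx20g HeapCheck
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GiB%n", maxBytes / (1024.0 * 1024 * 1024));
    }
}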