I am converting a large file (18 GB) into a byte[], but I get this error:
java.lang.OutOfMemoryError: Java heap space
This is the code responsible for the exception:
byte[] content = Files.readAllBytes(path);
I am creating the byte array in order to send the file over the network:
createFile(filename.toString(),content);
private ADLStoreClient client; // package com.microsoft.azure.datalake.store

public boolean createFile(String filename, byte[] content) {
    try {
        // create file and write some content
        OutputStream stream = client.createFile(filename, IfExists.OVERWRITE);
        // set file permission
        client.setPermission(filename, "777");
        // append to file
        stream.write(content);
        stream.close();
    } catch (ADLException ex) {
        printExceptionDetails(ex);
        return false;
    } catch (Exception ex) {
        log.error(" Exception: {}", ex);
        return false;
    }
    return true;
}
Clearly readAllBytes() reads all of the bytes into memory and causes the OutOfMemoryError. I gather this can be solved with streams, but I am not good at using them. Can anyone suggest a proper solution? Thanks.
Answer 0 (score: 2)
As the Azure ADLStoreClient documentation says:

createFile(String path, IfExists mode)
    Create a file. If overwriteIfExists is false and the file already exists, throws an exception. The call returns an ADLFileOutputStream, which can then be written to.
So, something like this:
try (InputStream in = Files.newInputStream(path); // path is a java.nio.file.Path, as in the question
     OutputStream out = client.createFile(filename, IfExists.OVERWRITE)) {
    IOUtils.copyLarge(in, out);
}
You can get IOUtils from commons-io (Maven coordinates commons-io:commons-io), or write the copyLarge routine yourself; it is trivial:
void copyLarge(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[65536];
    int length;
    // read() returns -1 at end of stream; testing for > 0 could stop early
    // on a legal zero-length read
    while ((length = in.read(buffer)) != -1) {
        out.write(buffer, 0, length);
    }
}
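Applied to the createFile method from the question, a minimal sketch of a streaming variant (assuming the same ADLStoreClient field and logger as in the question; uploadFile and the Path parameter are illustrative names, and commons-io is on the classpath):

public boolean uploadFile(String filename, Path source) {
    // Stream straight from disk to the ADL store; nothing close to
    // 18 GB is ever held in memory at once.
    try (InputStream in = Files.newInputStream(source);
         OutputStream out = client.createFile(filename, IfExists.OVERWRITE)) {
        client.setPermission(filename, "777");
        IOUtils.copyLarge(in, out);
        return true;
    } catch (IOException ex) { // ADLException extends IOException
        log.error("Exception while uploading {}", filename, ex);
        return false;
    }
}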
Answer 1 (score: 1)
Something like this? (if you want to process the file line by line)
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}
...
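Note that Files.lines only suits text files; for the binary upload in the question, the byte-oriented copy from answer 0 is a better fit. If the data really is line-oriented text, a rough sketch of forwarding the lines to the ADL stream (assuming the same client and filename as in the question, plus the usual java.io/java.nio imports):

try (Stream<String> lines = Files.lines(Paths.get(fileName));
     BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
             client.createFile(fileName, IfExists.OVERWRITE), StandardCharsets.UTF_8))) {
    // iterate lazily; only one line is in memory at a time
    for (String line : (Iterable<String>) lines::iterator) {
        out.write(line);
        out.newLine();
    }
} catch (IOException e) {
    e.printStackTrace();
}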
Answer 2 (score: 1)
Here is the file-stream class I use for reading files that I want to stream:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/**
 * Allows a file to be read and iterated over in fixed-size chunks, taking
 * advantage of java streams.
 * @author locus2k
 */
public class FileStream implements Iterator<byte[]>, Iterable<byte[]>, Spliterator<byte[]> {

    private final InputStream stream;
    private final int bufferSize;
    private final long blockCount;
    private long remaining; // bytes left to hand out

    /**
     * Create a FileStreamReader
     * @param stream the input stream containing the content to be read
     * @param fileSize total number of bytes the stream will deliver
     * @param bufferSize size of the buffer that should be read at once from the stream
     */
    private FileStream(InputStream stream, long fileSize, int bufferSize) {
        this.bufferSize = bufferSize;
        // integer arithmetic; float division loses precision on multi-GB sizes
        this.blockCount = (fileSize + bufferSize - 1) / bufferSize;
        this.remaining = fileSize;
        this.stream = stream;
    }

    @Override
    public boolean hasNext() {
        // track the byte count ourselves; InputStream.available() is only an
        // estimate and returns an int, which cannot represent files over 2 GB
        boolean hasNext = remaining > 0;
        if (!hasNext) {
            close(); // close the stream once everything has been read
        }
        return hasNext;
    }

    @Override
    public byte[] next() {
        byte[] data = new byte[(int) Math.min(bufferSize, remaining)];
        int offset = 0;
        try {
            // read() may return fewer bytes than requested, so loop until the chunk is full
            while (offset < data.length) {
                int count = stream.read(data, offset, data.length - offset);
                if (count < 0) {
                    throw new IOException("Unexpected end of stream");
                }
                offset += count;
            }
            remaining -= data.length;
            return data;
        } catch (IOException e) {
            close(); // close the stream if next() causes an exception
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Close the stream
     */
    public void close() {
        try {
            stream.close();
        } catch (IOException e) { }
    }

    @Override
    public boolean tryAdvance(Consumer<? super byte[]> action) {
        // per the Spliterator contract, do not invoke the action
        // when no elements remain
        if (!hasNext()) {
            return false;
        }
        action.accept(next());
        return true;
    }

    @Override
    public Spliterator<byte[]> trySplit() {
        return null; // a sequential stream over a single InputStream cannot be split
    }

    @Override
    public long estimateSize() {
        return blockCount;
    }

    @Override
    public int characteristics() {
        return Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED;
    }

    @Override
    public Iterator<byte[]> iterator() {
        return this;
    }

    @Override
    public void forEachRemaining(Consumer<? super byte[]> action) {
        while (hasNext())
            action.accept(next());
    }

    /**
     * Create a java stream
     * @param inParallel if true the returned stream is parallel; if false it is sequential
     * @return stream with the data
     */
    private Stream<byte[]> stream(boolean inParallel) {
        return StreamSupport.stream(this, inParallel);
    }

    /**
     * Create a FileStream reader
     * @param fileName name of the file to stream
     * @param bufferSize size of the buffer that should be read at once from the stream
     * @return stream representation of the file
     */
    public static Stream<byte[]> stream(String fileName, int bufferSize) {
        return stream(new File(fileName), bufferSize);
    }

    /**
     * Create a FileStream reader
     * @param file the file to read
     * @param bufferSize the size of each read
     * @return the stream
     */
    public static Stream<byte[]> stream(File file, int bufferSize) {
        try {
            // File.length() is a long, so this works for files larger than 2 GB
            return new FileStream(new FileInputStream(file), file.length(), bufferSize).stream(false);
        } catch (FileNotFoundException ex) {
            throw new IllegalArgumentException(ex.getMessage(), ex);
        }
    }

    /**
     * Create a FileStream reader
     * @param stream the stream to read from (note: this process will close the stream)
     * @param bufferSize size of each read
     * @return the stream
     */
    public static Stream<byte[]> stream(InputStream stream, int bufferSize) {
        try {
            // available() caps at Integer.MAX_VALUE; prefer the File
            // overload for anything larger than 2 GB
            return new FileStream(stream, stream.available(), bufferSize).stream(false);
        } catch (IOException ex) {
            throw new IllegalArgumentException(ex.getMessage(), ex);
        }
    }

    /**
     * Calculate the number of segments that will be created
     * @param sourceSize the size of the file
     * @param bufferSize the buffer size (or chunk size for each segment)
     * @return the number of chunks that will be created
     */
    public static long calculateEstimatedSize(long sourceSize, int bufferSize) {
        return (sourceSize + bufferSize - 1) / bufferSize;
    }
}
Then to use it you can do something like
FileStream.stream("myfile.text", 30000).forEach(b -> System.out.println(b.length));
This creates a stream over the file, and each callback in the forEach receives a byte array of the configured buffer size, in this case 30,000 bytes (the final chunk may be smaller).
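To wire this into the upload from the question, a rough sketch (assuming the same client, filename, and logger as in the question); each chunk is appended to the ADL output stream in turn:

try (OutputStream out = client.createFile(filename, IfExists.OVERWRITE)) {
    FileStream.stream("myfile.text", 30000).forEach(chunk -> {
        try {
            out.write(chunk); // one 30,000-byte chunk per call
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });
} catch (IOException e) {
    log.error("Upload failed", e);
}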
Answer 3 (score: 0)
As you describe it, you are trying to fit 18 GB into memory (RAM). You could raise the heap limit with -Xmx (and the initial heap size with -Xms) and set it above 18 GB, but then the machine needs that much free memory; the options are described in the Java documentation.
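To verify what the JVM will actually allow, a quick check using nothing but the standard library:

// Prints the maximum heap the JVM may grow to; with -Xmx18g this reports roughly 18 GB.
System.out.printf("Max heap: %.1f GB%n", Runtime.getRuntime().maxMemory() / 1e9);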