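I have a module that is responsible for reading, processing, and writing bytes to disk. The bytes come in over UDP and, after the individual datagrams are assembled, the final byte array that gets processed and written to disk is typically between 200 bytes and 500,000 bytes. Occasionally there are byte arrays that are over 500,000 bytes after assembly, but these are relatively rare.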
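I'm currently using FileOutputStream's write(byte[]) method. I'm also experimenting with wrapping the FileOutputStream in a BufferedOutputStream, including using the constructor that accepts a buffer size as a parameter.

Using the BufferedOutputStream seems to give slightly better performance, but I've only just begun experimenting with different buffer sizes. I only have a limited set of sample data to work with (two data sets from sample runs that I can pipe through my application). Is there a general rule of thumb I can apply to calculate the optimal buffer size, so as to reduce the number of disk writes and maximize write performance, given what I know about the data I'm writing?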
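A minimal sketch of the two setups the question describes, assuming each assembled message is written with a single write(byte[]) call: a bufferSize of 0 or less writes straight to FileOutputStream, anything else wraps it in a BufferedOutputStream of that size. The class name AssembledMessageWriter, its methods, the temp-file target, and the three message sizes are hypothetical, chosen only to span the 200 B to 500,000 B range mentioned above.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class AssembledMessageWriter {
    /** Opens the file unbuffered (bufferSize <= 0) or wrapped in a BufferedOutputStream of bufferSize bytes. */
    static OutputStream open(Path file, int bufferSize) throws IOException {
        OutputStream raw = new FileOutputStream(file.toFile());
        return bufferSize > 0 ? new BufferedOutputStream(raw, bufferSize) : raw;
    }

    /** Writes every assembled message with one write(byte[]) call each; returns the resulting file size. */
    static long writeToTempFile(List<byte[]> messages, int bufferSize) {
        try {
            Path file = Files.createTempFile("assembled", ".bin"); // hypothetical target file
            try {
                try (OutputStream out = open(file, bufferSize)) {
                    for (byte[] message : messages) {
                        out.write(message);
                    }
                } // close() flushes whatever is still sitting in the BufferedOutputStream's buffer
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical message sizes spanning the range given in the question.
        List<byte[]> messages = Arrays.asList(new byte[200], new byte[4_096], new byte[500_000]);
        for (int bufferSize : new int[] {0, 8 * 1024, 32 * 1024}) {
            System.out.println("bufferSize=" + bufferSize + " -> " + writeToTempFile(messages, bufferSize) + " bytes");
        }
    }
}
```

Every line of output should report 504296 bytes: the buffer size changes how the bytes reach the file, not what ends up in it. Only Java 8 APIs are used.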
Answer 0 (score: 32)
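BufferedOutputStream helps when you make writes smaller than the buffer size, e.g. 8 KB. For larger writes it doesn't help, but it doesn't make things worse either. If ALL your writes are larger than the buffer size, or you always call flush() after every write, I would not use a buffer. However, if a good portion of your writes are smaller than the buffer size and you don't call flush() every time, it's worth having.
You may find that increasing the buffer size to 32 KB or larger gives you a marginal improvement, or makes things worse. YMMV (your mileage may vary).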
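To make the two cases concrete, here is a small sketch that counts how many write calls actually reach the underlying stream. CountingOutputStream is a hypothetical stand-in for the file, and the 8 KB buffer and chunk sizes are illustrative assumptions, not measurements from the question.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class BufferingEffect {
    /** Hypothetical stand-in for the file: it only counts the write calls that reach it. */
    static final class CountingOutputStream extends OutputStream {
        int writeCalls;

        @Override
        public void write(int b) {
            writeCalls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
        }
    }

    /** Writes `chunks` arrays of `chunkSize` bytes through an 8 KB buffer; returns the calls that reached the sink. */
    static int underlyingWrites(int chunks, int chunkSize, boolean flushEachTime) {
        CountingOutputStream sink = new CountingOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 8 * 1024)) {
            byte[] chunk = new byte[chunkSize];
            for (int i = 0; i < chunks; i++) {
                out.write(chunk);
                if (flushEachTime) {
                    out.flush(); // pushes the buffer to the sink after every write
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("100 x 200 B, no flush:   " + underlyingWrites(100, 200, false));       // 3
        System.out.println("100 x 200 B, flush each: " + underlyingWrites(100, 200, true));        // 100
        System.out.println("100 x 16 KB, no flush:   " + underlyingWrites(100, 16 * 1024, false)); // 100
    }
}
```

Small writes are coalesced (40 chunks of 200 B fit in 8 KB, so 100 chunks become 3 underlying writes). Flushing after each write, or writing chunks larger than the buffer, gives one underlying write per call, the same as having no buffer.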
You might find the code for BufferedOutputStream.write useful:
/**
 * Writes <code>len</code> bytes from the specified byte array
 * starting at offset <code>off</code> to this buffered output stream.
 *
 * <p> Ordinarily this method stores bytes from the given array into this
 * stream's buffer, flushing the buffer to the underlying output stream as
 * needed. If the requested length is at least as large as this stream's
 * buffer, however, then this method will flush the buffer and write the
 * bytes directly to the underlying output stream. Thus redundant
 * <code>BufferedOutputStream</code>s will not copy data unnecessarily.
 *
 * @param b the data.
 * @param off the start offset in the data.
 * @param len the number of bytes to write.
 * @exception IOException if an I/O error occurs.
 */
public synchronized void write(byte b[], int off, int len) throws IOException {
    if (len >= buf.length) {
        /* If the request length exceeds the size of the output buffer,
           flush the output buffer and then write the data directly.
           In this way buffered streams will cascade harmlessly. */
        flushBuffer();
        out.write(b, off, len);
        return;
    }
    if (len > buf.length - count) {
        flushBuffer();
    }
    System.arraycopy(b, off, buf, count, len);
    count += len;
}
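The len >= buf.length branch in the method above can be checked directly. This sketch, with a hypothetical RecordingOutputStream sink and illustrative sizes, pushes one payload through two nested BufferedOutputStreams and reports whether the sink received the caller's own array, meaning no intermediate copy was made.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class LargeWritePassThrough {
    /** Hypothetical sink that remembers which array the most recent write handed it. */
    static final class RecordingOutputStream extends OutputStream {
        byte[] lastArray;
        int lastLen;

        @Override
        public void write(int b) {
            lastArray = null;
            lastLen = 1;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            lastArray = b;
            lastLen = len;
        }
    }

    /** True if the payload reached the sink as the caller's own array, i.e. it was never copied into a buffer. */
    static boolean reachedSinkUncopied(int bufferSize, int payloadSize) {
        RecordingOutputStream sink = new RecordingOutputStream();
        byte[] payload = new byte[payloadSize];
        // Two nested buffered streams, mirroring the "cascade harmlessly" comment.
        try (OutputStream out = new BufferedOutputStream(new BufferedOutputStream(sink, bufferSize), bufferSize)) {
            out.write(payload, 0, payloadSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.lastArray == payload && sink.lastLen == payloadSize;
    }

    public static void main(String[] args) {
        System.out.println(reachedSinkUncopied(8 * 1024, 100));      // false: copied into both buffers first
        System.out.println(reachedSinkUncopied(8 * 1024, 8 * 1024)); // true: len >= buf.length
        System.out.println(reachedSinkUncopied(8 * 1024, 500_000));  // true
    }
}
```

For the question's data, this means any assembled array at least as large as the chosen buffer skips the buffer entirely, so only the smaller arrays are affected by the buffer size.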
Answer 1 (score: 1)
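I have been exploring IO performance recently. In my observations, writing directly to a FileOutputStream has given better results, which I attribute to FileOutputStream's native call for write(byte[], int, int). I have also observed that as the latency of BufferedOutputStream starts to converge on that of a direct FileOutputStream, it fluctuates a lot, i.e. it can abruptly even double (I haven't yet been able to find out why).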
P.S. I'm using Java 8 and can't say whether my observations hold for earlier Java versions.
Here's the code I tested with; my input is a ~10 KB file:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class WriteCombinationsOutputStreamComparison {
    private static final Logger LOG = LogManager.getLogger(WriteCombinationsOutputStreamComparison.class);

    public static void main(String[] args) throws IOException {
        // Load the whole input file into memory first, so that only the writes below are timed.
        byte[] bytesRead;
        try (InputStream input = new BufferedInputStream(new FileInputStream("src/main/resources/inputStream1.txt"), 4 * 1024)) {
            final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            int data = input.read();
            while (data != -1) {
                byteArrayOutputStream.write(data); // everything comes in memory
                data = input.read();
            }
            bytesRead = byteArrayOutputStream.toByteArray();
        }

        // Note: each variant is timed only once, in a freshly started JVM, so JIT warm-up and
        // OS file caching can strongly affect these numbers.

        /*
         * 1. WRITE USING A STREAM DIRECTLY with entire byte array --> FileOutputStream directly uses a native call and writes
         */
        try (OutputStream outputStream = new FileOutputStream("src/main/resources/outputStream1.txt")) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, StandardCharsets.UTF_8));
            }
        }

        /*
         * 2. WRITE USING A BUFFERED STREAM, write entire array
         */
        // changed the buffer size to different combinations --> write latency fluctuates a lot for same buffer size over multiple runs
        try (BufferedOutputStream outputStream = new BufferedOutputStream(new FileOutputStream("src/main/resources/outputStream1.txt"), 16 * 1024)) {
            final long begin = System.nanoTime();
            outputStream.write(bytesRead);
            outputStream.flush();
            final long end = System.nanoTime();
            LOG.info("Total time taken for buffered file write, writing entire array [nanos=" + (end - begin) + "], [bytesWritten=" + bytesRead.length + "]");
            if (LOG.isDebugEnabled()) {
                LOG.debug("File reading result was: \n" + new String(bytesRead, StandardCharsets.UTF_8));
            }
        }
    }
}
Output:
2017-01-30 23:38:59.064 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for file write, writing entire array [nanos=100990], [bytesWritten=11059]
2017-01-30 23:38:59.086 [INFO] [main] [WriteCombinationsOutputStream] - Total time taken for buffered file write, writing entire array [nanos=142454], [bytesWritten=11059]
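The timings above come from a single write per variant, so run-to-run fluctuation is expected. A minimal sketch of a less noisy comparison, assuming it is acceptable to repeat the same write many times: warm up first, then report the median of many runs. The class RepeatedWriteTiming, its methods, the warm-up and run counts, and the list of buffer sizes are hypothetical choices; for serious measurements a dedicated harness such as JMH is the more rigorous option.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class RepeatedWriteTiming {
    /** Median of the samples (upper middle element for even lengths); sorts a copy, leaving the input untouched. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Times one write+flush of data, like the answer's code; bufferSize <= 0 means no BufferedOutputStream. */
    static long timeOneWrite(File file, byte[] data, int bufferSize) {
        try (OutputStream out = bufferSize > 0
                ? new BufferedOutputStream(new FileOutputStream(file), bufferSize)
                : new FileOutputStream(file)) {
            long begin = System.nanoTime();
            out.write(data);
            out.flush();
            return System.nanoTime() - begin;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Discards `warmups` runs (giving the JIT a chance to compile the write path), then returns the median of `runs` timings. */
    static long medianWriteNanos(File file, byte[] data, int bufferSize, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            timeOneWrite(file, data, bufferSize);
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = timeOneWrite(file, data, bufferSize);
        }
        return median(samples);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("write-timing", ".bin");
        file.deleteOnExit();
        byte[] data = new byte[11_059]; // same size as bytesWritten in the output above
        for (int bufferSize : new int[] {0, 8 * 1024, 16 * 1024, 32 * 1024}) {
            long nanos = medianWriteNanos(file, data, bufferSize, 1_000, 101);
            System.out.println("bufferSize=" + bufferSize + " median nanos=" + nanos);
        }
    }
}
```

One detail worth noting when reading such numbers: per the BufferedOutputStream.write code in the previous answer, an 11,059-byte array is copied into a 16 KB or 32 KB buffer before being written, whereas with an 8 KB buffer it is passed straight through to the FileOutputStream.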