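I have two files (2 GB each) on my hard disk and want to compare them with each other:
Reading both files with java.io.FileInputStream and comparing the byte arrays byte by byte takes 20+ minutes.
The java.io.BufferedInputStream buffer is 64 kB; the files are read in chunks and then compared.
The comparison is done in a tight loop like this:
int numRead = Math.min(numRead[0], numRead[1]);
for (int k = 0; k < numRead; k++)
{
    if (buffer[1][k] != buffer[0][k])
    {
        return buffer[0][k] - buffer[1][k];
    }
}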
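What can I do to speed this up? Is NIO supposed to be faster than plain streams? Is Java unable to use DMA/SATA technologies and instead makes some slow OS API calls?
Edit
Thanks for the answers. I ran some experiments based on them. As Andreas showed, the stream and nio approaches do not differ much. More important is the correct buffer size.
My experiments confirm this. Because the files are read in big chunks, even an additional buffer (BufferedInputStream) gains nothing. Optimizing the comparison is possible, and I got the best results with 32-fold loop unrolling, but the time spent comparing is small relative to the disk reads, so the speedup is small. It looks like there is nothing I can do ;-(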
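Answer 0 (score: 15)
I tried three different methods of comparing two identical 3.8 GB files, with buffer sizes between 8 kB and 1 MB. The first method used just two buffered input streams.
The second method uses a thread pool that reads in two different threads and compares in a third one. This gets slightly higher throughput at the cost of high CPU utilization. Managing the thread pool adds a lot of overhead for such short-running tasks.
The third method uses nio, as posted by laginimaineb.
As you can see, the general approach does not make much difference. More important is the correct buffer size.
Strangely, I read 1 byte less when using threads. I could not spot the error.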
comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB
The code used:
import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;
public class FileCompare {
private static final int MIN_BUFFER_SIZE = 1024 * 8;
private static final int MAX_BUFFER_SIZE = 1024 * 1024;
private String fileName1;
private String fileName2;
private long start;
private long totalbytes;
@Before
public void createInputStream() {
fileName1 = "bigFile.1";
fileName2 = "bigFile.2";
}
@Test
public void compareTwoFiles() throws IOException {
System.out.println("comparing just with two streams");
int currentBufferSize = MIN_BUFFER_SIZE;
while (currentBufferSize <= MAX_BUFFER_SIZE) {
compareWithBufferSize(currentBufferSize);
currentBufferSize *= 2;
}
}
@Test
public void compareTwoFilesFutures()
throws IOException, ExecutionException, InterruptedException {
System.out.println("comparing with threads");
int myBufferSize = MIN_BUFFER_SIZE;
while (myBufferSize <= MAX_BUFFER_SIZE) {
start = System.currentTimeMillis();
totalbytes = 0;
compareWithBufferSizeFutures(myBufferSize);
myBufferSize *= 2;
}
}
@Test
public void compareTwoFilesNio() throws IOException {
System.out.println("comparing with nio");
int myBufferSize = MIN_BUFFER_SIZE;
while (myBufferSize <= MAX_BUFFER_SIZE) {
start = System.currentTimeMillis();
totalbytes = 0;
boolean wasEqual = isEqualsNio(myBufferSize);
if (wasEqual) {
printAfterEquals(myBufferSize);
} else {
Assert.fail("files were not equal");
}
myBufferSize *= 2;
}
}
private void compareWithBufferSize(int myBufferSize) throws IOException {
final BufferedInputStream inputStream1 =
new BufferedInputStream(
new FileInputStream(new File(fileName1)),
myBufferSize);
byte[] buff1 = new byte[myBufferSize];
final BufferedInputStream inputStream2 =
new BufferedInputStream(
new FileInputStream(new File(fileName2)),
myBufferSize);
byte[] buff2 = new byte[myBufferSize];
int read1;
start = System.currentTimeMillis();
totalbytes = 0;
while ((read1 = inputStream1.read(buff1)) != -1) {
totalbytes += read1;
int read2 = inputStream2.read(buff2);
if (read1 != read2) {
break;
}
if (!Arrays.equals(buff1, buff2)) {
break;
}
}
if (read1 == -1) {
printAfterEquals(myBufferSize);
} else {
Assert.fail("files were not equal");
}
inputStream1.close();
inputStream2.close();
}
private void compareWithBufferSizeFutures(int myBufferSize)
throws ExecutionException, InterruptedException, IOException {
final BufferedInputStream inputStream1 =
new BufferedInputStream(
new FileInputStream(
new File(fileName1)),
myBufferSize);
final BufferedInputStream inputStream2 =
new BufferedInputStream(
new FileInputStream(
new File(fileName2)),
myBufferSize);
final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);
if (wasEqual) {
printAfterEquals(myBufferSize);
} else {
Assert.fail("files were not equal");
}
inputStream1.close();
inputStream2.close();
}
private boolean isEqualsParallel(int myBufferSize
, final BufferedInputStream inputStream1
, final BufferedInputStream inputStream2)
throws InterruptedException, ExecutionException {
final byte[] buff1Even = new byte[myBufferSize];
final byte[] buff1Odd = new byte[myBufferSize];
final byte[] buff2Even = new byte[myBufferSize];
final byte[] buff2Odd = new byte[myBufferSize];
final Callable<Integer> read1Even = new Callable<Integer>() {
public Integer call() throws Exception {
return inputStream1.read(buff1Even);
}
};
final Callable<Integer> read2Even = new Callable<Integer>() {
public Integer call() throws Exception {
return inputStream2.read(buff2Even);
}
};
final Callable<Integer> read1Odd = new Callable<Integer>() {
public Integer call() throws Exception {
return inputStream1.read(buff1Odd);
}
};
final Callable<Integer> read2Odd = new Callable<Integer>() {
public Integer call() throws Exception {
return inputStream2.read(buff2Odd);
}
};
final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
public Boolean call() throws Exception {
return Arrays.equals(buff1Odd, buff2Odd);
}
};
final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
public Boolean call() throws Exception {
return Arrays.equals(buff1Even, buff2Even);
}
};
ExecutorService executor = Executors.newCachedThreadPool();
boolean isEven = true;
Future<Integer> read1 = null;
Future<Integer> read2 = null;
Future<Boolean> isEqual = null;
int lastSize = 0;
while (true) {
if (isEqual != null) {
if (!isEqual.get()) {
return false;
} else if (lastSize == -1) {
return true;
}
}
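if (read1 != null) {
lastSize = read1.get();
// Note: at end of file read() returns -1, and that -1 is added to totalbytes too,
// which is why this variant reports one byte less than the other two.
totalbytes += lastSize;
final int size2 = read2.get();
if (lastSize != size2) {
return false;
}
}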
isEven = !isEven;
if (isEven) {
if (read1 != null) {
isEqual = executor.submit(oddEqualsArray);
}
read1 = executor.submit(read1Even);
read2 = executor.submit(read2Even);
} else {
if (read1 != null) {
isEqual = executor.submit(evenEqualsArray);
}
read1 = executor.submit(read1Odd);
read2 = executor.submit(read2Odd);
}
}
}
private boolean isEqualsNio(int myBufferSize) throws IOException {
FileChannel first = null, seconde = null;
try {
first = new FileInputStream(fileName1).getChannel();
seconde = new FileInputStream(fileName2).getChannel();
if (first.size() != seconde.size()) {
return false;
}
ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
int firstRead, secondRead;
while (first.position() < first.size()) {
firstRead = first.read(firstBuffer);
totalbytes += firstRead;
secondRead = seconde.read(secondBuffer);
if (firstRead != secondRead) {
return false;
}
if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
return false;
}
}
return true;
} finally {
if (first != null) {
first.close();
}
if (seconde != null) {
seconde.close();
}
}
}
private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
if (first.limit() != second.limit() || length > first.limit()) {
return false;
}
first.rewind();
second.rewind();
for (int i = 0; i < length; i++) {
if (first.get() != second.get()) {
return false;
}
}
return true;
}
private void printAfterEquals(int myBufferSize) {
NumberFormat nf = new DecimalFormat("#.00");
final long dur = System.currentTimeMillis() - start;
double seconds = dur / 1000d;
double megabytes = totalbytes / 1024 / 1024;
double rate = (megabytes) / seconds;
System.out.println("I was equal, even after " + totalbytes
+ " bytes and reading for " + dur
+ " ms (" + nf.format(rate) + "MB/sec * 2)" +
" with a buffer size of " + myBufferSize / 1024 + " kB");
}
}
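Answer 1 (score: 7)
With files this large, you will get much better performance with java.nio.
Additionally, reading single bytes with Java streams can be very slow. Using a byte array (2-6K elements in my own experience, ymmv, as it seems platform/application specific) will dramatically improve your read performance with streams.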
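As a rough illustration of reading into a byte array instead of one byte at a time, here is a minimal sketch. The class name `ChunkedStreamCompare`, the helper `readFully` and the `blockSize` parameter are hypothetical names chosen for this example and do not come from the thread. It assumes Java 7 or later and that both paths point to regular files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkedStreamCompare {

    /**
     * Reads until the buffer is full or the stream ends.
     * Returns the number of bytes placed in the buffer (0 at end of stream).
     */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Compares two files block by block; blockSize is the byte-array size per read. */
    static boolean contentEquals(Path a, Path b, int blockSize) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false; // different lengths can never be equal
        }
        byte[] bufA = new byte[blockSize];
        byte[] bufB = new byte[blockSize];
        try (InputStream inA = Files.newInputStream(a);
             InputStream inB = Files.newInputStream(b)) {
            while (true) {
                int nA = readFully(inA, bufA);
                int nB = readFully(inB, bufB);
                if (nA != nB) {
                    return false;
                }
                if (nA == 0) {
                    return true; // both streams exhausted without a difference
                }
                // Compare only the bytes actually read; the tail may hold stale data.
                for (int i = 0; i < nA; i++) {
                    if (bufA[i] != bufB[i]) {
                        return false;
                    }
                }
            }
        }
    }
}
```

A call such as `ChunkedStreamCompare.contentEquals(pathA, pathB, 64 * 1024)` compares in 64 kB blocks; which block size works best is something to measure on the target machine.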
Answer 2 (score: 6)
Reading and writing files with Java can be just as fast. You can use FileChannels. As for comparing the files, a byte-by-byte comparison will obviously take a lot of time. Here is an example using FileChannels and ByteBuffers (it could be optimized further):
public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
    FileChannel firstIn = null, secondIn = null;
    try {
        firstIn = new FileInputStream(firstPath).getChannel();
        secondIn = new FileInputStream(secondPath).getChannel();
        // Files of different length can never be equal.
        if (firstIn.size() != secondIn.size())
            return false;
        ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int firstRead, secondRead;
        while (firstIn.position() < firstIn.size()) {
            // Reset position and limit so every read fills the buffer from the start.
            firstBuffer.clear();
            secondBuffer.clear();
            firstRead = firstIn.read(firstBuffer);
            secondRead = secondIn.read(secondBuffer);
            if (firstRead != secondRead)
                return false;
            if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                return false;
        }
        return true;
    } finally {
        if (firstIn != null) firstIn.close();
        if (secondIn != null) secondIn.close();
    }
}
private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit())
        return false;
    if (length > first.limit())
        return false;
    // Compare the first `length` bytes, starting from position 0.
    first.rewind(); second.rewind();
    for (int i = 0; i < length; i++)
        if (first.get() != second.get())
            return false;
    return true;
}
Answer 3 (score: 6)
After modifying the NIO compare function, I get the following results.
I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB
Note: this means the files are being read at a rate of 37 MB/s.
Running the same thing on a faster drive:
I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB
Note: this means the files are being read at a rate of 74.8 MB/s.
private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit() || length > first.limit()) {
        return false;
    }
    first.rewind();
    second.rewind();
    int i;
    // Compare 8 bytes at a time as long values.
    for (i = 0; i < length - 7; i += 8) {
        if (first.getLong() != second.getLong()) {
            return false;
        }
    }
    // Compare the remaining (at most 7) bytes one by one.
    for (; i < length; i++) {
        if (first.get() != second.get()) {
            return false;
        }
    }
    return true;
}
Answer 4 (score: 5)
The following is a good article on the relative merits of the different ways to read a file in Java. It may be of some use:
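Answer 5 (score: 2)
You can have a look at Sun's article on I/O Tuning (already a bit dated); maybe you can find similarities between the examples there and your code. Also have a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has a very nice article on high performance IO using java.nio.
If so, there are further examples and tuning tips there that should help you speed up your code.
Moreover, the Arrays class has built-in methods for comparing byte arrays; maybe these can also be used to make things faster and simplify your loop.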
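To make the last suggestion concrete, here is a small sketch of replacing a hand-written comparison loop with the range-based methods of `java.util.Arrays`. It assumes Java 9 or later, where the range overloads of `Arrays.equals` and `Arrays.mismatch` were added. The class and method names (`BlockMismatch`, `firstMismatch`, `blocksEqual`) are made up for this example.

```java
import java.util.Arrays;

final class BlockMismatch {

    /**
     * Returns the index of the first differing byte within the first
     * `length` bytes of both arrays, or -1 if that range is identical.
     */
    static int firstMismatch(byte[] a, byte[] b, int length) {
        return Arrays.mismatch(a, 0, length, b, 0, length);
    }

    /** Returns true if the first `length` bytes of both arrays are identical. */
    static boolean blocksEqual(byte[] a, byte[] b, int length) {
        return Arrays.equals(a, 0, length, b, 0, length);
    }
}
```

Passing the number of bytes actually read as `length` avoids comparing stale data at the end of a partially filled buffer, which a plain `Arrays.equals(a, b)` would include.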
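Answer 6 (score: 1)
For a better comparison, try copying two files at once. A hard drive can read one file much more efficiently than two (because the head has to move back and forth to read). One way to reduce this is to use larger buffers, e.g. 16 MB, with ByteBuffer.
With ByteBuffer you can compare 8 bytes at a time by comparing long values with getLong().
If your Java code is efficient, most of the work is done in the disk/OS for reading and writing, so it should not be much slower than any other language (because the disk/OS is the bottleneck).
Don't assume Java is slow until you have determined that it is not a bug in your code.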
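Answer 7 (score: 1)
I find that a lot of the articles linked in this post are really dated (there is also some very insightful stuff). Some of the linked articles are from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for background and theory.
I have found that performance has little to do with NIO versus not NIO. It is much more about the size of the output buffers (read byte array on that one). NIO is no magic make-it-go-fast web-scale sauce.
I was able to take Martin's examples, use the 1.0-era OutputStream and make it scream. NIO is fast too, but the biggest indicator is just the size of the output buffer, not whether or not you use NIO, unless of course you are using memory-mapped NIO, in which case it matters. :)
If you want up-to-date, authoritative information on this, see Martin's blog: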
http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html
If you want to see how NIO does not make that big of a difference (as I was able to write examples using regular IO that were faster), see this:
http://www.dzone.com/links/fast_java_io_nio_is_always_faster_than_fileoutput.html
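I have tested my assumption on a new Windows laptop with a fast hard disk, my MacBook Pro with an SSD, an EC2 xlarge, and an EC2 4x large with maxed-out IOPS/high-speed I/O (and soon on a large-disk NAS fiber disk array), so it works (there are some issues with it on smaller EC2 instances, but if you care about performance... are you going to use a small EC2 instance?). If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2, then it is also a clear winner. If you use underpowered EC2 instances, NIO can win.
There is no substitute for benchmarking.
Anyway, I am no expert; I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.
I took this to the next step and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found at the bottom of this documentation for Boon (https://github.com/RichardHightower/boon/wiki/Auto-Growable-Byte-Buffer-like-a-ByteBuilder). This allows me to use a traditional OutputStream, but with something that works well on smaller EC2 instances. (I am the main author of Boon. But I am accepting new authors. The pay sucks. $0 per hour. But the good news is, I can double your pay whenever you like.)
My 2 cents.
See this for why TransferQueue is important: http://php.sabscape.com/blog/?p=557
Key learnings: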
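Answer 8 (score: 0)
DMA/SATA are hardware/low-level technologies and are not visible to any programming language whatsoever.
For memory-mapped input/output you should use java.nio, I believe.
Are you sure you are not reading those files one byte at a time? That would be wasteful. I would recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.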
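A minimal sketch of the memory-mapped variant mentioned above, comparing both files window by window. The class name `MappedFileCompare` and the `windowSize` parameter are hypothetical; the 64 MB figure in the usage note simply follows the block size suggested in this answer. It assumes Java 7 or later and makes no claim about being faster than the stream-based approaches, which would need to be measured.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileCompare {

    /**
     * Compares two files by mapping them into memory one window at a time.
     * windowSize must be between 1 and Integer.MAX_VALUE (a limit of FileChannel.map).
     */
    static boolean contentEquals(Path a, Path b, long windowSize) throws IOException {
        if (windowSize <= 0 || windowSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("windowSize out of range: " + windowSize);
        }
        try (FileChannel chA = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel chB = FileChannel.open(b, StandardOpenOption.READ)) {
            long size = chA.size();
            if (size != chB.size()) {
                return false; // different lengths can never be equal
            }
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer mA = chA.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer mB = chB.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // ByteBuffer.equals compares the remaining bytes of both buffers.
                if (!mA.equals(mB)) {
                    return false;
                }
            }
            return true;
        }
    }
}
```

For example, `MappedFileCompare.contentEquals(pathA, pathB, 64L * 1024 * 1024)` would use 64 MB windows. Mapping in windows rather than all at once is needed because a single mapping cannot exceed `Integer.MAX_VALUE` bytes.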
Answer 9 (score: -1)
Try setting the buffer on the input stream to several megabytes.