我需要在每个服务器上读取~50个文件,并将每个文本文件的表示放入内存中。每个文本文件都有自己的字符串(这是用于字符串持有者的最佳类型?)。
将文件读入内存的最快方法是什么?保存文本的最佳数据结构/类型是什么,以便我可以在内存中操作它(主要是搜索和替换)?
由于
答案 0 :(得分:30)
内存映射文件将是最快的......如下所示:
final File file;
final FileChannel channel;
final MappedByteBuffer buffer;
file = new File(fileName);
fin = new FileInputStream(file);
channel = fin.getChannel();
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
然后继续从字节缓冲区读取。
这将明显快于FileInputStream
或FileReader
。
编辑:
经过一番调查后发现,根据您的操作系统,您可能最好使用新的BufferedInputStream(new FileInputStream(file))
。然而,将所有内容一次性读入char []文件的大小听起来就像是最糟糕的方式。
所以BufferedInputStream
应该在所有平台上提供大致一致的性能,而内存映射文件可能很慢或很快,具体取决于底层操作系统。与性能至关重要的一切一样,您应该测试代码并查看最佳效果。
编辑:
这里有一些测试(第一个测试完成两次以将文件放入磁盘缓存中)。
我在rt.jar类文件上运行它,解压缩到硬盘驱动器,这是在Windows 7 beta x64下。这是16784个文件,总共94,706,637个字节。
首先是结果......
(记住第一次重复以获得磁盘缓存设置)
ArrayTest中
ArrayTest
DataInputByteAtATime
DataInputReadFully
存储器映射
这是代码......
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;
public class Main
{
public static void main(final String[] argv)
{
ArrayTest.main(argv);
ArrayTest.main(argv);
DataInputByteAtATime.main(argv);
DataInputReadFully.main(argv);
MemoryMapped.main(argv);
}
}
abstract class Test
{
public final void run(final File root)
{
final Set<File> files;
final long size;
final long start;
final long end;
final long total;
files = new HashSet<File>();
getFiles(root, files);
start = System.currentTimeMillis();
size = readFiles(files);
end = System.currentTimeMillis();
total = end - start;
System.out.println(getClass().getName());
System.out.println("time = " + total);
System.out.println("bytes = " + size);
}
private void getFiles(final File dir,
final Set<File> files)
{
final File[] childeren;
childeren = dir.listFiles();
for(final File child : childeren)
{
if(child.isFile())
{
files.add(child);
}
else
{
getFiles(child, files);
}
}
}
private long readFiles(final Set<File> files)
{
long size;
size = 0;
for(final File file : files)
{
size += readFile(file);
}
return (size);
}
protected abstract long readFile(File file);
}
class ArrayTest
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new ArrayTest();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
InputStream stream;
stream = null;
try
{
final byte[] data;
int soFar;
int sum;
stream = new BufferedInputStream(new FileInputStream(file));
data = new byte[(int)file.length()];
soFar = 0;
do
{
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data)
{
sum += b;
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputByteAtATime
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputByteAtATime();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++)
{
sum += stream.readByte();
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadFully
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputReadFully();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final byte[] data;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
data = new byte[(int)file.length()];
stream.readFully(data);
sum = 0;
for(final byte b : data)
{
sum += b;
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadInChunks
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputReadInChunks();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final byte[] data;
int size;
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
data = new byte[512];
size = 0;
sum = 0;
do
{
size += stream.read(data);
sum = 0;
for(int i = 0; i < size; i++)
{
sum += data[i];
}
}
while(size != fileSize);
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class MemoryMapped
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new MemoryMapped();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
FileInputStream stream;
stream = null;
try
{
final FileChannel channel;
final MappedByteBuffer buffer;
final int fileSize;
int sum;
stream = new FileInputStream(file);
channel = stream.getChannel();
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++)
{
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
答案 1 :(得分:4)
最有效的方法是:
File.length()
)new InputStreamReader (new FileInputStream(file), encoding)
阅读new String(buffer)
如果您需要在启动时搜索并更换一次,请使用String.replaceAll()。
如果需要重复执行,可以考虑使用StringBuilder。它没有replaceAll(),但您可以使用它来操作字符数组( - >没有内存分配)。
那说:
如果只执行0.1秒,没有理由浪费大量时间让这段代码快速运行。
如果仍然存在性能问题,请考虑将所有文本文件放入JAR中,将其添加到类路径中并使用Class.getResourceAsStream()来读取文件。从Java类路径加载东西是高度优化的。
答案 2 :(得分:1)
这很大程度上取决于文本文件的内部结构以及您打算用它们做什么。
文件是键值字典(即“属性”文件)吗? XML? JSON?你有那些标准结构。
如果它们具有正式结构,您也可以使用JavaCC来构建文件的对象表示。
否则,如果它们只是数据blob,那么,读取文件并将它们放在一个字符串中。
编辑关于搜索&amp; replace-juste使用String's replaceAll function。
答案 3 :(得分:1)
在搜索谷歌搜索有关Java速度的现有测试之后,我必须说TofuBear的测试用例完全打开了我的眼睛。你必须在自己的平台上运行他的测试,看看你最快的是什么。
在运行他的测试并添加一些我自己的(感谢TofuBear用于发布他的原始代码)之后,看起来你可以通过使用自己的自定义缓冲区而不是使用BufferedInputStream来获得更快的速度。
令我沮丧的是,NIO ByteBuffer表现不佳。
注意:静态byte []缓冲区削减了几毫秒,但静态ByteBuffers实际上增加了处理时间! 代码有什么问题吗?
我添加了一些测试:
ArrayTest_CustomBuffering(直接将数据读入我自己的缓冲区)
ArrayTest_CustomBuffering_StaticBuffer(将数据读入一个开头只创建一次的静态缓冲区)
FileChannelArrayByteBuffer(使用NIO ByteBuffer并包装自己的byte []数组)
FileChannelAllocateByteBuffer(使用带有.allocate的NIO ByteBuffer)
FileChannelAllocateByteBuffer_StaticBuffer(与4相同,但带有静态缓冲区)
FileChannelAllocateDirectByteBuffer(使用NIO ByteBuffer和.allocateDirect)
FileChannelAllocateDirectByteBuffer_StaticBuffer(与6相同,但带有静态缓冲区)
以下是我的结果:在解压缩的rt.jar上使用Windows Vista和jdk1.6.0_13:
ArrayTest中
时间= 2075
bytes = 2120336424
ArrayTest中
时间= 2044
bytes = 2120336424
ArrayTest_CustomBuffering
时间= 1903年
bytes = 2120336424
ArrayTest_CustomBuffering_StaticBuffer
时间= 1872年
bytes = 2120336424
DataInputByteAtATime
时间= 2668
bytes = 2120336424
DataInputReadFully
时间= 2028
bytes = 2120336424
存储器映射
时间= 2901
bytes = 2120336424
FileChannelArrayByteBuffer
时间= 2371
bytes = 2120336424
FileChannelAllocateByteBuffer
时间= 2356
bytes = 2120336424
FileChannelAllocateByteBuffer_StaticBuffer
时间= 2668
bytes = 2120336424
FileChannelAllocateDirectByteBuffer
时间= 2512
bytes = 2120336424
FileChannelAllocateDirectByteBuffer_StaticBuffer
时间= 2590
bytes = 2120336424
我的黑客版TofuBear的代码:
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;
public class Main {
public static void main(final String[] argv) {
ArrayTest.mainx(argv);
ArrayTest.mainx(argv);
ArrayTest_CustomBuffering.mainx(argv);
ArrayTest_CustomBuffering_StaticBuffer.mainx(argv);
DataInputByteAtATime.mainx(argv);
DataInputReadFully.mainx(argv);
MemoryMapped.mainx(argv);
FileChannelArrayByteBuffer.mainx(argv);
FileChannelAllocateByteBuffer.mainx(argv);
FileChannelAllocateByteBuffer_StaticBuffer.mainx(argv);
FileChannelAllocateDirectByteBuffer.mainx(argv);
FileChannelAllocateDirectByteBuffer_StaticBuffer.mainx(argv);
}
}
abstract class Test {
static final int BUFF_SIZE = 20971520;
static final byte[] StaticData = new byte[BUFF_SIZE];
static final ByteBuffer StaticBuffer =ByteBuffer.allocate(BUFF_SIZE);
static final ByteBuffer StaticDirectBuffer = ByteBuffer.allocateDirect(BUFF_SIZE);
public final void run(final File root) {
final Set<File> files;
final long size;
final long start;
final long end;
final long total;
files = new HashSet<File>();
getFiles(root, files);
start = System.currentTimeMillis();
size = readFiles(files);
end = System.currentTimeMillis();
total = end - start;
System.out.println(getClass().getName());
System.out.println("time = " + total);
System.out.println("bytes = " + size);
}
private void getFiles(final File dir,final Set<File> files) {
final File[] childeren;
childeren = dir.listFiles();
for(final File child : childeren) {
if(child.isFile()) {
files.add(child);
}
else {
getFiles(child, files);
}
}
}
private long readFiles(final Set<File> files) {
long size;
size = 0;
for(final File file : files) {
size += readFile(file);
}
return (size);
}
protected abstract long readFile(File file);
}
class ArrayTest extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
final byte[] data;
int soFar;
int sum;
stream = new BufferedInputStream(new FileInputStream(file));
data = new byte[(int)file.length()];
soFar = 0;
do {
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class ArrayTest_CustomBuffering extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest_CustomBuffering();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
final byte[] data;
int soFar;
int sum;
stream = new FileInputStream(file);
data = new byte[(int)file.length()];
soFar = 0;
do {
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class ArrayTest_CustomBuffering_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest_CustomBuffering_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
int soFar;
int sum;
final int fileSize;
stream = new FileInputStream(file);
fileSize = (int)file.length();
soFar = 0;
do {
soFar += stream.read(StaticData, soFar, fileSize - soFar);
}
while(soFar != fileSize);
sum = 0;
for(int i=0;i<fileSize;i++) {
sum += StaticData[i];
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputByteAtATime extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputByteAtATime();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += stream.readByte();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadFully extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputReadFully();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final byte[] data;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
data = new byte[(int)file.length()];
stream.readFully(data);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadInChunks extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputReadInChunks();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final byte[] data;
int size;
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
data = new byte[512];
size = 0;
sum = 0;
do {
size += stream.read(data);
sum = 0;
for(int i = 0;
i < size;
i++) {
sum += data[i];
}
}
while(size != fileSize);
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class MemoryMapped extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new MemoryMapped();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final FileChannel channel;
final MappedByteBuffer buffer;
final int fileSize;
int sum;
stream = new FileInputStream(file);
channel = stream.getChannel();
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelArrayByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelArrayByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
data = new byte[(int)file.length()];
buffer = ByteBuffer.wrap(data);
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
buffer = ByteBuffer.allocate((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateDirectByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateDirectByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
buffer = ByteBuffer.allocateDirect((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateByteBuffer_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateByteBuffer_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
StaticBuffer.clear();
StaticBuffer.limit((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(StaticBuffer);
StaticBuffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += StaticBuffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateDirectByteBuffer_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateDirectByteBuffer_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
StaticDirectBuffer.clear();
StaticDirectBuffer.limit((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(StaticDirectBuffer);
StaticDirectBuffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += StaticDirectBuffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
答案 4 :(得分:0)
任何传统方法的速度都会受到限制。我不确定你会看到一种方法与下一种方法有很大不同。
我会专注于可以使整个操作更快的商业技巧。
例如,如果您读取所有文件并将它们存储在一个文件中,并带有来自每个原始文件的时间戳,那么您可以检查是否有任何文件已更改而未实际打开它们。 (换句话说,一个简单的缓存)。
如果您的问题是快速启动GUI,您可能会在显示第一个屏幕后找到在后台线程中打开文件的方法。
操作系统可以很好地处理文件,如果这是批处理过程的一部分(没有用户I / O),你可以从批处理文件开始,该文件在启动java之前将所有文件附加到一个大文件中,使用某些东西像这样:
echo "file1" > file.all
type "file1" >> file.all
echo "file2" >> file.all
type "file2" >> file.all
然后只需打开file.all(我不确定这会有多快,但它可能是我刚才所说条件的最快方法)
我想我只是说,通常情况下,速度问题的解决方案通常需要稍微扩展您的观点并使用新参数完全重新思考解决方案。对现有算法的修改通常只会以可读性为代价提供较小的速度增强。
答案 5 :(得分:0)
您应该能够使用Commons IO FileUtils.readFileToString(File)等标准工具在一秒钟内读取所有文件
您也可以使用writeStringToFile(File,String)来保存修改后的文件。
http://commons.apache.org/io/api-release/index.html?org/apache/commons/io/FileUtils.html
BTW:50不是大量文件。典型的PC可以有100K或更多文件。