Currently I have:
The problem is: as you may have guessed, I want to read and parse this file more efficiently...
Question:
Thanks :)
Answer 0 (score: 2)
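A 9-million-line file should take no more than a few seconds. Most of that time is spent reading the data into memory; how you split the data is unlikely to make much difference.
BufferedReader and String.split sound fine to me. I wouldn't use interning unless you are sure it will help. (split() won't intern() the results for you.)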
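A minimal sketch of the BufferedReader + String.split approach with interning made optional, assuming tokens are separated by single spaces. The class and method names (`SplitInternSketch`, `splitLine`, `parse`) are hypothetical. It illustrates that `split` does not intern its results: if you want equal tokens to share one instance, you have to call `intern()` yourself, which only pays off when the same tokens repeat many times.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SplitInternSketch {
    // Splits one line on single spaces; optionally replaces each token
    // with its canonical (interned) String instance.
    static String[] splitLine(String line, boolean intern) {
        String[] words = line.split(" ");
        if (intern) {
            for (int i = 0; i < words.length; i++) {
                words[i] = words[i].intern();
            }
        }
        return words;
    }

    // Reads every line from the reader and splits it.
    static List<String[]> parse(BufferedReader br, boolean intern) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        String line;
        while ((line = br.readLine()) != null) {
            rows.add(splitLine(line, intern));
        }
        return rows;
    }

    public static void main(String... args) throws IOException {
        String data = "1000 1001 1002\n1000 1001 1003\n";
        List<String[]> plain = parse(new BufferedReader(new StringReader(data)), false);
        List<String[]> interned = parse(new BufferedReader(new StringReader(data)), true);
        // Contents are equal either way; interned tokens are also the same object.
        System.out.println(plain.get(0)[0].equals(plain.get(1)[0])); // true
        System.out.println(interned.get(0)[0] == interned.get(1)[0]); // true
    }
}
```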
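Recent updates of Java 6 include some performance improvements for string handling. I would try Java 6 update 25 to see whether it is faster.
EDIT: Some testing shows that split is very slow, and you can do better than it.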
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitBenchmark { // class name added so the snippet compiles on its own
    public static void main(String... args) throws IOException {
        // Write a test file: 1,000,000 lines, each holding "1000 1001 ... 1039 " (trailing space).
        long start1 = System.nanoTime();
        PrintWriter pw = new PrintWriter("deleteme.txt");
        StringBuilder sb = new StringBuilder();
        for (int j = 1000; j < 1040; j++)
            sb.append(j).append(' ');
        String outLine = sb.toString();
        for (int i = 0; i < 1000 * 1000; i++)
            pw.println(outLine);
        pw.close();
        long time1 = System.nanoTime() - start1;
        System.out.printf("Took %f seconds to write%n", time1 / 1e9);
        {
            // Baseline: read raw chars into a 1M-char buffer and discard them.
            long start = System.nanoTime();
            FileReader fr = new FileReader("deleteme.txt");
            char[] buffer = new char[1024 * 1024];
            while (fr.read(buffer) > 0) ;
            fr.close();
            long time = System.nanoTime() - start;
            System.out.printf("Took %f seconds to read text as fast as possible%n", time / 1e9);
        }
        {
            // String.split(" "): on Java 6 this compiles a regex Pattern on every call.
            long start = System.nanoTime();
            BufferedReader br = new BufferedReader(new FileReader("deleteme.txt"));
            String line;
            while ((line = br.readLine()) != null) {
                String[] words = line.split(" ");
            }
            br.close();
            long time = System.nanoTime() - start;
            System.out.printf("Took %f seconds to read lines and split%n", time / 1e9);
        }
        {
            // Same split, but the Pattern is compiled once and reused for every line.
            long start = System.nanoTime();
            BufferedReader br = new BufferedReader(new FileReader("deleteme.txt"));
            String line;
            Pattern splitSpace = Pattern.compile(" ");
            while ((line = br.readLine()) != null) {
                String[] words = splitSpace.split(line, 0);
            }
            br.close();
            long time = System.nanoTime() - start;
            System.out.printf("Took %f seconds to read lines and split (precompiled)%n", time / 1e9);
        }
        {
            // Manual split with indexOf/substring, reusing a single list for every line.
            long start = System.nanoTime();
            BufferedReader br = new BufferedReader(new FileReader("deleteme.txt"));
            String line;
            List<String> words = new ArrayList<String>();
            while ((line = br.readLine()) != null) {
                words.clear();
                int pos = 0, end;
                while ((end = line.indexOf(' ', pos)) >= 0) {
                    words.add(line.substring(pos, end));
                    pos = end + 1;
                }
                // Keep the last word when the line does not end with a space.
                if (pos < line.length())
                    words.add(line.substring(pos));
                // words now holds the tokens of this line.
            }
            br.close();
            long time = System.nanoTime() - start;
            System.out.printf("Took %f seconds to read lines and break using indexOf%n", time / 1e9);
        }
    }
}
This prints:
Took 1.757984 seconds to write
Took 1.158652 seconds to read text as fast as possible
Took 6.671587 seconds to read lines and split
Took 4.210100 seconds to read lines and split (precompiled)
Took 1.642296 seconds to read lines and break using indexOf
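So splitting the string yourself is a clear improvement: the indexOf version (1.64 s) gets you close to reading the raw text as fast as possible (1.16 s), compared with 6.67 s for String.split. The only way to read it faster is to treat the file as binary/ASCII-7. ;)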
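A rough sketch of what treating the file as binary/ASCII-7 could look like: scan raw bytes from an `InputStream` and parse the numbers directly, without decoding to `char` or creating any `String`. It assumes, as in the benchmark file, that tokens are non-negative decimal integers (no overflow checks) and that every non-digit byte is a separator. `AsciiTokenSketch` and `countAndSum` are hypothetical names, and no timings are claimed for it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class AsciiTokenSketch {
    // Scans ASCII bytes in 64 KB chunks. Digits accumulate into the current
    // number; any other byte (space, '\n', '\r', ...) ends the current token.
    // Returns {tokenCount, sumOfValues}.
    static long[] countAndSum(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long count = 0, sum = 0, value = 0;
        boolean inToken = false;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                byte b = buf[i];
                if (b >= '0' && b <= '9') {
                    value = value * 10 + (b - '0');
                    inToken = true;
                } else if (inToken) {
                    count++;
                    sum += value;
                    value = 0;
                    inToken = false;
                }
            }
        }
        // A final token with no trailing separator still counts.
        if (inToken) {
            count++;
            sum += value;
        }
        return new long[] {count, sum};
    }

    public static void main(String... args) throws IOException {
        byte[] data = "1000 1001 \n1002 1003\n".getBytes(StandardCharsets.US_ASCII);
        long[] r = countAndSum(new ByteArrayInputStream(data));
        System.out.println(r[0] + " tokens, sum " + r[1]); // 4 tokens, sum 4006
    }
}
```

To run it on the benchmark file, pass `new FileInputStream("deleteme.txt")`; because the method already reads in 64 KB chunks, wrapping it in a `BufferedInputStream` should not be necessary.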