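I have to:
What I am doing:
Problem:
When I read through a BufferedReader wrapped around a RandomAccessFile, the file pointer appears to jump far ahead in a single call. However, if I use RandomAccessFile.readLine() directly, the file pointer advances correctly, step by step.
Using BufferedReader as a wrapper: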
RandomAccessFile randomAccessFile = new RandomAccessFile("mybigfile.txt", "r");
BufferedReader brRafReader = new BufferedReader(new FileReader(randomAccessFile.getFD()));
String line;
while ((line = brRafReader.readLine()) != null) {
    System.out.println(line + ", Position : " + randomAccessFile.getFilePointer());
}
Output:
Line goes here, Position : 13040
Line goes here, Position : 13040
Line goes here, Position : 13040
Line goes here, Position : 13040
Using RandomAccessFile.readLine directly:
RandomAccessFile randomAccessFile = new RandomAccessFile("mybigfile.txt", "r");
String line;
while ((line = randomAccessFile.readLine()) != null) {
    System.out.println(line + ", Position : " + randomAccessFile.getFilePointer());
}
Output: (This is as expected. The file pointer advances correctly with each readLine call.)
Line goes here, Position : 11011
Line goes here, Position : 11089
Line goes here, Position : 12090
Line goes here, Position : 13040
Can anyone tell me what I am doing wrong? Is there a way to speed up reading with RandomAccessFile?
Answer 0 (score: 2)
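The reason for the observed behavior is that, as the name suggests, the BufferedReader is buffered. It reads a larger chunk of data at once (into a buffer) and returns only the relevant part of the buffer contents, namely the part up to the next \n line separator.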
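To make the effect visible in isolation, here is a minimal sketch. The class name BufferingDemo, the helper pointerAfterFirstLine and the temporary file are made up for illustration and are not part of the question. It reads a single line through the BufferedReader and then asks the RandomAccessFile for its file pointer. For a small file, the pointer should already sit well past the end of the first line, because the reader has pulled a whole block into its buffer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferingDemo {

    // Hypothetical helper: returns { number of chars in the first line,
    // file pointer of the RandomAccessFile after that single readLine() }.
    static long[] pointerAfterFirstLine(Path file) throws IOException {
        // Closing the RandomAccessFile also releases the descriptor shared with the reader.
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            BufferedReader reader = new BufferedReader(new FileReader(raf.getFD()));
            String first = reader.readLine();
            return new long[] { first.length(), raf.getFilePointer() };
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffering-demo", ".txt");
        Files.write(file, "first\nsecond\nthird\n".getBytes(StandardCharsets.US_ASCII));
        long[] result = pointerAfterFirstLine(file);
        System.out.println("First line has " + result[0]
                + " chars, but the file pointer is at " + result[1]);
        Files.delete(file);
    }
}
```

The file pointer only reflects how much the reader has fetched from the file, not how much of the buffer has been handed out as lines.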
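I think that, broadly speaking, there are two possible approaches:
For 1., you would no longer use RandomAccessFile#readLine. Instead, you would do your own buffering via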
byte buffer[] = new byte[8192];
...
// In a loop:
int read = randomAccessFile.read(buffer);
// Figure out where a line break `\n` appears in the buffer,
// return the resulting lines, and take the position of the `\n`
// into account when storing the "file pointer"
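As the vague comments indicate: this can be cumbersome and fiddly. You would basically be reimplementing the readLine method of the BufferedReader class. And at this point, I don't even want to mention the trouble that different line separators or character sets could cause.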
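For illustration, the block-reading idea could be sketched roughly as follows. This is not the answer's code. The names BlockLineReader, LineAt and readLines are hypothetical, and the sketch assumes that lines end with a plain \n and that the file is UTF-8 encoded (for a file with \r\n endings, the \r would stay at the end of each line).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BlockLineReader {

    /** A line together with the byte offset at which it starts (hypothetical helper type). */
    public static final class LineAt {
        public final long offset;
        public final String text;

        LineAt(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }
    }

    /** Reads all lines from startOffset to the end of the file in 8 KB blocks. */
    static List<LineAt> readLines(String fileName, long startOffset) throws IOException {
        List<LineAt> lines = new ArrayList<>();
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of the current line
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            raf.seek(startOffset);
            long lineStart = startOffset; // offset where the current line begins
            long position = startOffset;  // offset of the byte being examined
            int read;
            while ((read = raf.read(buffer)) != -1) {
                for (int i = 0; i < read; i++, position++) {
                    if (buffer[i] == '\n') {
                        lines.add(new LineAt(lineStart, decode(pending)));
                        pending.reset();
                        lineStart = position + 1; // next line starts right after the '\n'
                    } else {
                        pending.write(buffer[i]);
                    }
                }
            }
            if (pending.size() > 0) { // last line without a trailing '\n'
                lines.add(new LineAt(lineStart, decode(pending)));
            }
        }
        return lines;
    }

    // Assumption: UTF-8. The byte 0x0A never occurs inside a multi-byte UTF-8 sequence,
    // so splitting on it before decoding is safe for this encoding.
    private static String decode(ByteArrayOutputStream bytes) {
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Because each LineAt carries its own start offset, reading can later be resumed by passing the offset of the first unprocessed line as startOffset.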
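For 2., you could simply access the field of the BufferedReader that stores the buffer offset. This is implemented in the example below. Of course, this is a crude solution, but it is mentioned and shown here as a simple alternative, depending on how "sustainable" the solution has to be and how much effort you are willing to invest.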
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
public class LargeFileRead {

    public static void main(String[] args) throws Exception {
        String fileName = "myBigFile.txt";

        long before = System.nanoTime();
        List<String> result = readBuffered(fileName);
        //List<String> result = readDefault(fileName);
        long after = System.nanoTime();
        double ms = (after - before) / 1e6;
        System.out.println("Reading took " + ms + "ms "
            + "for " + result.size() + " lines");
    }

    private static List<String> readBuffered(String fileName) throws Exception {
        List<String> lines = new ArrayList<String>();
        RandomAccessFile randomAccessFile = new RandomAccessFile(fileName, "r");
        BufferedReader brRafReader = new BufferedReader(
            new FileReader(randomAccessFile.getFD()));
        String line = null;
        long currentOffset = 0;   // file offset at which the current buffer content starts
        long previousOffset = -1; // file pointer seen after the previous buffer fill
        while ((line = brRafReader.readLine()) != null) {
            long fileOffset = randomAccessFile.getFilePointer();
            // The file pointer only changes when the reader refills its buffer
            if (fileOffset != previousOffset) {
                if (previousOffset != -1) {
                    currentOffset = previousOffset;
                }
                previousOffset = fileOffset;
            }
            // Note: this is an index in chars, so it only matches a byte
            // offset for single-byte encodings
            int bufferOffset = getOffset(brRafReader);
            long realPosition = currentOffset + bufferOffset;
            System.out.println("Position : " + realPosition
                + " with FP " + randomAccessFile.getFilePointer()
                + " and offset " + bufferOffset);
            lines.add(line);
        }
        return lines;
    }

    // Reads the private "nextChar" field of the BufferedReader via reflection.
    // This relies on an implementation detail of the JDK. On Java 16 and later
    // it additionally requires --add-opens java.base/java.io=ALL-UNNAMED
    private static int getOffset(BufferedReader bufferedReader) throws Exception {
        Field field = BufferedReader.class.getDeclaredField("nextChar");
        int result = 0;
        try {
            field.setAccessible(true);
            result = (Integer) field.get(bufferedReader);
        } finally {
            field.setAccessible(false);
        }
        return result;
    }

    private static List<String> readDefault(String fileName) throws Exception {
        List<String> lines = new ArrayList<String>();
        RandomAccessFile randomAccessFile = new RandomAccessFile(fileName, "r");
        String line = null;
        while ((line = randomAccessFile.readLine()) != null) {
            System.out.println("Position : " + randomAccessFile.getFilePointer());
            lines.add(line);
        }
        return lines;
    }
}
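(Note: the offsets still seem to be off by 1, but this is because the line separator is not taken into account at that position. This can be adjusted if necessary.)
Note: this is only a sketch. The RandomAccessFile objects should be closed properly once reading is finished, but how to do that depends on how the reading is supposed to be interrupted when the time limit is exceeded, as described in the question.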
Answer 1 (score: 0)
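BufferedReader reads blocks of data from the file, 8 KB at a time by default. It then searches the buffer for the line break that ends the next line to return.
I guess that is why you see the physical file position increase in large jumps.
RandomAccessFile does not use a buffer when reading the next line. It reads byte by byte, which is really slow.
How is the performance when you use only a BufferedReader and remember the line at which you need to continue?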
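One way to read that suggestion is sketched below. The names ResumableLineReader and readChunk are hypothetical, and UTF-8 is an assumption. The caller remembers a line number instead of a byte offset, and each call skips the lines that were already processed before collecting the next batch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ResumableLineReader {

    /**
     * Returns up to maxLines lines, starting at the zero-based line index startLine.
     * Returns an empty list if the file has no more lines at that index.
     */
    static List<String> readChunk(String fileName, long startLine, int maxLines)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            // Skip the lines that were handled in earlier calls
            for (long i = 0; i < startLine; i++) {
                if (reader.readLine() == null) {
                    return lines; // file is shorter than startLine
                }
            }
            String line;
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

After each call, the next starting point is startLine plus the size of the returned list. The trade-off is that skipping re-reads the beginning of the file on every call, so whether this beats seeking to a stored byte offset would have to be measured on the actual file.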