What is the quickest and most efficient way to read the last line of text from a [very, very large] file in Java?
Answer 0 (score: 81)
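Below are two functions. One returns the last line of a file (ignoring a trailing line break) without loading or stepping through the entire file; the other returns the last N lines of the file, also without stepping through the entire file: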
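tail works by jumping straight to the last character of the file, then stepping backwards one byte at a time and recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop, reverses what it recorded, turns it into a String and returns it. 0xA is the newline and 0xD is the carriage return.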
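If your line endings are \r\n (CRLF) or some other "double newline style" line break, tail2 counts each line ending as two line breaks, so to get the last n lines you have to pass 2n - 1 as the lines argument.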
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public String tail( File file ) {
    // try-with-resources closes the file even if reading fails
    try ( RandomAccessFile fileHandler = new RandomAccessFile( file, "r" ) ) {
        long fileLength = fileHandler.length() - 1; // index of the last byte
        StringBuilder sb = new StringBuilder();
        // Walk backwards from the last byte towards the start of the file
        for ( long filePointer = fileLength; filePointer != -1; filePointer-- ) {
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();
            if ( readByte == 0xA ) {                   // '\n'
                if ( filePointer == fileLength ) {
                    continue;                           // skip a trailing newline
                }
                break;                                  // reached the start of the last line
            } else if ( readByte == 0xD ) {            // '\r'
                if ( filePointer == fileLength - 1 ) {
                    continue;                           // skip the '\r' of a trailing "\r\n"
                }
                break;
            }
            sb.append( ( char ) readByte );            // bytes are collected in reverse order
        }
        return sb.reverse().toString();
    } catch ( IOException e ) {                        // also covers FileNotFoundException
        e.printStackTrace();
        return null;
    }
}
But you may not want only the last line; if you want the last N lines, use this instead:
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public String tail2( File file, int lines ) {
    try ( RandomAccessFile fileHandler = new RandomAccessFile( file, "r" ) ) {
        long fileLength = fileHandler.length() - 1; // index of the last byte
        StringBuilder sb = new StringBuilder();
        int line = 0; // number of line breaks counted so far
        for ( long filePointer = fileLength; filePointer != -1; filePointer-- ) {
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();
            if ( readByte == 0xA ) {                   // '\n'
                if ( filePointer < fileLength ) {       // a trailing '\n' is not counted
                    line = line + 1;
                }
            } else if ( readByte == 0xD ) {            // '\r'
                if ( filePointer < fileLength - 1 ) {   // the '\r' of a trailing "\r\n" is not counted
                    line = line + 1;
                }
            }
            if ( line >= lines ) {
                break;                                  // enough lines collected
            }
            sb.append( ( char ) readByte );
        }
        return sb.reverse().toString();
    } catch ( IOException e ) {                        // also covers FileNotFoundException
        e.printStackTrace();
        return null;
    }
}
Calling the methods above:
File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));
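Warning: in the wild west of Unicode, this code can produce wrong output, for example "Mary?s" instead of "Mary's". Because each byte is cast straight to a char, any character that the file's encoding stores in more than one byte (letters with hats or accents, Chinese characters, typographic quotes in UTF-8) comes out as several meaningless chars. Reversing text that contains combining characters, where an accent is stored as a modifier after its base letter, is a related pitfall, because reversal changes which letter the accent attaches to. Test thoroughly with every language you plan to support.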
For more on this Unicode reversal problem, read: http://msmvps.com/blogs/jon_skeet/archive/2009/11/02/omg-ponies-aka-humanity-epic-fail.aspx
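To see the effect of the byte-to-char cast in isolation, here is a small sketch (ByteCastDemo and castEachByte are made-up names for illustration). It encodes "Mary’s" with a typographic apostrophe as UTF-8, where that apostrophe takes three bytes, and compares casting each byte to a char, as tail() does, with decoding the bytes properly:

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    // Mimics tail(): every byte becomes its own char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b); // a negative byte becomes a char in the \uFF80-\uFFFF range
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Mary\u2019s";                          // right single quotation mark
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);  // the apostrophe takes 3 bytes
        String garbled = castEachByte(utf8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(garbled.length());          // 8: "Mary" + 3 junk chars + "s"
        System.out.println(decoded.equals(original));  // true
    }
}
```

Decoding the complete byte sequence at once, rather than converting byte by byte, is what keeps multi-byte characters intact.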
Answer 1 (score: 28)
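Apache Commons IO has an implementation that uses RandomAccessFile: ReversedLinesFileReader in org.apache.commons.io.input, which reads a file's lines from the end backwards.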
Answer 2 (score: 18)
See my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.
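Basically, it's not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r or \n, because the UTF-8 representation of those characters is the same as in ASCII, and those bytes never occur inside a multi-byte character.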
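So basically, take a buffer of (say) 2K and progressively read backwards (skip to 2K before where you were before, read the next 2K), checking for a line terminator. Then skip to exactly the right place in the stream, create an InputStreamReader on top of it, and a BufferedReader on top of that. Then just call BufferedReader.readLine().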
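A minimal Java sketch of that approach, assuming a UTF-8 (or plain ASCII) file. The class name LastLineReader, the helper byteAt and the 2048-byte buffer are illustrative choices, not part of the answer above. It scans backwards in 2K chunks for the last \n or \r, seeks to just after it, and lets InputStreamReader and BufferedReader do the decoding:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;

class LastLineReader {
    private static final int BUFFER_SIZE = 2048; // the "(say) 2K" buffer

    // Returns the last line of a UTF-8 file; one trailing line terminator is ignored.
    static String lastLine(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            // Step over a trailing "\n", "\r\n" or "\r" so it is not taken as the line start.
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;

            long lineStart = 0; // used when no terminator is found: the file has one line
            byte[] buf = new byte[BUFFER_SIZE];
            long chunkEnd = end;
            search:
            while (chunkEnd > 0) {
                long chunkStart = Math.max(0, chunkEnd - BUFFER_SIZE); // jump 2K back
                int len = (int) (chunkEnd - chunkStart);
                raf.seek(chunkStart);
                raf.readFully(buf, 0, len);
                for (int i = len - 1; i >= 0; i--) {
                    // In UTF-8 these byte values never occur inside a multi-byte character.
                    if (buf[i] == '\n' || buf[i] == '\r') {
                        lineStart = chunkStart + i + 1;
                        break search;
                    }
                }
                chunkEnd = chunkStart;
            }

            // Position the file, then let InputStreamReader + BufferedReader decode the line.
            // The channel shares raf's file pointer; closing raf also closes the channel.
            raf.seek(lineStart);
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    Channels.newInputStream(raf.getChannel()), StandardCharsets.UTF_8));
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

Only the bytes near the end of the file are touched. Encodings such as UTF-16, where \n is not a single byte, would need a different search.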
Answer 3 (score: 3)
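Using FileReader or FileInputStream on its own won't work: you have to use FileChannel or RandomAccessFile to loop backwards through the file from the end. As Jon says, though, encoding will be a problem.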
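For the FileChannel route, here is a sketch that returns the last n lines using positional reads (FileChannel.read(ByteBuffer, long)), again assuming UTF-8 text and that the returned tail fits in memory. ChannelTail, lastLines and readFully are hypothetical names. Only '\n' is counted, so \n and \r\n endings both work without counting a CRLF ending twice, and the bytes are decoded only after they have all been read:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelTail {
    // Returns the last n lines of a UTF-8 file as one String, line terminators included.
    static String lastLines(Path path, int n) throws IOException {
        if (n <= 0) return "";
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            long end = size;
            // A trailing '\n' ends the last line; it does not start a new one.
            if (end > 0) {
                ByteBuffer one = ByteBuffer.allocate(1);
                readFully(ch, one, end - 1);
                if (one.get(0) == '\n') end--;
            }
            long start = 0; // with fewer than n line breaks, the whole file is returned
            int breaks = 0;
            ByteBuffer buf = ByteBuffer.allocate(4096);
            long pos = end;
            search:
            while (pos > 0) {
                int len = (int) Math.min(buf.capacity(), pos);
                long chunkStart = pos - len;
                buf.clear().limit(len);
                readFully(ch, buf, chunkStart);
                for (int i = len - 1; i >= 0; i--) {
                    if (buf.get(i) == '\n' && ++breaks == n) {
                        start = chunkStart + i + 1;
                        break search;
                    }
                }
                pos = chunkStart;
            }
            ByteBuffer out = ByteBuffer.allocate((int) (size - start));
            readFully(ch, out, start);
            return new String(out.array(), StandardCharsets.UTF_8);
        }
    }

    // Positional reads leave the channel position alone and may return fewer bytes than asked.
    private static void readFully(FileChannel ch, ByteBuffer buf, long position) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf, position + buf.position()) < 0) throw new EOFException();
        }
    }
}
```

For example, ChannelTail.lastLines(Paths.get("D:\\stuff\\huge.log"), 10) would return the last ten lines as a single String.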
Answer 4 (score: 1)
You can easily change the code below to print only the last line.
Using a memory-mapped file to print the last 5 lines:
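import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Prints up to the last 5 lines, last line first (a trailing newline counts as one of the five).
// FileChannel.map() cannot map more than Integer.MAX_VALUE bytes, so the file must be under 2 GB.
private static void printByMemoryMappedFile(File file) throws IOException {
    try (FileInputStream fileInputStream = new FileInputStream(file);
         FileChannel channel = fileInputStream.getChannel()) {
        MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        int count = 0;
        StringBuilder builder = new StringBuilder();
        for (int i = buffer.limit() - 1; i >= 0; i--) {
            char c = (char) buffer.get(i); // one byte per char: only correct for ASCII text
            builder.append(c);
            if (c == '\n') {
                if (count == 5) break;
                count++;
                builder.reverse();
                System.out.println(builder.toString());
                builder.setLength(0); // start collecting the next line
            }
        }
    }
}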
Using RandomAccessFile to print the last 5 lines:
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Prints up to the last 5 lines, last line first; the file is closed when done.
private static void printByRandomAcessFile(File file) throws IOException {
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
        int lines = 0;
        StringBuilder builder = new StringBuilder();
        long length = file.length() - 1; // index of the last byte
        for (long seek = length; seek >= 0; --seek) {
            randomAccessFile.seek(seek);
            char c = (char) randomAccessFile.read(); // one byte per char: ASCII only
            builder.append(c);
            if (c == '\n') {
                builder.reverse();
                System.out.println(builder.toString());
                lines++;
                builder.setLength(0); // start collecting the next line
                if (lines == 5) {
                    break;
                }
            }
        }
    }
}
Answer 5 (score: 1)
// Reads the whole file into memory, so this is only practical for files that fit in the heap
Path path = Paths.get(pathString);
List<String> allLines = Files.readAllLines(path);
return allLines.get(allLines.size()-1);
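readAllLines keeps every line in memory at once. A sketch of a lower-memory alternative using Files.lines (StreamingLastLine is an illustrative name): it still reads and decodes the whole file as UTF-8, but holds only one line at a time, and returns null for an empty file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class StreamingLastLine {
    // Lazily streams the lines; reduce keeps only the most recent one.
    static String lastLine(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.reduce((first, second) -> second).orElse(null);
        }
    }
}
```

This avoids the memory cost but not the time cost; for very large files a backwards scan from the end is still faster.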
Answer 6 (score: 0)
In C#, you should be able to set the stream's position:
From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file
using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
    using(StreamReader sr = new StreamReader(fs))
    {
        // Jump to 4 bytes before the end and read the rest
        sr.BaseStream.Position = fs.Length - 4;
        if(sr.ReadToEnd() == "DONE")
        {
            // match
        }
    }
}
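The same idea in Java, as a sketch: seek to length minus the marker's length with RandomAccessFile and compare the trailing bytes. EndMarker and endsWith are hypothetical names, and the marker is assumed to be ASCII so that its character count equals its byte count (the C# snippet relies on the same assumption with fs.Length - 4):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EndMarker {
    // Returns true if the file's last bytes are exactly the ASCII bytes of marker.
    static boolean endsWith(File file, String marker) throws IOException {
        byte[] expected = marker.getBytes(StandardCharsets.US_ASCII);
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            if (length < expected.length) return false;
            raf.seek(length - expected.length); // like sr.BaseStream.Position = fs.Length - 4
            byte[] actual = new byte[expected.length];
            raf.readFully(actual);
            return Arrays.equals(actual, expected);
        }
    }
}
```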
Answer 7 (score: 0)
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

try (BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {
    String line = null;
    System.out.println("======================================");
    line = reader.readLine(); //Read Line ONE
    line = reader.readLine(); //Read Line TWO
    System.out.println("first line : " + line);
    //Length of one line, assuming all lines have (roughly) the same length
    int len = line.length();
    //Skip ahead from the current position so only the end of the file is left.
    //skip() counts chars, not bytes, and throws if given a negative value.
    reader.skip(Math.max(0, reqFile.length() - (len * 3)));
    //Searched to the last line for the date I was looking for.
    while ((line = reader.readLine()) != null) {
        System.out.println("FROM LINE : " + line);
        String date = line.substring(0, line.indexOf(","));
        System.out.println("DATE : " + date); //BAM!!!!!!!!!!!!!!
    }
    System.out.println(reqFile.getName() + " Read(" + reqFile.length() / 1000 + "KB)");
    System.out.println("======================================");
} catch (IOException x) {
    x.printStackTrace();
}
Answer 8 (score: 0)
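The fastest way to read the last line of a text file is to use Apache's FileUtils class, which is in "org.apache.commons.io". I have a file with two million lines, and with this class it took me less than a second to find the last line. Here is my code: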
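LineIterator lineIterator = FileUtils.lineIterator(new File(filePath), "UTF-8");
String lastLine = "";
try {
    // Still reads every line of the file; only the last one is kept
    while (lineIterator.hasNext()) {
        lastLine = lineIterator.nextLine();
    }
} finally {
    lineIterator.close();
}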
Answer 9 (score: 0)
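To avoid the Unicode problems that come with reversing a String (or StringBuilder), as described in Eric Leschinski's excellent answer, you can collect a list of bytes starting from the end of the file, reverse it into a byte array, and then create the String from the byte array.
Below is Eric Leschinski's code changed to use a byte array. Each changed line sits below the commented-out line it replaces: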
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

static public String tail2(File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        //StringBuilder sb = new StringBuilder();
        List<Byte> sb = new ArrayList<>();
        int line = 0;
        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();
            if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            //sb.append( ( char ) readByte );
            sb.add( (byte) readByte );
        }
        //String lastLine = sb.reverse().toString();
        //Reverse the byte list into a byte array and create the String
        byte[] bytes = new byte[sb.size()];
        for (int i=0; i<sb.size(); i++) bytes[sb.size()-1-i] = sb.get(i);
        //Decode with an explicit charset (UTF-8 assumed) instead of the platform default
        String lastLine = new String(bytes, StandardCharsets.UTF_8);
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}