Quickly read the last line of a text file?

Asked: 2009-03-26 15:17:33

Tags: java file io

What is the quickest and most efficient way to read the last line of text from a [very, very large] file in Java?

10 answers:

Answer 0 (score: 81)

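Below are two functions: one returns the last non-blank line of a file without loading or stepping through the entire file, and the other returns the last N lines of the file without stepping through the entire file.

What tail does is jump straight to the last byte of the file, then step backwards one byte at a time, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop. It then reverses what was recorded, puts it into a string, and returns it. 0xA is the line feed (new line) and 0xD is the carriage return.

If your line endings are \r\n (CRLF) or some other two-character line break, you will have to ask for n * 2 lines to get the last n lines, because the line-counting version counts two line breaks for every line.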

public String tail( File file ) {
    RandomAccessFile fileHandler = null;
    try {
        fileHandler = new RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

            if( readByte == 0xA ) {
                if( filePointer == fileLength ) {
                    continue;
                }
                break;

            } else if( readByte == 0xD ) {
                if( filePointer == fileLength - 1 ) {
                    continue;
                }
                break;
            }

            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    } finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
                /* ignore */
            }
    }
}

But you probably don't want just the last line, you want the last N lines, so use this instead:

public String tail2( File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = 
            new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();
        int line = 0;

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

             if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}

Invoke the above methods like this:

File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));

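WARNING: In the wild west of Unicode, this code can produce wrong output. For example, "Mary?s" instead of "Mary's". Characters with hats, accents, Chinese characters and so on may come out wrong, because accents are added as modifiers after the character. Reversing compound characters changes the identity of the character. You will have to run a full battery of tests on every language you plan to use this with.

For more information about this Unicode reversal problem, read this: http://msmvps.com/blogs/jon_skeet/archive/2009/11/02/omg-ponies-aka-humanity-epic-fail.aspx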
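
To make the warning concrete, here is a small sketch; the class and method names are made up for illustration and are not part of the answer above. It assumes the file is UTF-8. Besides the combining-character issue, casting each byte to a char cannot reproduce any character that is stored as more than one byte, whereas collecting the raw bytes and decoding them once with an explicit charset keeps the text intact.

```java
import java.nio.charset.StandardCharsets;

class ReverseBytesDemo {
    // Mimics the approach above: walk the bytes back to front,
    // cast each byte to a char, then reverse the StringBuilder.
    public static String castEachByte(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        for (int i = utf8.length - 1; i >= 0; i--) {
            sb.append((char) utf8[i]);
        }
        return sb.reverse().toString();
    }

    // Alternative: collect the raw bytes back to front, restore their order,
    // and decode the whole array once (UTF-8 is an assumption here).
    public static String decodeOnce(byte[] utf8) {
        byte[] collected = new byte[utf8.length];
        int n = 0;
        for (int i = utf8.length - 1; i >= 0; i--) {
            collected[n++] = utf8[i];
        }
        for (int lo = 0, hi = n - 1; lo < hi; lo++, hi--) {
            byte tmp = collected[lo];
            collected[lo] = collected[hi];
            collected[hi] = tmp;
        }
        return new String(collected, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u2019 is the typographic apostrophe; it takes three bytes in UTF-8.
        String original = "Mary\u2019s";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(original.equals(castEachByte(bytes))); // the cast version does not match
        System.out.println(original.equals(decodeOnce(bytes)));   // the decoded version matches
    }
}
```

For pure ASCII input both methods give the same result, which is why the problem tends to show up only once real-world text reaches the log.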

Answer 1 (score: 28)

Apache Commons has an implementation that uses RandomAccessFile.

It is called ReversedLinesFileReader.

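Answer 2 (score: 18)

Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.

Basically, this is not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r or \n, because the UTF-8 representation of those characters is the same as in ASCII, and those bytes never occur inside a multi-byte character.

So basically, take a buffer of (say) 2K and progressively read backwards (skip to 2K before where you were, read the next 2K), checking for a line terminator. Then skip to exactly the right place in the stream, create an InputStreamReader on top of it, and a BufferedReader on top of that. Then just call BufferedReader.readLine().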
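
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked C# answer: the class name LastLineReader, the method names, and the 2048-byte chunk size are hypothetical, and the encoding is assumed to be UTF-8 or another ASCII-compatible encoding in which the bytes 0x0A and 0x0D only ever mean a line break (so not UTF-16).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.Charset;

class LastLineReader {
    private static final int CHUNK = 2048; // the "2K" buffer from the description

    public static String lastLine(String path, Charset charset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Ignore a single trailing terminator (\n, \r or \r\n) at the end of the file.
            if (end > 0) { raf.seek(end - 1); if (raf.read() == '\n') end--; }
            if (end > 0) { raf.seek(end - 1); if (raf.read() == '\r') end--; }

            // Position the file at the first byte of the last line.
            raf.seek(findLineStart(raf, end));

            // The channel shares the file pointer, so decoding starts at the line start.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(Channels.newInputStream(raf.getChannel()), charset));
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }

    // Scans backwards from 'end' in CHUNK-sized blocks and returns the offset
    // just after the nearest preceding line terminator (0 if there is none).
    static long findLineStart(RandomAccessFile raf, long end) throws IOException {
        byte[] buf = new byte[CHUNK];
        long pos = end;
        while (pos > 0) {
            int len = (int) Math.min(CHUNK, pos);
            pos -= len;
            raf.seek(pos);
            raf.readFully(buf, 0, len);
            for (int i = len - 1; i >= 0; i--) {
                if (buf[i] == '\n' || buf[i] == '\r') {
                    return pos + i + 1;
                }
            }
        }
        return 0;
    }
}
```

Reading in blocks keeps the number of seek and read calls small compared with stepping one byte at a time, and letting InputStreamReader do the decoding avoids building the string from individually cast bytes.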

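Answer 3 (score: 3)

Using FileReader or FileInputStream won't work - you'll have to use either FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings will be a problem though, as Jon said.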
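
As a rough illustration of the FileChannel route, here is a sketch that returns the last n lines. The class name ChannelTail, the method name and the 4096-byte buffer are hypothetical. It assumes an ASCII-compatible encoding such as UTF-8, treats only \n as a line break (so a \r\n ending counts once, while a lone \r is not recognised), and assumes the requested tail fits in memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelTail {
    // Returns the last n lines, including their original line terminators.
    public static String lastLines(Path file, int n, Charset charset) throws IOException {
        if (n <= 0) return "";
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            ByteBuffer buf = ByteBuffer.allocate(4096);
            long pos = size;   // everything at or after pos has already been scanned
            long start = 0;    // offset where the requested lines begin
            int newlines = 0;

            outer:
            while (pos > 0) {
                int len = (int) Math.min(buf.capacity(), pos);
                pos -= len;
                fill(ch, buf, pos, len);
                for (int i = len - 1; i >= 0; i--) {
                    if (buf.get(i) != '\n') continue;
                    long offset = pos + i;
                    if (offset == size - 1) continue; // terminator of the final line
                    if (++newlines == n) {
                        start = offset + 1;
                        break outer;
                    }
                }
            }

            // Read the tail in one go and decode it once with the given charset.
            ByteBuffer tail = ByteBuffer.allocate(Math.toIntExact(size - start));
            fill(ch, tail, start, tail.capacity());
            return new String(tail.array(), 0, tail.position(), charset);
        }
    }

    // Positional read of exactly len bytes (or fewer if the file ends early).
    private static void fill(FileChannel ch, ByteBuffer buf, long filePos, int len) throws IOException {
        buf.clear();
        buf.limit(len);
        while (buf.hasRemaining()) {
            if (ch.read(buf, filePos + buf.position()) < 0) break;
        }
    }
}
```

Positional reads leave the channel's own position untouched, so the scan and the final read do not interfere with each other.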

Answer 4 (score: 1)

You can easily change the code below to print only the last line.

Memory-mapped file version that prints the last 5 lines:

private static void printByMemoryMappedFile(File file) throws IOException {
    // try-with-resources closes the stream and its channel, even if an exception is thrown
    try (FileInputStream fileInputStream = new FileInputStream(file);
         FileChannel channel = fileInputStream.getChannel()) {
        // A single mapping is limited to Integer.MAX_VALUE bytes, so this only works for files under 2 GB
        ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        int count = 0;
        StringBuilder builder = new StringBuilder();
        // Walk backwards from the last byte; each byte is treated as one character
        for (int i = buffer.limit() - 1; i >= 0; i--) {
            char c = (char) buffer.get(i);
            builder.append(c);
            if (c == '\n') {
                if (count == 5) break;
                count++;
                // The text was collected back to front, so reverse it before printing
                System.out.println(builder.reverse());
                builder = new StringBuilder();
            }
        }
    }
}

RandomAccessFile version that prints the last 5 lines:

private static void printByRandomAcessFile(File file) throws IOException {
    // try-with-resources closes the file when the method is done
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
        int lines = 0;
        StringBuilder builder = new StringBuilder();
        // Start at the last byte and step backwards one byte at a time
        for (long seek = randomAccessFile.length() - 1; seek >= 0; --seek) {
            randomAccessFile.seek(seek);
            char c = (char) randomAccessFile.read();
            builder.append(c);
            if (c == '\n') {
                // The text was collected back to front, so reverse it before printing
                System.out.println(builder.reverse());
                lines++;
                builder = new StringBuilder();
                if (lines == 5) {
                    break;
                }
            }
        }
    }
}

Answer 5 (score: 1)

Path path = Paths.get(pathString);
      List<String> allLines = Files.readAllLines(path);
      return allLines.get(allLines.size()-1);

Answer 6 (score: 0)

In C#, you should be able to set the stream's position:

From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file

using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
    using(StreamReader sr = new StreamReader(fs))
    {
        sr.BaseStream.Position = fs.Length - 4;
        if(sr.ReadToEnd() == "DONE")
        {
            // match
        }
    }
}

Answer 7 (score: 0)

try(BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {

    String line = null;

    System.out.println("======================================");

    line = reader.readLine();       //Read Line ONE
    line = reader.readLine();       //Read Line TWO
    System.out.println("first line : " + line);

    //Length of one line if lines are of even length
    int len = line.length();       

    //skip to the end - 3 lines
    reader.skip((reqFile.length() - (len*3)));

    //Searched to the last line for the date I was looking for.

    while((line = reader.readLine()) != null){

        System.out.println("FROM LINE : " + line);
        String date = line.substring(0,line.indexOf(","));

        System.out.println("DATE : " + date);      //BAM!!!!!!!!!!!!!!
    }

    System.out.println(reqFile.getName() + " Read(" + reqFile.length()/(1000) + "KB)");
    System.out.println("======================================");
} catch (IOException x) {
    x.printStackTrace();
}

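Answer 8 (score: 0)

As far as I know, the fastest way to read the last line of a text file is to use the Apache FileUtils class, which is in "org.apache.commons.io". I have a two-million-line file, and using this class it took me less than one second to find the last line. Here is my code: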

LineIterator lineIterator = FileUtils.lineIterator(new File(filePath),"UTF-8");
String lastLine="";
while (lineIterator.hasNext()){
 lastLine=  lineIterator.nextLine();
}

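Answer 9 (score: 0)

To avoid the Unicode problems related to reversing the String (or the StringBuilder), as discussed in Eric Leschinski's excellent answer, you can read the bytes into a list starting from the end of the file, reverse that list into a byte array, and then create the String from the byte array.

Below are the changes to the code from Eric Leschinski's answer that make it use a byte array. Each changed line sits directly below the commented-out line it replaces: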

static public String tail2(File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        //StringBuilder sb = new StringBuilder();
        List<Byte> sb = new ArrayList<>();
        int line = 0;

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

            if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            //sb.add( (char) readByte );
            sb.add( (byte) readByte );
        }

        //String lastLine = sb.reverse().toString();
        //Reverse the collected bytes into an array and create the String from it
        byte[] bytes = new byte[sb.size()];
        for (int i=0; i<sb.size(); i++) bytes[sb.size()-1-i] = sb.get(i);
        String lastLine = new String(bytes);
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}