Question

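I want to read a file line by line. BufferedReader is much faster than RandomAccessFile or BufferedInputStream, but the problem is that I don't know how many bytes I have read. How can I find out the number of bytes read (the offset)? This is what I tried: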

String buffer;
int offset = 0;

while ((buffer = br.readLine()) != null)
    offset += buffer.getBytes().length + 1; // 1 is for line separator

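This works when the file is small. But when the file gets large, the offset becomes smaller than the actual value. How can I get the correct offset?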

Answer 1

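There is no easy way to do this with BufferedReader, because of two effects: character encoding and line endings. On Windows the line ending is \r\n, which is two bytes. On Unix the line separator is a single byte. BufferedReader handles both cases without you noticing, so after readLine() you do not know how many bytes were skipped.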

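In addition, buffer.getBytes() only returns the correct result when your default encoding and the encoding of the data in the file happen to be the same. For any kind of byte[] <-> String conversion, you should always specify exactly which encoding to use.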
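
To illustrate the encoding point, here is a minimal sketch that measures the same string under several explicit charsets. The class name EncodedLength and the helper utf8Length are made up for this example; the only assumption is that the file is known to be UTF-8.

```java
import java.nio.charset.StandardCharsets;

class EncodedLength {
    // Number of bytes the given text occupies when encoded as UTF-8
    // (assumption for this sketch: the file on disk is UTF-8).
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String line = "h\u00e9llo"; // 5 chars; U+00E9 needs 2 bytes in UTF-8
        System.out.println(utf8Length(line));                                  // 6
        System.out.println(line.getBytes(StandardCharsets.ISO_8859_1).length); // 5
        System.out.println(line.getBytes(StandardCharsets.UTF_16LE).length);   // 10
    }
}
```

The same five characters take 6, 5 or 10 bytes depending on the charset, which is why a byte count derived from getBytes() without an explicit charset can drift away from the real file offset.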

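You also cannot use a counting InputStream, because buffered readers read data in large chunks. After reading a first line of, say, 5 bytes, the counter in the inner InputStream would return a value such as 4096 (the exact figure depends on the buffer size), because the reader always reads that many bytes into its internal buffer.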
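
The read-ahead effect can be observed with a small experiment. This is only a sketch: CountingInputStream is a hypothetical helper written for this example (not a JDK class), and the exact count it reports depends on the reader's buffer size and on how much data is available.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class CountingStreamDemo {
    // Hypothetical helper: counts the bytes pulled from the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    // Reads a single line through a BufferedReader and returns how many
    // bytes were actually taken from the underlying stream to do so.
    static long bytesConsumedAfterFirstLine(byte[] data) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(counter, StandardCharsets.UTF_8))) {
            br.readLine();
            return counter.count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "Hello\nWorld\nmore text\n".getBytes(StandardCharsets.UTF_8);
        // The first line occupies 6 bytes including '\n', but the reader
        // has already pulled more than that into its buffer.
        System.out.println(bytesConsumedAfterFirstLine(data));
    }
}
```

The counter reflects what the reader has buffered, not where the returned line ends, so it cannot serve as a line offset.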

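You can have a look at NIO for this. You can use a low-level ByteBuffer to keep track of the offset and wrap it in a CharBuffer to convert the input into lines.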
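
One possible shape for the NIO idea is sketched below. It is not the answerer's code: the class NioLineOffsets and its methods are invented for illustration, the input is assumed to be UTF-8 (where the byte 0x0A never occurs inside a multi-byte character), and each line is decoded with new String(...) rather than through a CharBuffer to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NioLineOffsets {
    // Returns one entry per line in the form "<start offset>:<line text>".
    static List<String> linesWithOffsets(ReadableByteChannel ch) {
        List<String> out = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocate(8192);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        long pos = 0;       // absolute offset of the next byte to examine
        long lineStart = 0; // absolute offset where the current line begins
        try {
            while (ch.read(buf) != -1) {
                buf.flip(); // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    byte b = buf.get();
                    pos++;
                    if (b == '\n') {
                        out.add(lineStart + ":" + decode(line));
                        line.reset();
                        lineStart = pos;
                    } else {
                        line.write(b);
                    }
                }
                buf.clear(); // make room for the next chunk
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Last line without a trailing newline, if any.
        if (line.size() > 0) out.add(lineStart + ":" + decode(line));
        return out;
    }

    // Decodes the collected bytes as UTF-8, dropping a trailing '\r'
    // so that Windows line endings do not leak into the text.
    static String decode(ByteArrayOutputStream line) {
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--;
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```

For a real file, a channel can be obtained with Files.newByteChannel(path). Because the offsets are counted on raw bytes before any decoding, they stay correct for both \n and \r\n line endings.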

Answer 2

Here is something that should work. It assumes UTF-8, but you can easily change that.

import java.io.*;

class main {
    public static void main(final String[] args) throws Exception {
        ByteCountingLineReader r = new ByteCountingLineReader(new ByteArrayInputStream(toUtf8("Hello\r\nWorld\n")));

        String line = null;
        do {
            long count = r.byteCount();
            line = r.readLine();
            System.out.println("Line at byte " + count + ": " + line);
        } while (line != null);

        r.close();
    }

    static class ByteCountingLineReader implements Closeable {
        InputStream in;
        long _byteCount;
        int bufferedByte = -1;
        boolean ended;

        // in should be a buffered stream!
        ByteCountingLineReader(InputStream in) {
            this.in = in;
        }

        ByteCountingLineReader(File f) throws IOException {
            in = new BufferedInputStream(new FileInputStream(f), 65536);
        }

        String readLine() throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            if (ended) return null;
            while (true) {
                int c = read();
                if (ended && baos.size() == 0) return null;
                if (ended || c == '\n') break;
                if (c == '\r') {
                    c = read();
                    if (c != '\n' && !ended)
                        bufferedByte = c;
                    break;
                }
                baos.write(c);
            }
            return fromUtf8(baos.toByteArray());
        }

        int read() throws IOException {
            if (bufferedByte >= 0) {
                int b = bufferedByte;
                bufferedByte = -1;
                return b;
            }
            int c = in.read();
            if (c < 0) ended = true; else ++_byteCount;
            return c;
        }

        long byteCount() {
            return bufferedByte >= 0 ? _byteCount - 1 : _byteCount;
        }

        public void close() throws IOException {
            if (in != null) try {
                in.close();
            } finally {
                in = null;
            }
        }

        boolean ended() {
            return ended;
        }
    }

    static byte[] toUtf8(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (Exception __e) {
            throw rethrow(__e);
        }
    }

    static String fromUtf8(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (Exception __e) {
            throw rethrow(__e);
        }
    }

    static RuntimeException rethrow(Throwable t) {

        throw t instanceof RuntimeException ? (RuntimeException) t : new RuntimeException(t);
    }
}

Answer 3

Try using RandomAccessFile:

     RandomAccessFile raf = new RandomAccessFile(filePath, "r");
     String curLine;
     while ((curLine = raf.readLine()) != null) {
        System.out.println(curLine);
        // offset of the first byte after the line that was just read
        long rowIndex = raf.getFilePointer();
     }

To seek to an offset:

raf.seek(offset);
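
As a fuller sketch of this approach, the class below records the start offset of every line and later jumps back to one of them. The names RafLineIndex, lineOffsets, readLineAt and sampleFile are made up for this example. Two caveats: RandomAccessFile.readLine() maps each byte to one char, so the text is re-decoded here under the assumption that the file is UTF-8, and RandomAccessFile is unbuffered, which is why the question found it slow.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class RafLineIndex {
    // Returns the byte offset at which each line of the file starts.
    static List<Long> lineOffsets(File file) {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long start = raf.getFilePointer();
            while (raf.readLine() != null) {
                offsets.add(start);
                start = raf.getFilePointer(); // first byte after the terminator
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    // Seeks to the given offset and returns the line that starts there.
    static String readLineAt(File file, long offset) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            String raw = raf.readLine();
            if (raw == null) return null;
            // readLine() produced one char per byte; recover the bytes and
            // decode them as UTF-8 (assumption: the file is UTF-8).
            return new String(raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: writes a small two-line sample to a temp file.
    static File sampleFile() {
        try {
            Path p = Files.createTempFile("lines", ".txt");
            Files.write(p, "Hello\r\nW\u00f6rld\n".getBytes(StandardCharsets.UTF_8));
            return p.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = sampleFile();
        // The sample has lines starting at offsets 0 and 7.
        for (long off : lineOffsets(f)) {
            System.out.println(off + ": " + readLineAt(f, off));
        }
    }
}
```

Building the index once and seeking afterwards keeps the slow byte-by-byte pass to a single scan of the file.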

Answer 4

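I would like to know what your final solution was. That said, I think using long instead of int for the offset would cover most cases in the code above.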
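
A small sketch of why the counter type matters once a file exceeds 2 GiB. The class OffsetWidth and its methods are invented for this example, and the line size and line count are arbitrary. Note that int overflow makes the total wrap around to a negative number, so it may not be the only cause of the drift described in the question.

```java
class OffsetWidth {
    // Accumulates the offset in an int, as the question's loop does.
    static int addAsInt(int bytesPerLine, int lines) {
        int offset = 0;
        for (int i = 0; i < lines; i++) {
            offset += bytesPerLine; // silently wraps past Integer.MAX_VALUE
        }
        return offset;
    }

    // Same loop with a long accumulator.
    static long addAsLong(int bytesPerLine, int lines) {
        long offset = 0;
        for (int i = 0; i < lines; i++) {
            offset += bytesPerLine;
        }
        return offset;
    }

    public static void main(String[] args) {
        // 3,000,000 lines of 1000 bytes each: 3,000,000,000 bytes in total.
        System.out.println(addAsInt(1000, 3_000_000));  // -1294967296
        System.out.println(addAsLong(1000, 3_000_000)); // 3000000000
    }
}
```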

Answer 5

If you want to read a file line by line, I would recommend this code:

import java.io.*;

class FileRead {
    public static void main(String[] args) {
        // Open textfile.txt. DataInputStream is meant for binary data, not
        // text, so the stream is wrapped in a reader instead.
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("textfile.txt")))) {
            String strLine;
            // Read the file line by line
            while ((strLine = br.readLine()) != null) {
                // Print the content on the console
                System.out.println(strLine);
            }
        } catch (IOException e) { // Report any I/O failure
            System.err.println("Error: " + e.getMessage());
        }
    }
}

I have always used this approach in the past and it works well!


How can I find out how many bytes (the offset) a BufferedReader has read?
