Question

I want to know the offset at which each line of a text file starts.

So far I have tried this:

Path path = FileSystems.getDefault().getPath(".", filename);
BufferedReader br = Files.newBufferedReader(path, Charset.defaultCharset());
int offset = 0; // offset of the first line
String strline = br.readLine();
offset += strline.length() + 1; // offset of the second line

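This way I can loop over the whole file and work out the offset of the start of every line. However, when I use RandomAccessFile to seek to an offset computed like this and read a line, I end up in the middle of some line. The offsets seem to be wrong.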

What is going wrong? Is this way of computing the offsets incorrect? Is there a better, faster way?

Answer 1

Your code only works for ASCII-encoded text. Because some characters take more than one byte, you have to change the line

offset += strline.length() + 1;

to

offset += strline.getBytes(Charset.defaultCharset()).length + 1;

As I mentioned in my comment below, you must specify the file's actual encoding, for example Charset.forName("UTF-8"), both here and where you create the BufferedReader.
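A minimal, self-contained sketch of the fix above, assuming the file is UTF-8 and every line ends with a single "\n" (a file with "\r\n" endings would need + 2 instead of + 1, because readLine() strips the terminator without reporting which one it was). The class name ByteLineOffsets and the method lineStartOffsets are hypothetical names chosen for illustration. The key point is counting encoded bytes per line instead of chars:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ByteLineOffsets {

    // Returns the byte offset at which each line starts.
    // Assumption: every line ends with exactly one '\n' byte (Unix line endings).
    static List<Long> lineStartOffsets(Path path, Charset cs) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path, cs)) {
            long offset = 0; // the first line starts at byte 0
            String line;
            while ((line = br.readLine()) != null) {
                offsets.add(offset);
                // Count encoded bytes, not chars, then add 1 for the '\n'
                offset += line.getBytes(cs).length + 1;
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("offsets", ".txt");
        // "h\u00e9llo" is "héllo": 5 chars, but 6 bytes in UTF-8
        Files.write(tmp, "h\u00e9llo\nworld\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(lineStartOffsets(tmp, StandardCharsets.UTF_8)); // prints [0, 7]
        Files.delete(tmp);
    }
}
```

With strline.length() + 1 the second offset would come out as 6, one byte short of the real line start; in a longer file these errors accumulate, which matches the "middle of a line" symptom in the question.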

Answer 2

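This seems to give me the expected result. The program below collects the line-start offsets with a BufferedReader and then prints every line of the file by seeking to each offset with RandomAccessFile. Is this your situation?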

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {
    public static void main(String[] args) {
        File readFile = new File("/your/file/here");
        List<Long> offsets = new ArrayList<>();
        // Assumes single-byte characters and that the file uses the platform line separator
        int sepLength = System.lineSeparator().length();
        try (BufferedReader reader = new BufferedReader(new FileReader(readFile))) {
            long offset = 0; // offset of the first line
            String strline;
            while ((strline = reader.readLine()) != null) {
                offsets.add(offset);
                // The next line starts after this line and its terminator
                offset += strline.length() + sepLength;
            }
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
            return;
        }

        // Open read-only, seek to each recorded offset and print the line found there
        try (RandomAccessFile raf = new RandomAccessFile(readFile, "r")) {
            for (long offset : offsets) {
                raf.seek(offset);
                System.out.println(raf.readLine());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
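Since the question also asks for a faster way, here is a sketch of a different technique than either answer uses: scan the raw bytes for '\n' instead of decoding every line into a String and re-encoding it. It assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where byte 0x0A only ever means '\n'; it does not work for UTF-16, and old Mac-style lone "\r" endings are not recognized. The class name RawLineOffsets is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RawLineOffsets {

    // Records the byte offset of every line start by scanning for '\n' (0x0A).
    // "\r\n" endings also work: the next line still begins right after the '\n'.
    static List<Long> lineStartOffsets(Path path) throws IOException {
        List<Long> offsets = new ArrayList<>();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(path)) {
            long pos = 0;               // file offset of buf[0]
            boolean atLineStart = true; // the first byte of the file starts a line
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (atLineStart) {
                        offsets.add(pos + i);
                        atLineStart = false;
                    }
                    if (buf[i] == '\n') {
                        atLineStart = true; // the following byte begins a new line
                    }
                }
                pos += n;
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("raw", ".txt");
        // Mixed "\r\n" and "\n" endings plus a 2-byte UTF-8 character
        Files.write(tmp, "a\r\nb\u00e9\nc".getBytes(StandardCharsets.UTF_8));
        System.out.println(lineStartOffsets(tmp)); // prints [0, 3, 7]
        Files.delete(tmp);
    }
}
```

After seeking to one of these offsets, keep in mind that RandomAccessFile.readLine() converts each byte directly into a char, so for non-ASCII UTF-8 text it is safer to read the raw bytes up to the next '\n' and decode them with the file's actual charset.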

How do I find the offset of the start of a line in a text file in Java?
