Question

Below is the code I wrote to tail the last n lines of a file.


import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class TailCommand {
public static void main(String args[]) {
    int j;
    try {
        /*
         * Receive file name and no of lines to tail as command line
         * argument
         */
        RandomAccessFile randomFile = new RandomAccessFile(args[0], "r");
        long numberOfLines = Long.valueOf(args[1]).longValue();
        long lineno = 0;
        String str;
        String outstr;
        StringBuilder sb = new StringBuilder();
        Map<Long, String> strmap = new HashMap<Long, String>();
        while ((str = randomFile.readLine()) != null) {
            strmap.put(lineno + 1, str);
            lineno++;
        }
        System.out.println("Total no of lines in file is " + lineno);
        // Keys run from 1 to lineno, so the last n lines start at lineno - n + 1
        long startPosition = lineno - numberOfLines + 1;
        while (startPosition <= lineno) {
            if (strmap.containsKey(startPosition)) {
            // System.out.println("HashMap contains "+  startPosition
                // +" as key");
                outstr = (String) strmap.get(startPosition);
                sb.append(outstr);
                System.out.println(outstr);
            }
            startPosition++;
        }
        // Collection coll = strmap.values();
        // System.out.println(coll+"size"+strmap.size());
        // System.out.println(sb);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
}

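I used the following approach:

Receive the file name and the number of lines to tail as command-line arguments.
Use the readLine method to count the total number of lines in the file.
Increment a counter for each readLine call.
Store this counter value and the string returned by readLine in a HashMap.
As a result, the whole file is stored in the HashMap.
The HashMap key can now be used to retrieve a specific line number.
A StringBuilder can be used to print the selected lines.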

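My doubts:

Is my approach effective, and can I use it for large files of more than 10 MB? What improvements do I need to make if more people have to tail the same file simultaneously? Can I use StringBuilder for larger files too?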

Answer 1

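As I mentioned in my comment on djna's answer, you are not doing this very efficiently:

You are reading the entire file. If the file is large and n is small, you are just wasting time, I/O, and whatever else.
You are also wasting memory.
There is no buffering (other than whatever RandomAccessFile#readLine() may or may not provide), which can cause a further slowdown.

So, what I would do is read the file backwards in chunks and process each chunk separately.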

    // Assumes the caller defines: String file, int linesToRead, PrintWriter pw.
    // Note: assumes no single line is longer than chunkSize.
    RandomAccessFile raf = new RandomAccessFile(new File(file), "r");
    List<String> lines = new ArrayList<String>();
    final int chunkSize = 1024 * 32;
    long end = raf.length();
    boolean readMore = true;
    while (readMore) {
        byte[] buf = new byte[chunkSize];
        // Read a chunk from the end of the file
        long startPoint = end - chunkSize;
        long readLen = chunkSize;
        if (startPoint < 0) {
            readLen = chunkSize + startPoint;
            startPoint = 0;
        }
        raf.seek(startPoint);
        readLen = raf.read(buf, 0, (int) readLen);
        if (readLen <= 0) {
            break;
        }
        // Parse newlines and add the lines (newest first) to the list
        int unparsedSize = (int) readLen;
        int index = unparsedSize - 1;
        while (index >= 0) {
            if (buf[index] == '\n') {
                int startOfLine = index + 1;
                int len = (unparsedSize - startOfLine);
                if (len > 0) {
                    lines.add(new String(buf, startOfLine, len));
                }
                unparsedSize = index + 1;
            }
            --index;
        }
        if (startPoint == 0 && unparsedSize > 0) {
            // Start of file reached: the remaining bytes are the complete first line
            lines.add(new String(buf, 0, unparsedSize));
            unparsedSize = 0;
        }
        // Move the end point back by the number of bytes we parsed.
        // Note: the first line in the chunk is left unparsed
        // because it could be a partial line.
        end = end - (readLen - unparsedSize);
        readMore = lines.size() < linesToRead && startPoint != 0;
    }
    raf.close();
    // Only print the requested number of lines
    if (linesToRead > lines.size()) {
        linesToRead = lines.size();
    }
    for (int i = linesToRead - 1; i >= 0; --i) {
        pw.print(lines.get(i));
    }

Answer 2

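Is my approach effective, and can I use it for large files of more than 10 MB?

Yes, it is effective. And yes, you "can" use it for larger files, but since you always scan the entire file, performance degrades as the file gets longer. Likewise, since you store the entire contents in memory, the memory requirement keeps growing until very large files start to cause OutOfMemoryError problems.

What improvements do I need to make if more people have to tail the same file simultaneously?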

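None, because you are only tailing the last n lines. Everyone can simply run their own instance of the program. If you wanted to follow the file as it is updated over time (as tail does with its -f option), then you would have to make some changes.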

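Can I use StringBuilder for larger files too?

You certainly could, but it is not clear to me what you would gain.

Personally, I would suggest restructuring the algorithm as follows:

Seek to the end of the file.
Parse backwards until you have encountered the required number of \n characters.
Read to the end of the file, and print.

Then there is no need to buffer every line of the file, and performance does not degrade on very large files.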
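The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's own code: the class name SeekTail and the method tail are hypothetical, the file is assumed to be UTF-8 with \n (or \r\n) line endings, and the requested tail is assumed to be small enough to hold in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class SeekTail {
    /** Returns the last n lines of the file as one string (hypothetical helper). */
    static String tail(String path, int n) throws IOException {
        if (n <= 0) {
            return "";
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            long start = 0;      // offset where the tail begins; 0 means whole file
            int newlines = 0;    // line breaks seen so far, scanning backwards
            byte[] buf = new byte[8192];
            long pos = length;
            // Steps 1 and 2: start at the end and scan backwards, block by block.
            scan:
            while (pos > 0) {
                int len = (int) Math.min(buf.length, pos);
                pos -= len;
                raf.seek(pos);
                raf.readFully(buf, 0, len);
                for (int i = len - 1; i >= 0; i--) {
                    if (buf[i] != '\n') {
                        continue;
                    }
                    long offset = pos + i;
                    if (offset == length - 1) {
                        continue; // newline that terminates the last line
                    }
                    if (++newlines == n) {
                        start = offset + 1; // the tail starts right after this newline
                        break scan;
                    }
                }
            }
            // Step 3: read from the computed offset to the end of the file.
            // Assumption: the tail fits in a single byte array.
            byte[] tail = new byte[(int) (length - start)];
            raf.seek(start);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.UTF_8);
        }
    }
}
```

Calling System.out.print(SeekTail.tail(args[0], Integer.parseInt(args[1]))) would then print the tail. Only the bytes near the end of the file are read, so the cost depends on the size of the tail rather than the size of the file.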

Answer 3

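It looks like you are keeping the whole file in memory, when you only need to keep n lines. So allocate an array of size n instead and use it as a ring buffer.

In the code you show you do not actually seem to use the StringBuilder; I guess you intend to use it to build the output. Since that should depend only on n, not on the size of the file, I do not see why using StringBuilder should be a problem.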

Answer 4

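You are basically reading the whole file into memory, and to do that you do not really need a random-access file.

If the file is large, that may not be the best option.

Why not use the HashMap to store (line number -> position in file) instead of (line number -> line)? That way you know which position to seek to for the last n lines.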
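A minimal sketch of that idea, assuming a hypothetical class OffsetIndexTail: the map holds only one offset per line instead of the line text, and the file is read a second time from the offset of the first wanted line. Note that this still scans the whole file once, and that RandomAccessFile.readLine() reads byte by byte and maps each byte directly to a char, so it is shown here for illustration rather than as a tuned implementation.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OffsetIndexTail {
    /** Returns the last n lines using a line number -> offset map (hypothetical helper). */
    static List<String> tail(String path, long n) throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            Map<Long, Long> offsets = new HashMap<>(); // line number -> start offset
            long lineno = 0;
            long lineStart = raf.getFilePointer();
            // First pass: remember where each line starts, discard its text.
            while (raf.readLine() != null) {
                offsets.put(++lineno, lineStart);
                lineStart = raf.getFilePointer();
            }
            // The last n lines start at line (lineno - n + 1), clamped to line 1.
            long first = Math.max(1, lineno - n + 1);
            Long offset = offsets.get(first);
            if (offset == null) {
                return result; // empty file, or n <= 0
            }
            // Second pass: seek straight to the first wanted line and read to EOF.
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Memory use is then one pair of long values per line rather than the full text of the file.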

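Another alternative is to use a buffer (array) of n strings that holds the last n lines read so far. But be careful: when reading a new line, you do not want to shift all the elements in the buffer (i.e. 1->0, 2->1, ..., n->(n-1), and then add the new line at the end). Use a circular buffer instead. (Keep an index to the end position in the buffer, and overwrite the next position when adding a new line. If you are at position n-1, the next position is 0, hence circular.)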
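The circular buffer described above might look like the sketch below. The class name RingBufferTail and the method lastLines are hypothetical names chosen for illustration; the method takes any Reader so that it can be tried without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class RingBufferTail {
    /** Returns the last n lines from the reader, oldest first (hypothetical helper). */
    static List<String> lastLines(Reader source, int n) throws IOException {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        String[] ring = new String[n]; // holds at most the n most recent lines
        long count = 0;                // total number of lines read so far
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            // Overwrite the oldest slot; after slot n-1 the index wraps to 0.
            ring[(int) (count % n)] = line;
            count++;
        }
        // Replay the surviving lines in their original order.
        long kept = Math.min(count, n);
        for (long i = count - kept; i < count; i++) {
            result.add(ring[(int) (i % n)]);
        }
        return result;
    }
}
```

For a file, the reader could be created with new FileReader(args[0]). This version still reads the whole file once, but it never holds more than n lines in memory and never shifts elements.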

Answer 5

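I have modified my code based on the suggestions above. Please see the updated code below.

The logic used is as follows:

1. Use the file length to locate the end of the file (EOF).
2. Move the file pointer backwards from EOF and check for occurrences of '\n'.
3. If a '\n' is found, increment the line counter and put the output of readLine into the HashMap.
4. Retrieve the values from the HashMap in descending key order.

I hope the above approach will not cause memory problems and that it is clear. Please advise.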

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class NewTailCommand {
    public static void main(String args[]) {
        Map<Long, String> strmap = new HashMap<Long, String>();
        long numberOfLines = Long.valueOf(args[1]).longValue();
        /*
         * Receive file name and no of lines to tail as command line
         * argument
         */
        try (RandomAccessFile randomFile = new RandomAccessFile(args[0], "r")) {
            long filelength = randomFile.length();
            long linescovered = 1;
            // Walk backwards one byte at a time. Stop once enough lines are
            // collected or the start of the file has been passed (filepos -1).
            for (long filepos = filelength - 1;
                    linescovered <= numberOfLines && filepos >= -1; filepos--) {
                if (filepos >= 0) {
                    randomFile.seek(filepos);
                    if (randomFile.readByte() != 0xA) {
                        continue; // not a line break, keep scanning
                    }
                    if (filepos == filelength - 1) {
                        continue; // newline that terminates the last line
                    }
                } else {
                    randomFile.seek(0); // the first line has no preceding '\n'
                }
                // The file pointer now sits at the start of a line.
                String line = randomFile.readLine();
                if (line != null) {
                    strmap.put(linescovered, line);
                    linescovered++;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Lines were stored last-to-first, so print the keys in descending order.
        // The counter is decremented on every pass so a missing key cannot
        // cause an endless loop when the file has fewer lines than requested.
        for (long key = numberOfLines; key > 0; key--) {
            if (strmap.containsKey(key)) {
                System.out.println(strmap.get(key));
            }
        }
    }
}

Java code to tail the last n lines of a file, equivalent to the Unix tail command

5 answers: