Finding the file with the most up-to-date information

Time: 2013-01-06 18:16:10

Tags: java random-access

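I have a list of log files, and I need to find which one has the most recent version of a specific line. All of the files, or none of them, may contain this line.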

The lines in the files look like this:

2013/01/06 16:01:00:283  INFO ag.doLog: xxxx xxxx xxxx xxxx

I need the line that says

xx/xx/xx xx:xx:xx:xxx  INFO ag.doLog: the line i need
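The timestamp at the start of each line (e.g. `2013/01/06 16:01:00:283`) is 23 characters long, with the milliseconds after a colon. A minimal sketch for parsing it, assuming every line starts with that fixed-width prefix; the pattern `yyyy/MM/dd HH:mm:ss:SSS` is inferred from the sample line, and `LogTimestamp`/`parse` are hypothetical names. Because the fields are zero-padded and run from year down to milliseconds, comparing the raw prefixes as strings gives the same order as comparing the parsed dates.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LogTimestamp {

    // Length of a prefix such as "2013/01/06 16:01:00:283" (assumed fixed).
    static final int LENGTH = 23;

    /** Parses the timestamp at the start of a log line (hypothetical helper). */
    static Date parse(String line) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss:SSS");
        format.setLenient(false);
        try {
            return format.parse(line.substring(0, LENGTH));
        } catch (ParseException e) {
            throw new IllegalArgumentException("No timestamp at start of: " + line, e);
        }
    }

    public static void main(String[] args) {
        String newer = "2013/01/06 16:01:00:283  INFO ag.doLog: the line i need";
        String older = "2013/01/06 09:15:42:007  INFO ag.doLog: the line i need";
        System.out.println(parse(newer).after(parse(older)));   // true
        // The zero-padded prefixes also sort correctly as plain strings.
        System.out.println(newer.substring(0, LENGTH)
                .compareTo(older.substring(0, LENGTH)) > 0);      // true
    }
}
```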

I know how to get the array of files, and if I scan backwards I can find the most recent occurrence of the line in each file (if it exists).

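The biggest problem is that the files may be large (2k lines?), and I want to find the line relatively quickly (within a few seconds), so I'm open to suggestions.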
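For files of a few thousand lines, a plain forward pass may already be fast enough. If each file is appended in chronological order, its last matching line is its newest one, so the scan only has to remember the latest match. A minimal sketch, assuming UTF-8 log files, matching with `String.contains` as in the question's code, and the 23-character timestamp prefix; `NewestMatch`, `lastMatch` and `newestFile` are hypothetical names. `Files.lines` reads the file lazily, so memory use does not grow with file size.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class NewestMatch {

    /** Returns the last line containing target, or empty if none does. */
    static Optional<String> lastMatch(Stream<String> lines, String target) {
        // reduce keeps the later of two matches, i.e. the last one in file order.
        return lines.filter(line -> line.contains(target)).reduce((first, second) -> second);
    }

    /** Returns the file whose last matching line has the newest timestamp. */
    static Optional<Path> newestFile(List<Path> files, String target) throws IOException {
        Path best = null;
        String bestStamp = null;
        for (Path file : files) {
            try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
                Optional<String> match = lastMatch(lines, target);
                if (match.isPresent()) {
                    // Assumes the "yyyy/MM/dd HH:mm:ss:SSS" prefix, which sorts as a string.
                    String stamp = match.get().substring(0, 23);
                    if (bestStamp == null || stamp.compareTo(bestStamp) > 0) {
                        best = file;
                        bestStamp = stamp;
                    }
                }
            } catch (UncheckedIOException e) {
                throw e.getCause(); // I/O errors raised while the stream is consumed
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Calling `newestFile` with the paths of all log files and the text to search for returns the file to keep, or an empty `Optional` when no file contains the line.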

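My own idea: if one file has the line at time X, then any file whose backward scan reaches time X without finding the line does not need to be scanned any further. This would require searching all the files simultaneously, which I don't know how to do.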
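Searching all files at once can be sketched as a reverse merge: keep one cursor per file, always step back through the file whose current line is newest, and stop at the first line that matches. Every cursor still waiting in the queue points at a line no newer than the one just examined, and earlier lines in each file are older still, so that first match is the newest match across all files; files whose remaining lines are older are never read further. A minimal sketch, assuming each file's lines are already available as a `List<String>` in chronological order and that every line starts with the 23-character timestamp. In practice each cursor would wrap a backward line reader instead of a list. `ReverseMerge`, `Cursor` and `newestMatch` are hypothetical names.

```java
import java.util.List;
import java.util.PriorityQueue;

public class ReverseMerge {

    /** Position of the next (older) line to examine in one file. */
    static final class Cursor {
        final int file;           // index into the list of files
        final List<String> lines; // that file's lines, oldest first
        int index;                // current line, moving towards 0

        Cursor(int file, List<String> lines) {
            this.file = file;
            this.lines = lines;
            this.index = lines.size() - 1;
        }

        String stamp() {
            return lines.get(index).substring(0, 23); // "yyyy/MM/dd HH:mm:ss:SSS"
        }
    }

    /** Returns the index of the file holding the newest line containing target, or -1. */
    static int newestMatch(List<List<String>> files, String target) {
        // Max-heap on the timestamp of each cursor's current line.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> b.stamp().compareTo(a.stamp()));
        for (int i = 0; i < files.size(); i++) {
            if (!files.get(i).isEmpty()) {
                heap.add(new Cursor(i, files.get(i)));
            }
        }
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();              // newest unexamined line overall
            if (c.lines.get(c.index).contains(target)) {
                return c.file;                   // nothing left in the heap is newer
            }
            if (--c.index >= 0) {
                heap.add(c);                     // re-queue with its next older line
            }
        }
        return -1;
    }
}
```

The cursor's position only changes while it is out of the queue, so the heap ordering stays valid.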

At the moment the code breaks, I think because it runs out of memory.

Code:

if (files.length > 0) {  // in case no log files exist
    System.out.println("files.length: " + files.length);
    for (int i = 0; i < files.length; i++) {  // for each log file, look for the string
        System.out.println("Reading file: " + i + " " + files[i].getName());
        RandomAccessFile raf = new RandomAccessFile(files[i].getAbsoluteFile(), "r"); // open log file
        long lastSegment = raf.length();  // finds how long the file is
        lastSegment = raf.length() - 5;   // sets a point to start looking
        String leido = "";
        byte array[] = new byte[1024];
        /*
         * Going back until we find the line or the file is empty.
         */
        while (!leido.contains(lineToSearch) || lastSegment > 0) {
            System.out.println("leido: " + leido);
            raf.seek(lastSegment);          // move to that point
            raf.read(array);                // reads 1024 bytes and saves them in array
            leido = new String(array);      // saves what is read as a string
            lastSegment = lastSegment - 15; // move the point a little further back
        }
        if (lastSegment < 0) {
            raf.seek(leido.indexOf(lineToSearch) - 23); // to make sure we get the date (23 characters long) NOTE: it won't be negative.
            raf.read(array);                // reads 1024 bytes and saves them in array
            leido = new String(array);      // make the array into a string
            Date date = new SimpleDateFormat("MMMM d, yyyy", Locale.ENGLISH).parse(leido.substring(0, leido.indexOf(" INFO "))); // get only the date part
            System.out.println(date);
            // if date is bigger than the other, save file name
        }
    }
}

1 answer:

Answer 0 (score: 1)

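I find the code hard to verify. The task can be split up: a backwards reader that reads lines from the end of the file towards the start, and then, using that reader, explicit parsing of the dates.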

Note that I'm not going for pretty code here, just something like this:

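import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class BackwardsReader implements Closeable {

    private static final int BUFFER_SIZE = 4096;

    private final Charset charset;
    private RandomAccessFile raf;
    private long position;   // file offset where the current buffer starts
    private int readIndex;   // buffer[0..readIndex) has not been returned yet
    private boolean done;    // true once the first line of the file has been returned
    private final byte[] buffer = new byte[BUFFER_SIZE];

    /**
     * @param file a text file.
     * @param charset a charset in which '\r' and '\n' are single bytes that never
     *                occur inside multi-byte characters (e.g. UTF-8; not UTF-16).
     */
    public BackwardsReader(File file, String charset) throws IOException {
        this.charset = Charset.forName(charset);
        raf = new RandomAccessFile(file, "r");
        position = raf.length();
        done = position == 0; // an empty file has no lines
    }

    /**
     * Returns the previous line without its terminator, or null once the start
     * of the file has been passed. A file ending in a newline yields "" first.
     */
    public String readLine() throws IOException {
        if (done || raf == null) {
            close();
            return null;
        }
        // Collect the line's bytes block by block, right to left, and decode them
        // once, so a multi-byte character split across two blocks stays intact.
        Deque<byte[]> chunks = new ArrayDeque<>();
        for (;;) { // Loop adding blocks without newline '\n'.
            // Search backwards for the '\n' that precedes this line.
            int lineStartIndex = readIndex;
            while (lineStartIndex > 0 && buffer[lineStartIndex - 1] != (byte) '\n') {
                --lineStartIndex;
            }
            chunks.addFirst(Arrays.copyOfRange(buffer, lineStartIndex, readIndex));
            readIndex = lineStartIndex;
            if (lineStartIndex > 0) {
                --readIndex; // skip the '\n'; the previous line ends before it
                break;
            }
            // No '\n' left in the buffer: read the prior block, if any.
            int toRead = (int) Math.min(BUFFER_SIZE, position);
            if (toRead == 0) {
                done = true; // this was the first line of the file
                break;
            }
            position -= toRead;
            raf.seek(position);
            raf.readFully(buffer, 0, toRead);
            readIndex = toRead;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            bytes.write(chunk, 0, chunk.length);
        }
        String line = new String(bytes.toByteArray(), charset);
        // Drop the '\r' of a "\r\n" terminator.
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    @Override
    public void close() throws IOException {
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }
}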

A usage example:

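import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class App {

    public static void main(String[] args) {
        // Prints the file named by args[0] from its last line to its first;
        // try-with-resources closes the reader even if reading fails.
        try (BackwardsReader reader = new BackwardsReader(new File(args[0]), "UTF-8")) {
            int lineCount = 0;
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                ++lineCount;
                System.out.println(line);
            }
            System.out.println("Lines: " + lineCount);
        } catch (IOException ex) {
            Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}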