Caching a file in memory and reading it in parallel

Date: 2016-02-08 14:59:07

Tags: java performance file-io randomaccessfile

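I have a program (a simple log parser) that, in some cases, has to scan the input file completely. So I thought of caching the whole file (~100 MB) in memory up front and reading it with multiple threads.

In the current setup, I use a BufferedReader for the "main read" and a RandomAccessFile to jump to a specific offset and read what I need.

I tried it this way: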

..
Reader reader = null;
if (cache) {
    // caching file in memory
    br = new BufferedReader(new FileReader(file));
    buffer = new StringBuilder();
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        buffer.append(line).append(CR);
    }
    br.close();
    reader = new StringReader(buffer.toString());
} else {
    reader = new FileReader(file);
}
br = new BufferedReader(reader);
for (String line = br.readLine(); line != null; line = br.readLine()) {
    offset += line.length() + 1; // the +1 accounts for the line separator
    matcher = Constants.PT_BEGIN_COMPOSITION.matcher(line);
    if (matcher.matches()) {
        linecount++;
        record = new Record();
        record.setCompositionCode(matcher.group(1));
        matcher = Constants.PT_PREFIX.matcher(line);
        if (matcher.matches()) {
            record.setBeginComposition(Constants.SDF_DATE.parse(matcher.group(1)));
            record.setProcessId(matcher.group(2));
            if (cache) {
                executor.submit(new PubblicationParser(buffer, offset, record));
            } else {
                executor.submit(new PubblicationParser(file, offset, record));
            }
            records.add(record);
        } else {
            br.close();
            throw new ParseException(line, 0);
        }
    }
}

Inside PubblicationParser, an init() method picks which custom reader to use, either a RandomAccessFileReader or a StringBuilderReader:

if (file != null) {
    this.logReader = new RandomAccessFileReader(file, offset);
} else if (sb != null) {
    this.logReader = new StringBuilderReader(sb, (int) offset);
}

Here are my two custom readers:

//
public class StringBuilderReader implements LogReader {
    public static final String CR = System.getProperty("line.separator");
    private final StringBuilder sb;
    private int offset;

    public StringBuilderReader(StringBuilder sb, int offset) {
        super();
        this.sb = sb;
        this.offset = offset;
    }

    @Override
    public String readLine() throws IOException {
        if (offset >= sb.length()) {
            return null;
        }
        int indexOf = sb.indexOf(CR, offset);
        if (indexOf < 0) {
            indexOf = sb.length();
        }
        String substring = sb.substring(offset, indexOf);
        offset = indexOf + CR.length();
        return substring;
    }

    @Override
    public void close() throws IOException {
        // TODO Auto-generated method stub
    }
}
//
public class RandomAccessFileReader implements LogReader {
    private static final String FILEMODE_R = "r";
    private final RandomAccessFile raf;

    public RandomAccessFileReader(File file, long offset) throws IOException {
        this.raf = new RandomAccessFile(file, FILEMODE_R);
        this.raf.seek(offset);
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    @Override
    public String readLine() throws IOException {
        return raf.readLine();
    }
}

The problem is that the "cached way" is too slow, and I don't understand why!

1 answer:

Answer 0 (score: 1):

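First make sure that I/O really is what slows your application down, rather than something else (for example, inefficient logic in the parser). A Java profiler (for example, JProfiler) can tell you that.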
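
Short of a full profiler, a rough first check is to time the read phase and the parse phase separately. The following is only a minimal sketch under stated assumptions: the class name ReadVsParseTiming and the BEGIN pattern are hypothetical stand-ins (the real Constants.PT_BEGIN_COMPOSITION is not shown in the question), and the file is assumed to be UTF-8. A single wall-clock run is only indicative, since JIT warm-up and the OS file cache both skew the numbers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ReadVsParseTiming {
    // Hypothetical pattern standing in for Constants.PT_BEGIN_COMPOSITION.
    static final Pattern BEGIN = Pattern.compile(".*BEGIN (\\w+).*");

    /** Returns { nanoseconds spent reading, nanoseconds spent matching, match count }. */
    static long[] measure(Path file) throws IOException {
        long t0 = System.nanoTime();
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                lines.add(line);
            }
        }
        long t1 = System.nanoTime();

        long matches = 0;
        for (String line : lines) {
            if (BEGIN.matcher(line).matches()) {
                matches++;
            }
        }
        long t2 = System.nanoTime();
        return new long[] { t1 - t0, t2 - t1, matches };
    }

    public static void main(String[] args) throws IOException {
        long[] r = measure(Paths.get(args[0]));
        System.out.printf("read: %d ms, parse: %d ms, matches: %d%n",
                r[0] / 1_000_000, r[1] / 1_000_000, r[2]);
    }
}
```

If the parse figure dominates, caching the file will not help much and the parser logic deserves the attention instead.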

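If it really is I/O, you are better off using a ready-made solution for loading the file into memory, which is essentially what you are trying to implement yourself.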

Take a look at MappedByteBuffer and ByteBuffer.
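
As an illustration of that suggestion, here is a minimal sketch of a reader backed by a memory-mapped file. It is not the asker's code: MappedLogReader is a hypothetical class that mirrors the readLine() shape of the LogReader implementations in the question. It assumes the file is UTF-8, fits in a single mapping (under 2 GB), and that offsets are byte offsets, which differs from the character counting done with line.length() in the question whenever the log contains multi-byte characters or two-character line separators.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedLogReader {
    // Private view: own position and limit, content shared with the mapping.
    private final ByteBuffer view;

    MappedLogReader(ByteBuffer shared, int byteOffset) {
        ByteBuffer dup = shared.duplicate();
        dup.position(byteOffset);
        this.view = dup;
    }

    /** Maps the whole file read-only; the mapping stays valid after the channel is closed. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Returns the next line without its terminator, or null at end of data. */
    String readLine() {
        if (!view.hasRemaining()) {
            return null;
        }
        int start = view.position();
        int end = start;
        while (end < view.limit() && view.get(end) != '\n') {
            end++;
        }
        int len = end - start;
        if (len > 0 && view.get(end - 1) == '\r') {
            len--; // drop the CR of a CRLF terminator
        }
        byte[] bytes = new byte[len];
        view.get(bytes); // relative read, advances the position by len
        view.position(Math.min(end + 1, view.limit())); // skip past the terminator
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The idea is to call map(...) once and hand the resulting buffer to every task, each of which builds its own MappedLogReader at its own offset. Because every reader works on a duplicate, the tasks do not share a position, provided nothing repositions the shared buffer itself.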