Reading lines from stdin into a char[] via InputStreamReader

Question

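While programming a log filter competitively (using several programming languages/technologies), I found that Java performs very poorly when reading from stdin.

As a first step, I reduced the problem to the performance of reading lines from stdin compared with the other technologies (no text processing or regular expressions yet).

Inspired by the answers to "Fastest way for line-by-line reading STDIN?", I wrote my own reader, but it turned out to be slower by a factor of 1.3.

Implementations under test

Code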

LineReader.java

package org.acme.logfilter;

import java.io.IOException;
import java.io.InputStreamReader;

public class LineReader {

  private static final int DEFAULT_READ_BUFFER_SIZE = 32768;
  private static final int INITIAL_LINE_BUFFER_SIZE = 128;

  private InputStreamReader isr;
  private int lineBufferSize;

  // To buffer the read from the input stream
  private char[] readBuffer;

  // The extracted line
  private char[] lineBuffer;

  // Bytes read from the input stream
  private int readBufferCapacity = 0;

  // Position in the read buffer
  private int readIdx = 0;

  // The line length remembered with the last readLine() 
  private int lineLength = 0;

  public LineReader(InputStreamReader isr) {
    this(isr, DEFAULT_READ_BUFFER_SIZE);
  }

  public LineReader(InputStreamReader isr, int readBufferSize) {
    this.isr = isr;
    this.lineBufferSize = INITIAL_LINE_BUFFER_SIZE;

    this.readBuffer = new char[readBufferSize];
    this.lineBuffer = new char[lineBufferSize];
  }

  public boolean readLine() throws IOException {
    // Copy reference & value for slightly improved performance
    char[] readBuffer = this.readBuffer;
    // A local reference improves performance slightly
    int readIdx = this.readIdx;
    // Index of the (target) line array (equals to the line length)
    int lineIdx = 0;

    while (true) {
      if (readIdx == readBufferCapacity) {
        // Read buffer not filled yet or exceeded
        // (The line buffer might not be complete yet)

        // Reset the read buffer index (it has exceeded)
        readIdx = 0;

        // (Re)fill the buffer ...
        readBufferCapacity = isr.read(readBuffer, 0, readBuffer.length);

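        if (readBufferCapacity <= 0) {
          // End of stream: normalize the state so that a later call
          // runs into the refill check again instead of stale data
          readBufferCapacity = 0;
          this.readIdx = 0;
          // A final line without a trailing '\n' still counts as a line
          this.lineLength = lineIdx;
          return lineIdx > 0;
        }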
      }

      if (lineIdx == lineBufferSize) {
        // Line buffer is full, create new buffer and "backup" line 

        // Remember current buffer before creating new one
        char[] oldLineBuffer = lineBuffer;
        // Extend by initial size
        lineBufferSize += INITIAL_LINE_BUFFER_SIZE;
        lineBuffer = new char[lineBufferSize];

        // Copy incomplete line to the bigger buffer ... 
        System.arraycopy(oldLineBuffer, 0, lineBuffer, 0, lineIdx);
      }

      char chr = readBuffer[readIdx];
      readIdx++;

      if (chr == '\n') {
        this.lineLength = lineIdx;
        // "Export" localized variables
        this.readIdx = readIdx;
        return true;
      }

      lineBuffer[lineIdx] = chr;
      lineIdx++;    
    }
  }

  public char[] getLine() {
    return lineBuffer;
  }

  public int getLineLength() {
    return lineLength;
  }
}

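Notes on the code

For now it is acceptable that the reader does not handle CRLF line endings correctly; that is not the issue here (adding more functionality would only make performance worse). It deliberately works with a single char[] line buffer. The idea is to avoid any StringBuffer overhead and any repeated char[] allocation and copying. Since the consuming program only reads the strings and does not manipulate them, I thought it would be a good idea to wrap the char[] as a CharSequence and pass that to other methods.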
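The idea of handing the line buffer to other methods as a CharSequence could be sketched as follows. This is a minimal illustration, not code from the post: the class name CharArrayView and its constructor parameters are assumptions made for this example.

```java
// Hypothetical helper (not part of the original post): a read-only
// CharSequence view over a region of a char[] buffer, without copying.
final class CharArrayView implements CharSequence {
    private final char[] chars;
    private final int offset;
    private final int length;

    CharArrayView(char[] chars, int offset, int length) {
        // Written so that offset + length cannot overflow
        if (offset < 0 || length < 0 || offset > chars.length - length) {
            throw new IndexOutOfBoundsException();
        }
        this.chars = chars;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return chars[offset + index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        // Shares the same backing array; still no copy
        return new CharArrayView(chars, offset + start, end - start);
    }

    @Override
    public String toString() {
        // The only place where the characters are copied
        return new String(chars, offset, length);
    }
}
```

With the LineReader above, a line could be wrapped as new CharArrayView(reader.getLine(), 0, reader.getLineLength()) and passed to any method that accepts a CharSequence, for example java.util.regex.Pattern.matcher. The view is only valid until the next readLine() call, because the backing array is reused.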

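I would never implement the log filter with code like this if it only gained me a marginal performance benefit. It is merely one step in the process of improving on the poor performance of BufferedReader.

Test class implementations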

FilterLogStdBufferedReader.java

InputStreamReader isr = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(isr, 32768 * 1024);

String line;
long lines = 0;

while ((line = br.readLine()) != null) {
  lines++;
}

FilterLogCustomLineparserExt.java

InputStreamReader isr = new InputStreamReader(System.in);
LineReader reader = new LineReader(isr, 32768 * 1024);

long lines = 0;

while (reader.readLine()) {
  lines++;
}

Profiling results

`time` results

$ time ( cat /ramdisk/1gb.txt | java -cp bin/ org.acme.logfilter.FilterLogStdBufferedReader )

real 8.10
user 6.08
 sys 3.73


$ time ( cat /ramdisk/1gb.txt | java -cp bin/ org.acme.logfilter.FilterLogCustomLineparserExt )

real 9.49
user 7.92
 sys 3.22

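The values are averages over 10 iterations. A 1 GB file with 79 characters per line was read from a ramdisk.

-Xprof

-Xprof gives an overview of how the JVM interprets and runs the code (time spent in interpreted code, JIT-compiled code, and native code).

Results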

FilterLogStdBufferedReader.java

Flat profile of 9.80 secs (768 total ticks): main

  Interpreted + native   Method                        
  0.7%     5  +     0    org.acme.logfilter.FilterLogStdBufferedReader.main
  0.4%     0  +     3    java.io.FileInputStream.available
  0.4%     3  +     0    sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
  0.3%     2  +     0    java.io.BufferedReader.readLine
  ...
  2.2%    13  +     4    Total interpreted

     Compiled + native   Method                        
 45.3%   347  +     1    org.acme.logfilter.FilterLogStdBufferedReader.main
  0.8%     6  +     0    sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
  0.5%     0  +     4    java.io.BufferedReader.readLine
  0.4%     0  +     3    java.io.BufferedReader.readLine
  ...
 47.3%   354  +     9    Total compiled

         Stub + native   Method                        
 33.7%     0  +   259    java.io.FileInputStream.available
 16.7%     0  +   128    java.io.FileInputStream.readBytes
  0.1%     0  +     1    java.lang.System.arraycopy
 50.5%     0  +   388    Total stub


Global summary of 9.80 seconds:
100.0%   777             Received ticks
  1.2%     9             Received GC ticks
  4.4%    34             Compilation

FilterLogCustomLineparserExt.java

Flat profile of 13.88 secs (1017 total ticks): main

  Interpreted + native   Method                        
  0.3%     3  +     0    org.acme.logfilter.FilterLogCustomLineparserExt.main
  0.2%     0  +     2    java.io.FileInputStream.available
  0.2%     2  +     0    org.acme.logfilter.LineReader.readLine
  0.2%     2  +     0    sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
  ...
  1.2%    10  +     2    Total interpreted

     Compiled + native   Method                        
 57.7%   587  +     0    org.acme.logfilter.FilterLogCustomLineparserExt.main
  1.7%    17  +     0    sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
  0.2%     1  +     1    org.acme.logfilter.LineReader.readLine
  ...
 59.8%   606  +     2    Total compiled

         Stub + native   Method                        
 24.0%     0  +   244    java.io.FileInputStream.available
 14.8%     0  +   151    java.io.FileInputStream.readBytes
  0.2%     0  +     2    java.lang.System.arraycopy
 39.0%     0  +   397    Total stub


Global summary of 13.88 seconds:
100.0%  1018             Received ticks
  2.7%    27             Compilation

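(For brevity, I removed blocks of lines with percentages <= 0.1% and replaced them with "...".)

Observations/conclusions

Observations:

The JVM spends more time compiling in FilterLogStdBufferedReader (4.4% vs. 2.7% of the received ticks).
In FilterLogCustomLineparserExt, the JVM spends more time executing compiled code than native code (59.8% compiled vs. 39.0% stub).
sun.nio.cs.UTF_8$Decoder.decodeArrayLoop is called more often, or runs longer, in FilterLogCustomLineparserExt.
In both implementations, the time spent in interpreted code is negligible.

Conclusions:

LineReader should be optimized so that the JVM JIT-compiles more of the code (and interprets less), and
LineReader should be optimized to do less "unnecessary" work, so that the (compiled) code does not "waste" so much time.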

hprof=cpu=times results

cpu=times counts the calls to each method and measures the CPU time attributed to those calls.

Results

BufferedReader

$ cat /ramdisk/1gb.txt | java -agentlib:hprof=cpu=times,file=stdbufferedreader.hprof.txt -cp bin/ org.acme.logfilter.FilterLogStdBufferedReader

CPU TIME (ms) BEGIN (total = 321694) Sat Aug 26 09:42:52 2017
rank   self  accum   count trace method
   1 28.49% 28.49% 13107201 301905 java.io.BufferedReader.readLine
   2 17.69% 46.17% 13107201 301906 java.io.BufferedReader.readLine
   3 17.59% 63.77% 13107154 301904 java.lang.String.<init>
   4 10.07% 73.84%       1 302038 org.acme.logfilter.FilterLogStdBufferedReader.main
   5  7.86% 81.70% 13107154 301903 java.util.Arrays.copyOfRange
   6  7.31% 89.01% 13107201 301826 java.io.BufferedReader.ensureOpen
   7  1.86% 90.87%  128061 301866 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
   8  1.00% 91.87%  128001 301894 sun.nio.cs.StreamDecoder.readBytes
   9  0.97% 92.84%  128001 301880 java.nio.HeapByteBuffer.compact
  10  0.67% 93.51%      61 301898 sun.nio.cs.StreamDecoder.implRead
  11  0.66% 94.17%  128001 301888 java.io.FileInputStream.read
  12  0.48% 94.65%  128061 301849 sun.nio.cs.UTF_8.updatePositions
  13  0.41% 95.07%  128001 301889 java.io.BufferedInputStream.read1
  ...

LineReader (custom implementation)

$ cat /ramdisk/1gb.txt | java -agentlib:hprof=cpu=times,file=custom.hprof.txt -cp bin/ org.acme.logfilter.FilterLogCustomLineparserExt

CPU TIME (ms) BEGIN (total = 103141) Sat Aug 26 09:39:02 2017
rank   self  accum   count trace method
   1 34.11% 34.11% 13107201 301921 org.acme.logfilter.LineReader.readLine
   2 31.22% 65.32%       1 302011 org.acme.logfilter.FilterLogCustomLineparserExt.main
   3  5.75% 71.07%  128040 301886 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
   4  3.10% 74.17%  128001 301914 sun.nio.cs.StreamDecoder.readBytes
   5  3.01% 77.18%  128001 301900 java.nio.HeapByteBuffer.compact
   6  2.65% 79.83%  128001 301908 java.io.FileInputStream.read
   7  2.10% 81.93%      40 301918 sun.nio.cs.StreamDecoder.implRead
   8  1.46% 83.38%  128040 301869 sun.nio.cs.UTF_8.updatePositions
   9  1.24% 84.63%  128040 301890 java.nio.charset.CharsetDecoder.decode
  10  1.20% 85.83%  128001 301909 java.io.BufferedInputStream.read1
  11  1.17% 86.99%  128040 301887 sun.nio.cs.UTF_8$Decoder.decodeLoop
  12  0.91% 87.90%  127971 301916 java.io.BufferedInputStream.available
  13  0.85% 88.76%  128001 301910 java.io.BufferedInputStream.read
  14  0.61% 89.36%  127971 301917 sun.nio.cs.StreamDecoder.inReady
  15  0.53% 89.90%  128040 301885 sun.nio.cs.UTF_8$Decoder.xflow
  16  0.52% 90.42%  128040 301870 sun.nio.cs.UTF_8.access$200
  17  0.48% 90.90%  256080 301867 java.nio.Buffer.position
  18  0.46% 91.36%  256080 301860 java.nio.ByteBuffer.arrayOffset
  19  0.44% 91.80%  256080 301861 java.nio.Buffer.position
  20  0.44% 92.24%  256002 301894 java.nio.HeapByteBuffer.ix
  21  0.43% 92.68%  256080 301862 java.nio.Buffer.limit
  22  0.43% 93.11%  256002 301895 java.nio.Buffer.remaining
  23  0.42% 93.53%  256080 301864 java.nio.CharBuffer.arrayOffset
  ...

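Observations/conclusions

Observations:

The custom implementation spends more time in readLine().
The total CPU time of the custom implementation is about three times lower (total = 103141 vs. total = 321694).

Conclusions:

The custom implementation does not call native code as often.
When the profiled runs are timed, the CPU time values match the user time. I think the BufferedReader implementation runs longer under the profiler because it consists of more code and therefore gets more instrumentation. This does not contradict the reversed runtimes measured without profiling.

Optimization attempts so far

Making lineIdx and readIdx local variables helped to reach the current (still poor) performance.
Replacing the multiple getters with a readLine() that directly returns a CharSequence (no significant performance decrease).

Question(s)

Is my interpretation of the profiler results correct?

Which properties of LineReader make it perform so poorly compared with BufferedReader, which uses StringBuffers, creates char[] and {{1}} instances over and over, and copies data constantly?

How can the implementation be improved?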

Answer 1

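Your LineReader implementation has a number of problems that make it suboptimal.

First, readLine is a large method with complicated control flow, which makes it hard for the JVM to apply optimizations.
lineBuffer is filled character by character, whereas a bulk copy would be faster.
When the readBuffer and lineBuffer arrays are accessed, the index variables have no obvious constraints, so the JVM emits an array bounds check for every array operation.

My suggestions are:

Use a short, separate loop to find the index of the \n character. It will benefit from many JIT optimizations such as loop unrolling, array bounds check elimination, better register allocation, etc.
Once \n is found, fill lineBuffer in one go with System.arraycopy.

Here is an example that is not fully functional, but it should give you an idea of what this could look like.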

public boolean readLine() throws IOException {
    do {
        int cr = findCR(readBuffer, readIdx, readBufferCapacity);
        if (cr >= 0) {
            lineLength = cr - readIdx - 1;
            System.arraycopy(readBuffer, readIdx, lineBuffer, 0, lineLength);
            readIdx = cr;
            return true;
        }
    } while (refill());
    return false;
}

private int findCR(char[] readBuffer, int pos, int limit) {
    // Ensuring that limit <= readBuffer.length helps JIT to eliminate array bounds check
    limit = Math.min(limit, readBuffer.length);
    while (pos < limit) {
        if (readBuffer[pos++] == '\n') {
            return pos;
        }
    }
    return -1;
}
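To make the idea above concrete, here is one possible complete version. It is a sketch under assumptions rather than the answer's own code: the names BulkLineReader, refill, indexOfNewline and append are hypothetical, the line buffer grows by doubling, and a line that spans several buffer refills is copied in several chunks. Unlike findCR above, indexOfNewline returns the index of the '\n' itself. As in the question, CRLF is not handled.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

// Hypothetical sketch: scan for '\n' in a tight loop, then bulk-copy the line.
final class BulkLineReader {
    private final Reader in;
    private final char[] readBuffer;
    private char[] lineBuffer = new char[128];
    private int readIdx = 0;    // next unread position in readBuffer
    private int limit = 0;      // number of valid chars in readBuffer
    private int lineLength = 0; // length of the line from the last readLine()

    BulkLineReader(Reader in, int readBufferSize) {
        this.in = in;
        this.readBuffer = new char[readBufferSize];
    }

    /** Reads the next line into the line buffer; returns false at end of stream. */
    boolean readLine() throws IOException {
        lineLength = 0;
        while (true) {
            if (readIdx == limit && !refill()) {
                // End of stream: a final line without '\n' still counts
                return lineLength > 0;
            }
            int nl = indexOfNewline(readBuffer, readIdx, limit);
            int end = (nl >= 0) ? nl : limit;
            append(readBuffer, readIdx, end - readIdx);
            if (nl >= 0) {
                readIdx = nl + 1; // continue after the '\n' next time
                return true;
            }
            readIdx = limit; // chunk consumed, line continues after refill
        }
    }

    private boolean refill() throws IOException {
        int n = in.read(readBuffer, 0, readBuffer.length);
        readIdx = 0;
        limit = Math.max(n, 0);
        return n > 0;
    }

    // Short separate loop, so the JIT has a simple loop to optimize
    private static int indexOfNewline(char[] buf, int from, int to) {
        to = Math.min(to, buf.length);
        for (int i = from; i < to; i++) {
            if (buf[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    private void append(char[] src, int from, int count) {
        if (lineLength + count > lineBuffer.length) {
            int newSize = Math.max(lineBuffer.length * 2, lineLength + count);
            lineBuffer = Arrays.copyOf(lineBuffer, newSize);
        }
        System.arraycopy(src, from, lineBuffer, lineLength, count);
        lineLength += count;
    }

    char[] getLine() {
        return lineBuffer;
    }

    int getLineLength() {
        return lineLength;
    }
}
```

It is used like the LineReader in the question: call readLine() in a loop and read getLineLength() characters from getLine() after each successful call. Whether it actually beats BufferedReader would have to be measured on the target JVM.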

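Side notes

Your buffer size is too large, which has a negative effect on the CPU caches. Performance should be better with something between 32K and 256K.
Do not use hprof; it modifies the running code and often produces distorted results. I believe async-profiler will be more precise; it also shows the time spent in native code and in kernel code.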
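The buffer size remark can be checked with a small experiment. The following is only a rough sketch: the class name BufferSizeCheck, the synthetic in-memory input and the three buffer sizes are assumptions chosen for illustration. Single-pass timings without warm-up are noisy, and an in-memory stream does not behave like a pipe from cat, so the numbers are only a first hint.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical harness: count lines with different BufferedReader buffer sizes.
final class BufferSizeCheck {

    /** Counts lines using a BufferedReader with the given char buffer size. */
    static long countLines(InputStream in, int bufferSize) throws IOException {
        BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), bufferSize);
        long lines = 0;
        while (br.readLine() != null) {
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic input: 200,000 lines of 79 'x' characters plus '\n'
        byte[] line = new byte[80];
        Arrays.fill(line, (byte) 'x');
        line[79] = '\n';
        byte[] data = new byte[line.length * 200_000];
        for (int off = 0; off < data.length; off += line.length) {
            System.arraycopy(line, 0, data, off, line.length);
        }

        // 32K, 256K, and the 32M used in the question
        for (int size : new int[] {32 * 1024, 256 * 1024, 32 * 1024 * 1024}) {
            long start = System.nanoTime();
            long lines = countLines(new ByteArrayInputStream(data), size);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(size + " chars: " + lines + " lines in " + ms + " ms");
        }
    }
}
```

For numbers that can be trusted, repeating the runs after a warm-up phase, or using a benchmark harness such as JMH, would be the safer choice.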

Answer 2

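I use the code below in competitive programming contests. It has been shared on CodeChef and around the web for quite a while. Execution time will be reduced considerably :)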

import java.util.InputMismatchException;
import java.io.*;
public class Solution {

public static void main(String args[]) throws Exception {
    InputReader sc = new InputReader(System.in);
    PrintWriter pw = new PrintWriter(System.out);
    int t = sc.nextInt();
    for(int i=0;i<t;i++){
        //unimplemented.
    }
}

static class InputReader {
    private InputStream stream;
    private byte[] buf = new byte[1024];
    private int curChar;
    private int numChars;
    private SpaceCharFilter filter;

    public InputReader(InputStream stream) {
        this.stream = stream;
    }

    public int read() {
        if (numChars == -1)
            throw new InputMismatchException();

        if (curChar >= numChars) {
            curChar = 0;
            try {
                numChars = stream.read(buf);
            } catch (IOException e) {
                throw new InputMismatchException();
            }

            if (numChars <= 0)
                return -1;
        }
        return buf[curChar++];
    }

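    public String nextLine() {
        // Read through the same byte buffer as the other methods. Wrapping
        // System.in in a new BufferedReader here would miss bytes that are
        // already buffered in buf.
        StringBuilder res = new StringBuilder();
        int c = read();
        while (c != '\n' && c != -1) {
            if (c != '\r')
                res.append((char) c);
            c = read();
        }
        return res.toString();
    }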

    public int nextInt() {
        int c = read();

        while (isSpaceChar(c))
            c = read();

        int sgn = 1;

        if (c == '-') {
            sgn = -1;
            c = read();
        }

        int res = 0;
        do {
            if (c < '0' || c > '9')
                throw new InputMismatchException();
            res *= 10;
            res += c - '0';
            c = read();
        }
        while (!isSpaceChar(c));

        return res * sgn;
    }

    public long nextLong() {
        int c = read();
        while (isSpaceChar(c))
            c = read();
        int sgn = 1;
        if (c == '-') {
            sgn = -1;
            c = read();
        }
        long res = 0;

        do {
            if (c < '0' || c > '9')
                throw new InputMismatchException();
            res *= 10;
            res += c - '0';
            c = read();
        }
        while (!isSpaceChar(c));
        return res * sgn;
    }

    public double nextDouble() {
        int c = read();
        while (isSpaceChar(c))
            c = read();
        int sgn = 1;
        if (c == '-') {
            sgn = -1;
            c = read();
        }
        double res = 0;
        while (!isSpaceChar(c) && c != '.') {
            if (c == 'e' || c == 'E')
                // Apply the sign here too, otherwise "-1e3" would parse as 1000
                return sgn * res * Math.pow(10, nextInt());
            if (c < '0' || c > '9')
                throw new InputMismatchException();
            res *= 10;
            res += c - '0';
            c = read();
        }
        if (c == '.') {
            c = read();
            double m = 1;
            while (!isSpaceChar(c)) {
                if (c == 'e' || c == 'E')
                    // Apply the sign here too, otherwise "-1.5e3" would parse as 1500
                    return sgn * res * Math.pow(10, nextInt());
                if (c < '0' || c > '9')
                    throw new InputMismatchException();
                m /= 10;
                res += (c - '0') * m;
                c = read();
            }
        }
        return res * sgn;
    }

    public String readString() {
        int c = read();
        while (isSpaceChar(c))
            c = read();
        StringBuilder res = new StringBuilder();
        do {
            res.appendCodePoint(c);
            c = read();
        }
        while (!isSpaceChar(c));

        return res.toString();
    }

    public boolean isSpaceChar(int c) {
        if (filter != null)
            return filter.isSpaceChar(c);
        return c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == -1;
    }

    public String next() {
        return readString();
    }

    public interface SpaceCharFilter {
        public boolean isSpaceChar(int ch);
    }
 }

}
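Applied to the line-counting test from the question, the byte-level approach of this answer might look like the sketch below. It is an illustration under assumptions, not part of the answer: the class name ByteLineCounter and the method countLines are made up, and the input is assumed to be ASCII or UTF-8, where the byte 0x0A never occurs inside a multi-byte sequence. No character decoding takes place, so it only fits cases where the line content is not needed as text.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: count lines on raw bytes, without decoding to chars.
final class ByteLineCounter {

    /** Counts lines; a final line without trailing '\n' is counted as well. */
    static long countLines(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long lines = 0;
        boolean pending = false; // true if data follows the last '\n' seen
        int n;
        while ((n = in.read(buf, 0, buf.length)) > 0) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    lines++;
                }
            }
            pending = buf[n - 1] != '\n';
        }
        return pending ? lines + 1 : lines;
    }
}
```

A call such as ByteLineCounter.countLines(System.in, 64 * 1024) would replace the read loop in the test classes of the question; the buffer size of 64K is again just an assumption within the range suggested in the first answer.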
