
Question

I have to apply bitwise and arithmetic operations to every byte of my stream.

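I have identified the for loop in the code sample as the bottleneck of my output stream and would like to optimize it. I'm just out of ideas ;)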

    private static final long A = 0x1ABCDE361L;
    private static final long C = 0x87;
    private long x;

    // This method belongs to a class that extends java.io.FilterOutputStream
    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        for (int i = 0; i < length; i++) {
            // 48-bit LCG step: x = (A * x + C) mod 2^48
            x = (A * x + C) & 0xffffffffffffL;
            // XOR the byte with bits 16..23 of the generator state
            buffer[offset + i] = (byte) (buffer[offset + i] ^ (x >>> 16));
        }

        out.write(buffer, offset, length);
    }
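For reference, here is a self-contained sketch of the whole filter class around that method. The class name XorLcgOutputStream, the seed constructor parameter and the 8192-byte scratch buffer are assumptions, not part of the question. Two details deliberately differ from the snippet: write(int) is overridden as well, because FilterOutputStream.write(int) passes bytes straight to the wrapped stream and would otherwise skip the transformation, and the bytes are transformed in a private buffer so the caller's array is left unmodified.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical complete version of the filter: XORs every byte with
 * bits 16..23 of a 48-bit linear congruential generator.
 */
public class XorLcgOutputStream extends FilterOutputStream {
    private static final long A = 0x1ABCDE361L;
    private static final long C = 0x87;
    private static final long MASK_48 = 0xffffffffffffL;

    private long x;                                // generator state
    private final byte[] scratch = new byte[8192]; // assumed chunk size

    // The seed parameter is an assumption; the question leaves x at 0.
    public XorLcgOutputStream(OutputStream out, long seed) {
        super(out);
        this.x = seed & MASK_48;
    }

    // Without this override, single-byte writes would reach 'out' unchanged.
    @Override
    public void write(int b) throws IOException {
        x = (A * x + C) & MASK_48;
        out.write(b ^ (int) (x >>> 16)); // only the low 8 bits are written
    }

    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        // Transform into a private copy so the caller's array is not modified.
        while (length > 0) {
            int n = Math.min(length, scratch.length);
            for (int i = 0; i < n; i++) {
                x = (A * x + C) & MASK_48;
                scratch[i] = (byte) (buffer[offset + i] ^ (x >>> 16));
            }
            out.write(scratch, 0, n);
            offset += n;
            length -= n;
        }
    }
}
```

Because the transformation is a plain XOR with the keystream, passing the output through a second instance created with the same seed restores the original bytes.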

The code is mainly intended for Android devices.

Update

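I'm looking for at least a 50% reduction in execution time. From benchmarking CRC32 I know that CRC32#update(byte[] b, int off, int len) is ten times faster than CRC32#update(int b) on blocks larger than 30 bytes (my blocks are larger than 4096 bytes). So I think I need an implementation that processes the whole array at once.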
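One possible direction, sketched under assumptions and not benchmarked: a likely limit of the loop is that every state depends on the previous one, so each byte waits for a full 64-bit multiply-add. Standard LCG jump-ahead, x(n+k) = A^k * x(n) + C * (A^(k-1) + ... + A + 1), lets four consecutive states be computed directly from the same x(n), so the serial chain advances once per four bytes while the four multiplies can overlap. The class name JumpAheadXor and the choice of four lanes are hypothetical, and whether this reaches the 50% target on Android would have to be measured.

```java
/**
 * Hypothetical sketch: the question's keystream computed with LCG
 * jump-ahead, four bytes per step of the serial dependency chain.
 */
public final class JumpAheadXor {
    private static final long A = 0x1ABCDE361L;
    private static final long C = 0x87;

    // x(n+k) = A^k * x(n) + C * (A^(k-1) + ... + A + 1), computed mod 2^64.
    // Only bits 16..23 are used, and the low 48 bits of every state match
    // the question's version with the & 0xffffffffffffL mask.
    private static final long A2 = A * A;
    private static final long A3 = A2 * A;
    private static final long A4 = A3 * A;
    private static final long C2 = (A + 1) * C;
    private static final long C3 = (A2 + A + 1) * C;
    private static final long C4 = (A3 + A2 + A + 1) * C;

    private long x; // generator state; only the low 48 bits are meaningful

    public JumpAheadXor(long seed) {
        x = seed;
    }

    /** XORs buf[off .. off + len) in place with the keystream. */
    public void apply(byte[] buf, int off, int len) {
        long s = x;
        int i = 0;
        for (; i + 3 < len; i += 4) {
            // All four states depend only on s, not on each other.
            long s1 = A * s + C;
            long s2 = A2 * s + C2;
            long s3 = A3 * s + C3;
            long s4 = A4 * s + C4;
            buf[off + i]     ^= (byte) ((int) s1 >>> 16);
            buf[off + i + 1] ^= (byte) ((int) s2 >>> 16);
            buf[off + i + 2] ^= (byte) ((int) s3 >>> 16);
            buf[off + i + 3] ^= (byte) ((int) s4 >>> 16);
            s = s4;
        }
        for (; i < len; i++) { // remaining 0..3 bytes, one step each
            s = A * s + C;
            buf[off + i] ^= (byte) ((int) s >>> 16);
        }
        x = s; // state carries over to the next call
    }
}
```

Calls may split the data at any length; the tail loop keeps the state consistent, so the output matches the byte-by-byte loop.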

Answer 1

The following is a bit faster on 32-bit CPUs:

private static final long A = 0x1ABCDE361L;
private static final long C = 0x87;
private long x;

// This method belongs to a class that extends java.io.FilterOutputStream
@Override
public void write(byte[] buffer, int offset, int length) throws IOException {
    for (int i = 0; i < length; i++) {
        x = A * x + C; // no 48-bit mask: bits 16..23 are unaffected by it
        // cast to int first, so the shift is a 32-bit operation
        buffer[offset + i] = (byte) (buffer[offset + i] ^ ((int) x >>> 16));
    }

    out.write(buffer, offset, length);
}

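Since x is shifted right by 16 bits and the result of the XOR operation is cast to byte, only bits 16 to 23 of x are actually used. x can therefore be cast to 32 bits (int) before the right shift, which makes both the shift and the XOR faster on 32-bit CPUs. The & 0xffffffffffffL mask can be dropped for the same reason: the low bits of a product or sum depend only on the low bits of the operands, so bits 16 to 23 of x come out the same with or without it.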
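The claim that both loops produce the same bytes can be checked directly. This small program, with hypothetical names VariantCheck, questionVariant and answerVariant, runs the question's loop and the answer's loop over the same random data and also confirms that the unmasked state agrees with the 48-bit state in its low 48 bits.

```java
import java.util.Arrays;
import java.util.Random;

/** Compares the question's loop with the answer's loop on the same input. */
public final class VariantCheck {
    private static final long A = 0x1ABCDE361L;
    private static final long C = 0x87;

    /** Question: state kept to 48 bits, 64-bit shift. Returns the final state. */
    static long questionVariant(long x, byte[] buf, int off, int len) {
        for (int i = 0; i < len; i++) {
            x = (A * x + C) & 0xffffffffffffL;
            buf[off + i] = (byte) (buf[off + i] ^ (x >>> 16));
        }
        return x;
    }

    /** Answer: no mask, 32-bit shift. Returns the final state. */
    static long answerVariant(long x, byte[] buf, int off, int len) {
        for (int i = 0; i < len; i++) {
            x = A * x + C;
            buf[off + i] = (byte) (buf[off + i] ^ ((int) x >>> 16));
        }
        return x;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 16];
        new Random(1).nextBytes(data);
        byte[] q = data.clone();
        byte[] a = data.clone();
        long xq = questionVariant(0, q, 0, q.length);
        long xa = answerVariant(0, a, 0, a.length);
        System.out.println(Arrays.equals(q, a));          // true
        System.out.println(xq == (xa & 0xffffffffffffL)); // true
    }
}
```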

Significantly optimizing byte operations in a for loop (by avoiding the loop?)
