在java中苦苦挣扎 - 将long分解为long []的bitmasks

时间:2010-10-18 17:00:34

标签: java bit-manipulation

我正在通过

将一个long分解为long []的单位long
public static int decompose(long[] buffer, long base) {
  int count = Long.bitCount(base);
  for (int i=0; i<count; i++) {
    base ^= (buffer[i] = Long.lowestOneBit(base));
  }
  return count;
}

但我觉得可能有更快的方法来做到这一点,因为似乎有一些重复的步骤。例如,计算这些位应该已经非常接近获得填充结果所需的所有信息。

有什么建议吗?我熟悉过早的优化口头禅,因此为什么我不会在我的时间推进我的解决方案,但也许其他人已经看过这个或者沉迷于优化...编辑:请通过{运行任何建议{3}}。我的第一次暗示实际上是在坚持下去,我感到有些惊讶。

测试代码,它位于JUnit方法中:

Random rng = new Random();
long size = 0;
long[] hold = new long[Long.SIZE];
System.out.println("init:"+Long.toBinaryString(BitTwiddling.bitmask(rng.nextInt(Long.SIZE)))); //initialize BitTwiddling internals
long start = System.currentTimeMillis();
for (int i=0; i<compareIterations; i++) size+= BitTwiddling.decompose(hold,rng.nextInt(Integer.MAX_VALUE));
long delta = System.currentTimeMillis() - start;
double base = (double)size/delta;

size = 0;
start = System.currentTimeMillis();
for (int i=0; i<compareIterations; i++) size += BitTwiddling.decomposeAlt(hold, rng.nextInt(Integer.MAX_VALUE));
long delta2 = System.currentTimeMillis() - start;
double base2 = (double)size/delta2;

然后比较base和base2

5 个答案:

答案 0 :(得分:3)

在尝试了很长时间并且难以击败原始代码之后,我找到了一个通过无耻地窃取您的代码并进行调整来持续击败您的代码(在我的环境中)。

    protected int decompose(long[] buffer, long base) {
        int count = Long.bitCount(base);
        for (int i = 0; i < count; i++) {
            base -= (buffer[i] = Long.lowestOneBit(base));
        }
        return count;
    }

唯一区别:使用减法而不是xor。我认为没有任何边缘案件;我当然没有找到任何。不知道为什么那些没有被优化到同一个东西(假设不是边缘情况),但它确实使我的速度更快(再次,在我的机器上)。我已将此版本添加到测试答案中。

编辑我想它无法优化的原因是因为编译器不知道操作数总是小于基数(因为我们从低位到高位)

编辑#2 设置-server标志后,Carl的速度一直比我的快一点。

编辑#3 是的,当在64位JVM上运行时,速度要快得多:

    protected int decompose(long[] buffer, long base) {
        int count = 0;
        while ( 0 != (base -= (buffer[count++] = Long.lowestOneBit(base))));
        return count;
    }

(或等效的xor版本),因为它可以使用本机处理器操作执行64位比较。

答案 1 :(得分:2)

由于我喜欢优化*,这里有一个版本,你可以试试:

public static int decompose(long[] buffer, long base) {
    int count = 0;
    while (base != 0) {
        long mask = base & (-base);
        base &= ~mask;
        buffer[count++] = mask;
    }
    return count;
}

我做的主要事情是:

  • 内联最低一位计算以避免方法调用开销。这可能是(罕见但可能)的情况下的胜利,其中编译器/ JVM不够聪明,无法为您完成....
  • 传递一个long数组作为输入参数,以避免内存分配,并需要对这些位进行计数以确定数组大小。函数现在返回找到的位掩码数。
  • 随着时间的推移归零,这样你就可以尽早摆脱循环

*因为它很有趣,无论它是否过早

答案 2 :(得分:2)

这是我目前正在使用的线束。 我一直得到问题中的代码(方法1),以便更快地执行。目前Mark Peters的答案通常是最快的,但Carl设置-server标志的速度更快。此外,迈克拉在服务器模式下的竞争力更强(尽管仍然比卡尔/马克的速度慢)。

import java.util.Random;

public abstract class Harness {
    public static void main(String[] args) {
        while (true) {
            long[] randomData = new long[4096];
            Random r = new Random();
            for (int i = 0; i < randomData.length; i++) {
                randomData[i] = r.nextLong();
            }
            long[] buffer = new long[64];

            Harness[] versions = new Harness[] {
                    new Control(randomData, buffer),
                    new Carl(randomData, buffer),
                    new MarkPeters(randomData, buffer),
                    new MarkPetersServer64Bit(randomData, buffer),
//                    new Rsp1(randomData, buffer), 
                    new Rsp2(randomData, buffer),
                    new Mikera(randomData, buffer)
                    };
            for (Harness v : versions) {
                v.doTest();
            }
        }
    }

    private final long[] buffer;
    private final long[] randomData;

    protected Harness(long[] randomData, long[] buffer) {
        this.randomData = randomData;
        this.buffer = buffer;
    }

    public void doTest() {
        long start = System.nanoTime();
        long count = 0;
        for (int times = 0; times < 1000; times++) {
            for (int i = 0; i < randomData.length; i++) {
                count = decompose(buffer, randomData[i]);
            }
        }
        long end = System.nanoTime();

        //verify decomposition of last item
        long l = 0;
        for ( int i = 0; i < count; i++ ) {
            l |= buffer[i];
        }

        System.out.println(getClass().getSimpleName() + " took " + (end - start)
                / 1000000.0 + " ms - last base: " + l);
    }

    protected abstract int decompose(long[] buffer, long base);

    private static class Control extends Harness {
        protected Control(long[] randomData, long[] buffer) {
            super(randomData, buffer);
        }

        @Override
        protected int decompose(long[] buffer, long base) {
            return 0;
        }
    }

    private static class Carl extends Harness {
        protected Carl(long[] randomData, long[] buffer) {
            super(randomData, buffer);
        }

        @Override
        protected int decompose(long[] buffer, long base) {
            final int count = Long.bitCount(base);
            for (int i = 0; i < count; i++) {
                base ^= (buffer[i] = Long.lowestOneBit(base));
            }
            return count;
        }
    }

    private static class Mikera extends Harness {
        protected Mikera(long[] randomData, long[] buffer) {
            super(randomData, buffer);
        }

        @Override
        protected int decompose(long[] buffer, long base) {
            int count = 0;
            while (base != 0) {
                long mask = base & (-base);
                base &= ~mask;
                buffer[count++] = mask;
            }
            return count;
        }
    }

    private static class Rsp1 extends Harness {
        protected Rsp1(long[] randomData, long[] buffer) {
            super(randomData, buffer);
        }

        @Override
        protected int decompose(long[] buffer, long base) {
            int count = 0;

            if (0 != (base & 1)) {
                buffer[count++] = 1;
            }

            base >>>= 1;

            for (long bit = 1; 0 < bit && bit <= base; ) {
                if (0 < (base & bit)) {
                    buffer[count++] = (bit <<= 1);
                }
            }
            return count;
        }
    }

    private static class Rsp2 extends Harness {
        protected Rsp2(long[] randomData, long[] buffer) {
            super(randomData, buffer);
        }

        @Override
        protected int decompose(long[] buffer, long base) {
            int count = 0;

            for (long bit = 1; 0 != base; bit <<= 1, base >>>= 1) {
                if (0 < (base & 1)) {
                    buffer[count++] = bit;
                }
            }
            return count;
        }
    }

    private static class MarkPeters extends Harness {
        protected MarkPeters(long[] randomData, long[] buffer) {
            super(randomData, buffer);
        }

        @Override
        protected int decompose(long[] buffer, long base) {
            int count = Long.bitCount(base);
            for (int i = 0; i < count; i++) {
                base -= (buffer[i] = Long.lowestOneBit(base));
            }
            return count;

        }
    }

    private static class MarkPetersServer64Bit extends Harness {
        protected MarkPetersServer64Bit(long[] randomData, long[] buffer) {
            super(randomData, buffer);
        }

        @Override
        protected int decompose(long[] buffer, long base) {
            int count = 0;
            while ( 0 != (base ^= (buffer[count++] = Long.lowestOneBit(base))));
            return count;
        }
    }
}

样本输出

哪种方法最好取决于具体情况。

非服务器,32位JVM

Control took 41.175272 ms - last base: 0
Carl took 691.966919 ms - last base: 5852835112840111303
MarkPeters took 642.230253 ms - last base: 5852835112840111303
MarkPetersServer64Bit took 742.594626 ms - last base: 5852835112840111303
Rsp2 took 3886.203787 ms - last base: 5852835112840111303
Mikera took 1044.451494 ms - last base: 5852835112840111303

获胜者: MarkPeters

服务器,32位JVM

Control took 2.354383 ms - last base: 0
Carl took 508.687401 ms - last base: 338317437500027646
MarkPeters took 521.831297 ms - last base: 338317437500027646
MarkPetersServer64Bit took 727.052206 ms - last base: 338317437500027646
Rsp2 took 3811.75662 ms - last base: 338317437500027646
Mikera took 665.252599 ms - last base: 338317437500027646

获胜者: Carl

服务器(隐式或显式),64位JVM

Control took 0.007186 ms - last base: 0
Carl took 543.701859 ms - last base: -8898262206218882664
MarkPeters took 439.706079 ms - last base: -8898262206218882664
MarkPetersServer64Bit took 391.831055 ms - last base: -8898262206218882664
Rsp2 took 1861.40449 ms - last base: -8898262206218882664
Mikera took 435.964319 ms - last base: -8898262206218882664

获胜者: MarkPetersServer64Bit

注意:据我所知,没有类似的64位JVM可以在非服务器模式下运行。 See here

  

64位Java中是否提供-client和-server VM模式?

     

目前只有Java HotSpot Server VM支持64位操作,而-server选项是隐式的,使用-d64。这可能会在将来的版本中发生变化。

答案 3 :(得分:1)

我会使用一个掩码,使用移位运算符计算每个循环:

for (int i= 0; i < result.length; i++)
    result[i]= base & (1<<i);

既清晰又快速。

答案 4 :(得分:1)

这个版本怎么样?

public static int decompose(long[] buffer, long base) {
    int count = 0;

    if (0 != (base & 1)) {
        buffer[count++] = 1;
    }

    base >>>= 1;

    for (long bit = 1; 0 < bit && bit <= base; ) {
        if (0 < (base & bit)) {
            buffer[count++] = (bit <<= 1);
        }
    }
    return count;
}

在这种情况下,base的符号位使结束条件复杂化,因此我们通过在开始时“调整”基值来防止这种情况发生。

或者不是检查位掩码与基数,我们可以自行移位:

public static int decompose(long[] buffer, long base) {
    int count = 0;

    for (long bit = 1; 0 != base; bit <<= 1, base >>>= 1) {
        if (0 < (base & 1)) {
            buffer[count++] = bit;
        }
    }
    return count;
}