Is there any way to store bits in Java without memory overhead?

Time: 2015-10-21 14:17:36

Tags: java memory memory-management jvm bitset

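This has had my interest and attention since yesterday. I am trying to store bits in Java and am getting hit by memory overhead.

My first question was "What is size of my Bitset?"

Based on the answers, I looked at other references and found the Memory Usage guide.

Then I looked at the source code of BitSet, which looks like this:
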
public class BitSet implements Cloneable, java.io.Serializable {
    /*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    /* Used to shift left or right for a partial word mask */
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * @serialField bits long[]
     *
     * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits", long[].class),
    };

    /**
     * The internal field corresponding to the serialField "bits".
     */
    private long[] words;

    /**
     * The number of words in the logical size of this BitSet.
     */
    private transient int wordsInUse = 0;

    /**
     * Whether the size of "words" is user-specified.  If so, we assume
     * the user knows what he's doing and try harder to preserve it.
     */
    private transient boolean sizeIsSticky = false;

    /* use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 7997698588986878753L;

    /**
     * Given a bit index, return word index containing it.
     */
    private static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

.....
}

Based on the Memory Guide, this is what I calculated:

8  Bytes: housekeeping space
12 Bytes: 3 ints
8  Bytes: long
12 Bytes: long[]
4  Bytes: transient int // does it count?
1  Byte : transient boolean
3  Bytes: padding

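This sums to 45 + 3 bytes (padding to reach a multiple of 8).

That means an empty BitSet itself takes 48 bytes.

But my requirement is only to store bits. What am I missing? What are my options?

Many thanks.

Update

My requirement is that I want to store a total of 64 bits in two separate fields: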

class MyClass {
    BitSet timeStamp;
    BitSet id;
}

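I want to keep millions of MyClass objects in memory.

3 answers:

Answer 0: (score: 4)

Quoting the question:

My requirement is that I want to store a total of 64 bits in two separate fields

So just use a long (a 64-bit integer) and treat it as a bit field. I once needed something similar, but 32 bits were enough for me, so I wrote a small library class that uses an int as a bit set: https://github.com/claudemartin/smallset

Feel free to fork it; just replace int with long, 32 with 64, 1 with 1L, and so on.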
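
To make the "long as a bit field" idea concrete, here is a minimal sketch. The class name LongBits and its methods are hypothetical and are not taken from the smallset library; they only illustrate the 1L-based masking the answer alludes to.

```java
// Hypothetical helper (not part of smallset): a long used as a 64-element bit set.
final class LongBits {
    private LongBits() {}

    // Returns a copy of bits with bit i switched on.
    static long set(long bits, int i) {
        check(i);
        return bits | (1L << i);   // 1L, not 1: an int mask would only cover 32 bits
    }

    // Returns a copy of bits with bit i switched off.
    static long clear(long bits, int i) {
        check(i);
        return bits & ~(1L << i);
    }

    // Tests whether bit i is on.
    static boolean get(long bits, int i) {
        check(i);
        return (bits & (1L << i)) != 0;
    }

    // Number of bits that are on.
    static int cardinality(long bits) {
        return Long.bitCount(bits);
    }

    // Java masks the shift distance to 0..63 for long, so reject other indexes explicitly.
    private static void check(int i) {
        if (i < 0 || i >= Long.SIZE) {
            throw new IndexOutOfBoundsException("bit index: " + i);
        }
    }
}
```

Because the value is a plain long, it can live directly in a field or in a long[] element and costs exactly 8 bytes, with no object header of its own.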

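Answer 1: (score: 3)

Quoting the question:

This sums to 45 + 3 bytes (padding to reach a multiple of 8). That means an empty BitSet itself takes 48 bytes.

First, I would like to suggest the right tool for analyzing object layout in the JVM: JOL. In your case (java -jar jol-cli/target/jol-cli.jar internals java.util.BitSet), JOL produces the following result: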

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

java.util.BitSet object internals:
 OFFSET  SIZE    TYPE DESCRIPTION                    VALUE
      0     4         (object header)                01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4         (object header)                00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4         (object header)                f4 df 9f e0 (11110100 11011111 10011111 11100000) (-526393356)
     12     4     int BitSet.wordsInUse              0
     16     1 boolean BitSet.sizeIsSticky            false
     17     3         (alignment/padding gap)        N/A
     20     4  long[] BitSet.words                   [0]
Instance size: 24 bytes (reported by Instrumentation API)
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total

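Your calculation is incorrect because it counts the static fields, which belong to the class and not to each instance, so an empty BitSet itself takes 24 bytes. Note that this figure is not 100% accurate either, because it does not include the size of the long[] object. The correct result comes from java -jar jol-cli/target/jol-cli.jar externals java.util.BitSet: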

Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

java.util.BitSet@6b25f76bd object externals:
          ADDRESS       SIZE TYPE             PATH                           VALUE
        7ae321a48         24 java.util.BitSet                                (object)
        7ae321a60         24 [J               .words                         [0]

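This means an empty BitSet uses 48 bytes in total, including the long array. To optimize the memory footprint, you can write your own BitSet implementation. For your use case, for example, the following options are possible: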

public class MyOwnBitSet {
    long word1;
    long word2;
}

public class MyOwnBitSet2 {
    long[] word = new long[2];
}

public class MyOwnBitSet3 {
    int index;
}

JOL produces the following results:

MyOwnBitSet@443b7951d object externals:
          ADDRESS       SIZE TYPE                                                   PATH                           VALUE
        76ea4c7f8         32 MyOwnBitSet                                (object)


MyOwnBitSet2@69663380d object externals:
          ADDRESS       SIZE TYPE                                                    PATH                           VALUE
        76ea53800         16 MyOwnBitSet2                                (object)
        76ea53810         32 [J                                                      .word                          [0, 0]


MyOwnBitSet3@5a2e4553d object externals:
          ADDRESS       SIZE TYPE                                                    PATH                           VALUE
        76ea5c070         16 MyOwnBitSet3                                (object)

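Let me explain the last example, MyOwnBitSet3. To reduce the memory footprint, you can preallocate one large array of long / int values and store only an index into the right cell. For a sufficiently large number of objects, this option is the most advantageous.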
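
The preallocation idea could be sketched roughly as follows. This is an illustration under assumptions, not the answer's actual implementation: the class name BitSetPool, the one-long-per-entry layout, and the simple bump allocator are all hypothetical.

```java
// Hypothetical sketch of the MyOwnBitSet3 idea: one shared long[] holds every
// bit set, and each logical "object" shrinks to an int index into that array.
final class BitSetPool {
    private final long[] words;   // slot i holds the 64 bits of entry i (assumed layout)
    private int next = 0;         // next free slot; slots are never released in this sketch

    BitSetPool(int capacity) {
        words = new long[capacity];
    }

    // Reserves a slot and returns its index, which serves as the handle.
    int allocate() {
        if (next == words.length) {
            throw new IllegalStateException("pool is full");
        }
        return next++;
    }

    // bit must be in 0..63; Java masks larger shift distances for long.
    void set(int index, int bit) {
        words[index] |= 1L << bit;
    }

    void clear(int index, int bit) {
        words[index] &= ~(1L << bit);
    }

    boolean get(int index, int bit) {
        return (words[index] & (1L << bit)) != 0;
    }
}
```

With this layout the per-entry cost is the 8-byte array element, plus 4 bytes wherever the int handle is stored; the array header is paid once for the whole pool.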

Answer 2: (score: 0)

To store a total of 64 bits in an object:

class MyClass {
    int timeStamp;
    int id;
}

Or, if you don't want the overhead of an object, you can use:

long timeStampAndId;

The problem is how to encapsulate your operations. For primitives, Java doesn't help much, but what you can do is:

enum TimeStampAndId {
    /* no instances */ ;
    public static boolean isTimeStampSet(long timeStampAndId, int n) { ... }
    public static boolean isIdSet(long timeStampAndId, int n) { ... }
}

i.e., use a utility class to support the primitive type.
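
One way such a utility class might be filled in is sketched below. The 32/32 split, the choice to keep the timestamp in the high half, and the names PackedTimeStampAndId, pack, timeStamp and id are assumptions made for illustration; the answer leaves the method bodies open.

```java
// Sketch only: the bit layout (timestamp in the high 32 bits, id in the low
// 32 bits) is an assumption, not something the answer specifies.
enum PackedTimeStampAndId {
    ; // no instances, static helpers only

    // Combines two 32-bit values into one long.
    static long pack(int timeStamp, int id) {
        // Mask the id so a negative value does not sign-extend into the timestamp half.
        return ((long) timeStamp << 32) | (id & 0xFFFFFFFFL);
    }

    static int timeStamp(long packed) {
        return (int) (packed >>> 32);
    }

    static int id(long packed) {
        return (int) packed;
    }

    // Tests bit n (0..31) of the timestamp half.
    static boolean isTimeStampSet(long packed, int n) {
        return ((timeStamp(packed) >>> n) & 1) != 0;
    }

    // Tests bit n (0..31) of the id half.
    static boolean isIdSet(long packed, int n) {
        return ((id(packed) >>> n) & 1) != 0;
    }
}
```

Millions of such values can then be kept in a single long[], so each entry costs 8 bytes with no per-object header.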

In the future, Java is expected to support value types, which do not incur the per-object overhead.