Question

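For some reason I can't figure out, my union of structs that contain only bit-fields takes up twice as many bytes as any single one of the structs needs.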

#include <stdio.h>
#include <stdlib.h>

union instructionSet {
    struct Brane{
        unsigned int opcode: 4;
        unsigned int address: 12;
    } brane;
    struct Cmp{
        unsigned int opcode: 4;
        unsigned int blank: 1;
        unsigned int rsvd: 3;
        unsigned char letter: 8;
    } cmp;
    struct {
        unsigned int rsvd: 16;
    } reserved;
};

int main() {

    union instructionSet IR;// = (union instructionSet*)calloc(1, 2);

    /* sizeof yields a size_t, so the matching conversion is %zu, not %ld */
    printf("size of union %zu\n", sizeof(union instructionSet));
    printf("size of reserved %zu\n", sizeof(IR.reserved));
    printf("size of brane %zu\n", sizeof(IR.brane));
    printf("size of cmp %zu\n", sizeof(IR.cmp));


    return 0;
}

Every sizeof call returns 4, but as far as I can tell they should return 2.

Answer 1

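C 2018 6.7.2.1 11 allows the C implementation to choose the size of the container it uses for bit-fields:

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. …

The implementation you are using has evidently chosen four-byte units. That is probably also the size of int on that implementation, which suggests it is a convenient size for it.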
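To see which storage unit your own compiler picks, a minimal sketch along these lines may help. The names OneUnit, TwoUnits, and the three functions are made up for illustration, and the 20-bit widths assume unsigned int is at least 20 bits wide.

```c
#include <stddef.h>
#include <stdio.h>

/* 16 bits of fields: fits in a single storage unit on common compilers. */
struct OneUnit {
    unsigned int opcode : 4;
    unsigned int address : 12;
};

/* 40 bits of fields: if the unit is 32 bits wide, the second field cannot
   share it, so it goes into the next unit or straddles two units
   (implementation-defined). */
struct TwoUnits {
    unsigned int low : 20;
    unsigned int high : 20;
};

size_t one_unit_size(void) { return sizeof(struct OneUnit); }
size_t two_units_size(void) { return sizeof(struct TwoUnits); }

/* Print the sizes so they can be compared with sizeof(unsigned int). */
void print_unit_sizes(void) {
    printf("unsigned int: %zu\n", sizeof(unsigned int));
    printf("OneUnit:      %zu\n", one_unit_size());
    printf("TwoUnits:     %zu\n", two_units_size());
}
```

Calling print_unit_sizes() from main on a compiler that behaves like the one in the question would be expected to report the same size for OneUnit as for unsigned int, but the exact numbers depend on the implementation.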

Answer 2

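There are two problems here. First, your Brane bit-fields use unsigned int, which is 4 bytes.

Even though you only use half of the bits, the struct still occupies a full 32-bit unsigned int.

Second, your Cmp bit-fields use two different field types: the first three fields take 8 bits of a 32-bit unsigned int, and then letter uses an unsigned char for its full 8 bits. Whether letter is packed into the same unit as the preceding fields is up to the compiler, so mixing types like this can make the structure larger than necessary (the question reports 4 bytes for Cmp; other compilers may give more).

If you want to shrink the union to just 16 bits, first use unsigned short, and then use the same field type throughout so that everything stays in the same storage unit.

Something like this would fully optimize your union: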

union instructionSet {
    struct Brane{
        unsigned short opcode: 4;
        unsigned short address: 12;
    } brane;
    struct Cmp{
        unsigned short opcode: 4;
        unsigned short blank: 1;
        unsigned short rsvd: 3;
        unsigned short letter: 8;
    } cmp;
    struct {
        unsigned short rsvd: 16;
    } reserved;
};

This gives a size of 2 all around.
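As a rough way to check this on a given compiler, the two variants can be compared side by side. This is only a sketch: IntFields, ShortFields, and the helper functions are hypothetical names, and bit-fields of type unsigned short are an implementation-defined extension (the standard only guarantees _Bool, signed int, and unsigned int), so a compiler may reject them or lay them out differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Same fields as in the question, declared with unsigned int. */
union IntFields {
    struct { unsigned int opcode : 4; unsigned int address : 12; } brane;
    struct { unsigned int rsvd : 16; } reserved;
};

/* Same fields declared with unsigned short (implementation-defined). */
union ShortFields {
    struct { unsigned short opcode : 4; unsigned short address : 12; } brane;
    struct { unsigned short rsvd : 16; } reserved;
};

size_t int_fields_size(void) { return sizeof(union IntFields); }
size_t short_fields_size(void) { return sizeof(union ShortFields); }

/* Print both sizes for comparison. */
void print_field_sizes(void) {
    printf("unsigned int fields:   %zu\n", int_fields_size());
    printf("unsigned short fields: %zu\n", short_fields_size());
}
```

If the layout matters to the rest of the program, a _Static_assert on sizeof(union ShortFields) is one way to turn a surprise on another compiler into a build error.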

Answer 3

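Read up on structure padding and memory alignment. By default, a 32-bit processor reads memory 32 bits (4 bytes) at a time because that is faster. So in memory, a char followed by a uint32 is written as 4 + 4 = 8 bytes (1 byte of char, 3 bytes of padding, 4 bytes of uint32).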
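The padding described above can be observed with offsetof. The following is a small sketch; CharThenU32 and the helper names are invented for the example, and the amount of padding is whatever the compiler's alignment rules produce (3 bytes is typical where uint32_t is 4-byte aligned).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A char followed by a 32-bit integer, with default alignment. */
struct CharThenU32 {
    char c;
    uint32_t u;
};

/* Number of padding bytes the compiler inserted between c and u. */
size_t padding_before_u(void) {
    return offsetof(struct CharThenU32, u) - sizeof(char);
}

/* Print the offset of u, the padding before it, and the total size. */
void print_layout(void) {
    printf("offset of u: %zu\n", offsetof(struct CharThenU32, u));
    printf("padding:     %zu\n", padding_before_u());
    printf("total size:  %zu\n", sizeof(struct CharThenU32));
}
```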

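Add these lines at the beginning and end of your program and the result will be 2.

#pragma pack(1)
/* ... your declarations ... */
#pragma pack()   /* restore the default packing */

This tells the compiler to align memory to 1 byte (the default on a 32-bit processor is 4 bytes).

PS: try this example with different #pragma pack settings.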
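A hedged sketch of the packing approach follows. #pragma pack is a compiler extension (GCC, Clang, and MSVC accept the push/pop form used here), not standard C, and its effect on bit-field storage units differs between compilers, so the bit-field union may or may not shrink to 2 bytes. PackedPair, PackedInstruction, and the helper functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#pragma pack(push, 1)   /* members aligned to 1 byte from here on */

/* With 1-byte packing no padding is inserted between c and u. */
struct PackedPair {
    char c;
    uint32_t u;
};

/* The bit-field union from the question, reduced to two members. */
union PackedInstruction {
    struct { unsigned int opcode : 4; unsigned int address : 12; } brane;
    struct { unsigned int rsvd : 16; } reserved;
};

#pragma pack(pop)       /* back to the previous packing */

size_t packed_pair_size(void) { return sizeof(struct PackedPair); }
size_t packed_instruction_size(void) { return sizeof(union PackedInstruction); }

/* Print both sizes; the second one is compiler-dependent. */
void print_packed_sizes(void) {
    printf("PackedPair:        %zu\n", packed_pair_size());
    printf("PackedInstruction: %zu\n", packed_instruction_size());
}
```

Packed members can be misaligned for their type, which is slower on some processors and a fault on others, so packing is best limited to the declarations that really need it.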

Answer 4

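What this code will do is not specified, and there is no point in reasoning about it without a specific system and compiler in mind. Bit-fields are too poorly defined in the standard to be used reliably for things like memory layout.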

union instructionSet {

    /* any number of padding bits may be inserted here */ 

    /* we don't know if what will follow is MSB or LSB */

    struct Brane{
        unsigned int opcode: 4; 
        unsigned int address: 12;
    } brane;
    struct Cmp{
        unsigned int opcode: 4;
        unsigned int blank: 1;
        unsigned int rsvd: 3;
        /* anything can happen here, "letter" can merge with the previous 
           storage unit or get placed in a new storage unit */
        unsigned char letter: 8; // unsigned char does not need to be supported
    } cmp;
    struct {
        unsigned int rsvd: 16;
    } reserved;

    /* any number of padding bits may be inserted here */ 
};

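The standard lets the compiler pick a "storage unit" for any bit-field type, and that unit can be of any size. The standard merely states:

An implementation may allocate any addressable storage unit large enough to hold a bit-field.

Things we can't know:

How large bit-fields of type unsigned int are. 32 bits might make sense, but there is no guarantee.
Whether unsigned char is allowed for bit-fields.
How large bit-fields of type unsigned char are. It could be any size from 8 to 32 bits.
What happens if the compiler picks a storage unit smaller than the expected 32 bits and the bits don't fit inside it.
What happens when an unsigned int bit-field meets an unsigned char bit-field.
Whether there is padding at the end or the beginning of the union (alignment).
How the individual storage units within the structs are aligned.
The location of the MSB.

Things we can know:

We have created some sort of binary blob in memory.
The first byte of the blob resides at the lowest address in memory. It may contain data or padding.

Further knowledge can be obtained by having a very specific system and compiler in mind.

Instead of bit-fields we can use 100% portable and deterministic bitwise operations, which yield the same machine code anyway.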
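A minimal sketch of the bitwise alternative might look like this. The bit positions are an assumption chosen for the example (opcode in the top 4 bits, address in the low 12, letter in the low 8, blank in bit 11); the question does not say which layout the real instruction format uses, and all names below are hypothetical.

```c
#include <stdint.h>

/* One 16-bit instruction word; its size does not depend on how the
   compiler lays out bit-fields. */
typedef uint16_t instruction_t;

/* Build a "brane" instruction: 4-bit opcode, 12-bit address. */
instruction_t make_brane(unsigned opcode, unsigned address) {
    return (instruction_t)(((opcode & 0xFu) << 12) | (address & 0xFFFu));
}

/* Build a "cmp" instruction: 4-bit opcode, 1-bit blank, 3 reserved bits
   left as zero, 8-bit letter. */
instruction_t make_cmp(unsigned opcode, unsigned blank, unsigned letter) {
    return (instruction_t)(((opcode & 0xFu) << 12) |
                           ((blank & 0x1u) << 11) |
                           (letter & 0xFFu));
}

/* Field extractors: shift the field down, then mask off its width. */
unsigned get_opcode(instruction_t ir)  { return (ir >> 12) & 0xFu; }
unsigned get_address(instruction_t ir) { return ir & 0xFFFu; }
unsigned get_blank(instruction_t ir)   { return (ir >> 11) & 0x1u; }
unsigned get_letter(instruction_t ir)  { return ir & 0xFFu; }
```

For example, make_brane(0xA, 0x123) yields 0xA123, and get_address on that value gives back 0x123. Because the shifts and masks are explicit, the layout is the same on every compiler, and byte order only matters when the word is written to or read from memory as bytes.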

Union of structs with only bit-fields, sizeof reports double the bytes, C
