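Writing custom n-bit data to a new binary file using bit-fields

Time: 2019-06-16 18:00:12

Tags: c struct bit-fields

I have read several similar topics about bit-fields, but I don't understand them well enough to use them. Here is my problem. I have this struct R: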

struct R{
   unsigned int opcode: 6;
   unsigned int rs: 5;
   unsigned int rt: 5;
   unsigned int rd: 5;
   unsigned int shamt: 5;
   unsigned int funct: 6;
};

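I'm using bit-fields so that my struct holds exactly 32 bits of data. For those wondering, this struct represents an R-type MIPS instruction.

What I want is to write that data to a file named result, and for that I used the following code: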

struct R test  = {32,0,11,21,19,0}

FILE *fp = fopen("./result", "rb");
fwrite(&test,sizeof(test),1,result);

With this code, if I run xxd -b result in the console, I expect to see this:

00000000: 00100000 01011000 01110101 00000010

Instead I get

00000000: 00100000 10100000 01110011 00000001

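I guess the problem is fwrite, but I don't really understand it.

This is an assignment, so I thought of an alternative:

  1. Create an array char sequence[32], where each index simulates one bit.
  2. Have a struct of arrays: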
struct R{
    char opcode[6];
    char rs[5];
    char rt[5];
    char rd[5];
    char shamt[5];
    char funct[6];
};
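  3. Build my binary sequence by concatenating all the arrays.
  4. Convert each group of 8 bits to hexadecimal, e.g. 00100000 gives 0x20.
  5. Write to my file using putc.

My alternative is long, so is there a way to do this directly? Or is there another option I should know about?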

2 Answers:

Answer 0: (score: 5)

The standard doesn't say

As I noted in a comment,

  

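Bit-fields are an obnoxious part of the C standard. Most aspects of their behaviour are implementation-defined. In particular, the mapping of the different fields within a unit is implementation-defined, so whether the opcode field occupies the 6 most significant bits or the 6 least significant bits is implementation-defined.

See C11 §6.7.2.1 Structure and union specifiers, especially ¶10 onwards.

The C standard does not dictate the layout of bit-fields; it merely says that the implementation must document what it does. If you find that opcode ends up in the least significant bits when it is listed first, so be it; that is what your compiler does. If you want it in the most significant bits, you probably need to move it to the other end of the structure (and reverse the order of the other fields too). All of this depends on the compiler, though the compiler will probably conform to the platform ABI. See, for example, the GCC documentation on Implementation defined behaviour: Structures, unions, enumerations, and bit-fields. GCC refers (and defers) to the platform ABI in places. You can find ABI information via Google; what you find is not necessarily very readable, but the information is there.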
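If the goal is a file whose bytes do not depend on the compiler's bit-field layout, one option is to skip bit-fields for the output and build the word with shifts. The following is only a minimal sketch, assuming the conventional MIPS R-type layout (opcode in the top six bits) and a big-endian byte order in the file; pack_r and store_word_be are hypothetical helper names, not part of the code in this answer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the six R-type fields into one 32-bit word,
   with opcode in the most significant six bits (assumed MIPS layout). */
static uint32_t pack_r(unsigned opcode, unsigned rs, unsigned rt,
                       unsigned rd, unsigned shamt, unsigned funct)
{
    /* Each value must fit in its field: 6, 5, 5, 5, 5 and 6 bits. */
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(rd < 32 && shamt < 32 && funct < 64);
    return ((uint32_t)opcode << 26) | ((uint32_t)rs << 21) |
           ((uint32_t)rt << 16)     | ((uint32_t)rd << 11) |
           ((uint32_t)shamt << 6)   | (uint32_t)funct;
}

/* Hypothetical helper: store the word most significant byte first,
   regardless of the host's own byte order. */
static void store_word_be(uint32_t word, unsigned char out[4])
{
    out[0] = (unsigned char)((word >> 24) & 0xFFu);
    out[1] = (unsigned char)((word >> 16) & 0xFFu);
    out[2] = (unsigned char)((word >> 8) & 0xFFu);
    out[3] = (unsigned char)(word & 0xFFu);
}
```

With the question's values, pack_r(32, 0, 11, 21, 19, 0) gives 0x800BACC0, and store_word_be fills the buffer with 80 0B AC C0, which can then be written with fwrite(bytes, 1, 4, fp).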

Some code to analyze your structure

Here is some code based on your structure (plus some code to format binary numbers):

#include <stdio.h>
#include <assert.h>

static
void format_binary8v(unsigned char x, int n, char buffer[static 9])
{
    assert(n > 0 && n <= 8);
    int start = 1 << (n - 1);
    for (int b = start; b != 0; b /= 2)
    {
        *buffer++ = ((b & x) != 0) ? '1' : '0';
        x &= ~b;
    }
    *buffer = '\0';
}

static
void format_binary32(unsigned int x, char buffer[static 33])
{
    for (unsigned b = 2147483648; b != 0; b /= 2)
    {
        *buffer++ = ((b & x) != 0) ? '1' : '0';
        x &= ~b;
    }
    *buffer = '\0';
}

struct R
{
    unsigned int opcode : 6;
    unsigned int rs : 5;
    unsigned int rt : 5;
    unsigned int rd : 5;
    unsigned int shamt : 5;
    unsigned int funct : 6;
};

static void dump_R(const char *tag, struct R r)
{
    union X
    {
        struct R r;
        unsigned int i;
    };

    printf("%s:\n", tag);
    union X x = { .r = r };
    char buffer[33];
    format_binary32(x.i, buffer);
    printf("Binary: %s\n", buffer);
    format_binary8v(x.r.opcode, 6, buffer);
    printf(" - opcode: %s\n", buffer);
    format_binary8v(x.r.rs, 5, buffer);
    printf(" - rs:      %s\n", buffer);
    format_binary8v(x.r.rt, 5, buffer);
    printf(" - rt:      %s\n", buffer);
    format_binary8v(x.r.rd, 5, buffer);
    printf(" - rd:      %s\n", buffer);
    format_binary8v(x.r.shamt, 5, buffer);
    printf(" - shamt:   %s\n", buffer);
    format_binary8v(x.r.funct, 6, buffer);
    printf(" - funct:  %s\n", buffer);
}

int main(void)
{
    char filename[] = "filename.bin";
    FILE *fp = fopen(filename, "w+b");
    if (fp == NULL)
    {
        fprintf(stderr, "failed to open file '%s' for reading and writing\n", filename);
        return 1;
    }
    //struct R test  = {32, 0, 11, 21, 19, 0};
    struct R test  = { 32, 7, 11, 21, 19, 3 };

    fwrite(&test, sizeof(test), 1, fp);

    dump_R("test - after write", test);

    rewind(fp);
    fread(&test, sizeof(test), 1, fp);
    dump_R("test - after read", test);

    fclose(fp);
    return 0;
}

When run on a MacBook Pro running macOS 10.14.5 Mojave, compiled with GCC 9.1.0, I get:

test - after write:
Binary: 00001110011101010101100111100000
 - opcode: 100000
 - rs:      00111
 - rt:      01011
 - rd:      10101
 - shamt:   10011
 - funct:  000011
test - after read:
Binary: 00001110011101010101100111100000
 - opcode: 100000
 - rs:      00111
 - rt:      01011
 - rd:      10101
 - shamt:   10011
 - funct:  000011

and the raw binary output file:

$ xxd -b filename.bin
00000000: 11100000 01011001 01110101 00001110                    .Yu.
$

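My interpretation is that on my machine, the data for the opcode bit-field is in the least significant 6 bits of the storage unit, the data for the funct bit-field is in the most significant 6 bits of the storage unit, and the other elements are in between. That is clear when looking at the 32-bit value. The way xxd -b splits it up requires more explanation:

  • The first byte is the least significant byte, because the Intel architecture is little-endian.
  • It contains all 6 bits of opcode in its least significant bits; it also contains the 2 least significant bits of rs in its most significant bits.
  • The second byte contains the 3 most significant bits of rs in its least significant bits, and all 5 bits of rt in its most significant bits.
  • The third byte contains all 5 bits of rd in its least significant bits, and the 3 least significant bits of shamt in its most significant bits.
  • The fourth and most significant byte contains the 2 most significant bits of shamt in its least significant bits, and all 6 bits of funct in its most significant bits.

It's kind of mind-boggling!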
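The byte breakdown above can be reproduced with plain shifts. This is a sketch that models only what I observed on my machine (fields allocated from the least significant bit upwards, little-endian byte order); pack_lsb_first and byte_le are hypothetical names used for illustration, and another compiler may need a different model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: model a compiler that allocates bit-fields starting
   at the least significant bit, so opcode lands in bits 0..5 and funct in
   bits 26..31 of the 32-bit storage unit. */
static uint32_t pack_lsb_first(unsigned opcode, unsigned rs, unsigned rt,
                               unsigned rd, unsigned shamt, unsigned funct)
{
    return (uint32_t)opcode          | ((uint32_t)rs << 6)     |
           ((uint32_t)rt << 11)      | ((uint32_t)rd << 16)    |
           ((uint32_t)shamt << 21)   | ((uint32_t)funct << 26);
}

/* Byte number i (0 = first byte in the file) as a little-endian machine
   stores the word in memory. */
static unsigned byte_le(uint32_t word, int i)
{
    assert(i >= 0 && i < 4);
    return (unsigned)((word >> (8 * i)) & 0xFFu);
}
```

For { 32, 7, 11, 21, 19, 3 } this model gives the word 0x0E7559E0 and the bytes E0 59 75 0E, which matches the xxd output shown earlier.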

When I revert to your values for the test structure (struct R test = {32, 0, 11, 21, 19, 0};), I get:

test - after write:
Binary: 00000010011101010101100000100000
 - opcode: 100000
 - rs:      00000
 - rt:      01011
 - rd:      10101
 - shamt:   10011
 - funct:  000000
test - after read:
Binary: 00000010011101010101100000100000
 - opcode: 100000
 - rs:      00000
 - rt:      01011
 - rd:      10101
 - shamt:   10011
 - funct:  000000

00000000: 00100000 01011000 01110101 00000010                     Xu.

Your hardware and/or compiler differs from mine; it may have different rules for the layout of bit-fields.
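One way to find out what a particular compiler does is to set a single field and look at where the bit lands. The sketch below assumes that struct R occupies exactly one 32-bit unsigned int with no padding; opcode_bit_position is a hypothetical helper, not something from the standard library.

```c
#include <assert.h>
#include <string.h>

struct R
{
    unsigned int opcode : 6;
    unsigned int rs     : 5;
    unsigned int rt     : 5;
    unsigned int rd     : 5;
    unsigned int shamt  : 5;
    unsigned int funct  : 6;
};

/* Hypothetical helper: returns the position (0..31) of the lowest bit of
   opcode within the storage unit, or -1 if the struct is not the same size
   as unsigned int or the bit pattern is not a single set bit. */
static int opcode_bit_position(void)
{
    struct R r = { .opcode = 1 };       /* all other fields are zero */
    unsigned int word;
    if (sizeof(r) != sizeof(word))
        return -1;
    memcpy(&word, &r, sizeof(word));    /* copy the object representation */
    for (int bit = 0; bit < 32; bit++)
    {
        if (word == (1u << bit))
            return bit;
    }
    return -1;
}
```

A result of 0 means opcode sits in the least significant bits (what I see on my machine); a result of 26 would mean it sits in the most significant six bits.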

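Note that this code assumes, without testing, that unsigned int is a 32-bit quantity. If you are on a system where that does not hold, you will need to revise the code to use types such as uint32_t and uint8_t from <stdint.h> (and the format specifiers from <inttypes.h>).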
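As a rough illustration of that note, the assumption can be checked at compile time, and a 32-bit value can be formatted with the <inttypes.h> macros. This is a sketch assuming a C11 compiler; format_hex32 is a hypothetical helper name.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Refuse to compile if unsigned int does not occupy exactly 32 bits of
   storage (this check assumes the type has no padding bits). */
static_assert(sizeof(unsigned int) * CHAR_BIT == 32,
              "this code assumes a 32-bit unsigned int");

/* Hypothetical helper: format a uint32_t as 8 upper-case hex digits using
   the PRIX32 macro rather than assuming that %X matches the type.
   Returns the number of characters written, excluding the terminator. */
static int format_hex32(uint32_t value, char buffer[static 9])
{
    return snprintf(buffer, 9, "%.8" PRIX32, value);
}
```

Calling format_hex32(0x02755820, buffer) stores the string 02755820 in buffer.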

Refined code

This code is better organized than the original in various ways.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct R
{
    unsigned int opcode : 6;
    unsigned int rs     : 5;
    unsigned int rt     : 5;
    unsigned int rd     : 5;
    unsigned int shamt  : 5;
    unsigned int funct  : 6;
};

static void test_r(const char *tag, struct R r, FILE *fp);
static void run_xxd(const char *file);

int main(void)
{
    char filename[] = "filename.bin";
    FILE *fp = fopen(filename, "w+b");
    if (fp == NULL)
    {
        fprintf(stderr, "failed to open file '%s' for reading and writing\n", filename);
        return 1;
    }

    struct R r[]  =
    {
        { 32,  0, 11, 21, 19,  0 },
        { 32,  7, 11, 21, 19,  3 },
        {  6, 21, 10, 14, 10,  8 },
    };
    enum { NUM_R = sizeof(r) / sizeof(r[0]) };

    for (int i = 0; i < NUM_R; i++)
    {
        char name[16];
        snprintf(name, sizeof(name), "r%d", i+1);
        test_r(name, r[i], fp);
    }

    fclose(fp);

    run_xxd(filename);

    return 0;
}

static void run_one_xxd(const char *command, const char *filename)
{
    char cmd[256];
    snprintf(cmd, sizeof(cmd), "%s %s", command, filename);
    printf("\nCommand: %s\n", cmd);
    fflush(stdout);
    system(cmd);
    putchar('\n');
}

static void run_xxd(const char *filename)
{
    run_one_xxd("xxd -c 4 -b     ", filename);
    run_one_xxd("xxd -c 4 -g 1 -u", filename);
}

static void format_binary8v(unsigned char x, int n, char buffer[static 9]);
static void format_binary32(unsigned x, char buffer[static 33]);
static void dump_bitfield(int nbits, unsigned value, const char *name);
static void dump_bytes(const char *tag, struct R r);
static void dump_R(const char *tag, struct R r);

static void test_r(const char *tag, struct R r, FILE *fp)
{
    char buffer[32];
    long offset = sizeof(struct R);
    putchar('\n');
    fwrite(&r, sizeof(r), 1, fp);
    snprintf(buffer, sizeof(buffer), "%s - after write", tag);
    dump_R(buffer, r);
    fseek(fp, -offset, SEEK_CUR);
    struct R s;
    fread(&s, sizeof(s), 1, fp);
    fseek(fp, 0, SEEK_CUR);             // Ready for reading or writing!
    snprintf(buffer, sizeof(buffer), "%s - after read", tag);
    dump_R(buffer, s);
    /* Safe regardless of whether struct R uses all bits in its storage unit */
    assert(r.opcode == s.opcode);
    assert(r.rs     == s.rs    );
    assert(r.rt     == s.rt    );
    assert(r.rd     == s.rd    );
    assert(r.shamt  == s.shamt );
    assert(r.funct  == s.funct );
    /* Only safe because struct R uses all bits of its storage unit */
    assert(memcmp(&r, &s, sizeof(struct R)) == 0);
}

static void dump_R(const char *tag, struct R r)
{
    printf("%s:\n", tag);
    dump_bytes("Binary", r);
    dump_bitfield(6, r.opcode, "opcode");
    dump_bitfield(5, r.rs,     "rs");
    dump_bitfield(5, r.rt,     "rt");
    dump_bitfield(5, r.rd,     "rd");
    dump_bitfield(5, r.shamt,  "shamt");
    dump_bitfield(6, r.funct,  "funct");
}

static void dump_bytes(const char *tag, struct R r)
{
    union X
    {
        struct R r;
        unsigned i;
    };
    union X x = { .r = r };
    char buffer[33];
    printf("%s: 0x%.8X\n", tag, x.i);
    format_binary32(x.i, buffer);
    //printf("%s: MSB %s LSB\n", tag, buffer);
    printf("%s: MSB", tag);
    for (int i = 0; i < 4; i++)
        printf(" %.8s", &buffer[8 * i]);
    puts(" LSB (big-endian)");
    printf("%s: LSB", tag);
    for (int i = 0; i < 4; i++)
        printf(" %.8s", &buffer[8 * (3 - i)]);
    puts(" MSB (little-endian)");
}

static void dump_bitfield(int nbits, unsigned value, const char *name)
{
    assert(nbits > 0 && nbits <= 32);
    char vbuffer[33];
    char nbuffer[8];
    snprintf(nbuffer, sizeof(nbuffer), "%s:", name);
    format_binary8v(value, nbits, vbuffer);
    printf(" - %-7s  %6s  (%u)\n", nbuffer, vbuffer, value);
}

static
void format_binary8v(unsigned char x, int n, char buffer[static 9])
{
    assert(n > 0 && n <= 8);
    int start = 1 << (n - 1);
    for (int b = start; b != 0; b /= 2)
    {
        *buffer++ = ((b & x) != 0) ? '1' : '0';
        x &= ~b;
    }
    *buffer = '\0';
}

static
void format_binary32(unsigned x, char buffer[static 33])
{
    for (unsigned b = 2147483648; b != 0; b /= 2)
    {
        *buffer++ = ((b & x) != 0) ? '1' : '0';
        x &= ~b;
    }
    *buffer = '\0';
}

It produces the output:

r1 - after write:
Binary: 0x02755820
Binary: MSB 00000010 01110101 01011000 00100000 LSB (big-endian)
Binary: LSB 00100000 01011000 01110101 00000010 MSB (little-endian)
 - opcode:  100000  (32)
 - rs:       00000  (0)
 - rt:       01011  (11)
 - rd:       10101  (21)
 - shamt:    10011  (19)
 - funct:   000000  (0)
r1 - after read:
Binary: 0x02755820
Binary: MSB 00000010 01110101 01011000 00100000 LSB (big-endian)
Binary: LSB 00100000 01011000 01110101 00000010 MSB (little-endian)
 - opcode:  100000  (32)
 - rs:       00000  (0)
 - rt:       01011  (11)
 - rd:       10101  (21)
 - shamt:    10011  (19)
 - funct:   000000  (0)

r2 - after write:
Binary: 0x0E7559E0
Binary: MSB 00001110 01110101 01011001 11100000 LSB (big-endian)
Binary: LSB 11100000 01011001 01110101 00001110 MSB (little-endian)
 - opcode:  100000  (32)
 - rs:       00111  (7)
 - rt:       01011  (11)
 - rd:       10101  (21)
 - shamt:    10011  (19)
 - funct:   000011  (3)
r2 - after read:
Binary: 0x0E7559E0
Binary: MSB 00001110 01110101 01011001 11100000 LSB (big-endian)
Binary: LSB 11100000 01011001 01110101 00001110 MSB (little-endian)
 - opcode:  100000  (32)
 - rs:       00111  (7)
 - rt:       01011  (11)
 - rd:       10101  (21)
 - shamt:    10011  (19)
 - funct:   000011  (3)

r3 - after write:
Binary: 0x214E5546
Binary: MSB 00100001 01001110 01010101 01000110 LSB (big-endian)
Binary: LSB 01000110 01010101 01001110 00100001 MSB (little-endian)
 - opcode:  000110  (6)
 - rs:       10101  (21)
 - rt:       01010  (10)
 - rd:       01110  (14)
 - shamt:    01010  (10)
 - funct:   001000  (8)
r3 - after read:
Binary: 0x214E5546
Binary: MSB 00100001 01001110 01010101 01000110 LSB (big-endian)
Binary: LSB 01000110 01010101 01001110 00100001 MSB (little-endian)
 - opcode:  000110  (6)
 - rs:       10101  (21)
 - rt:       01010  (10)
 - rd:       01110  (14)
 - shamt:    01010  (10)
 - funct:   001000  (8)

Command: xxd -c 4 -b      filename.bin
00000000: 00100000 01011000 01110101 00000010   Xu.
00000004: 11100000 01011001 01110101 00001110  .Yu.
00000008: 01000110 01010101 01001110 00100001  FUN!

Command: xxd -c 4 -g 1 -u filename.bin
00000000: 20 58 75 02   Xu.
00000004: E0 59 75 0E  .Yu.
00000008: 46 55 4E 21  FUN!

Answer 1: (score: 3)

I tested your example and got the expected result:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int binpr(unsigned char Byte, FILE *f)
{
    char buf[8];
    for(int i=0; i<8; i++) buf[i]=(char)(((Byte>>(7-i))&1)+'0');
    return (int)+fwrite(buf,1,8,f);
}
struct R{

   unsigned int opcode: 6;
   unsigned int rs: 5;
   unsigned int rt: 5;
   unsigned int rd: 5;
   unsigned int shamt: 5;
   unsigned int funct: 6;
};
int main()
{

    struct R test  = {32,0,11,21,19,0};
    system(": > result"); //rb requires that the file already exists
    FILE *fp = fopen("./result", "rb+");
    if(!fp) return perror("fopen"),1;
    if(1!=fwrite(&test,sizeof(test),1,fp)) return perror("fwrite"),1;
    rewind(fp);
    char buf[sizeof(struct R)];
    if(1!=fread(&buf,sizeof(buf),1,fp)) return perror("fread"),1;
    fputs(" ",stdout); if(0!=memcmp(buf,&test,sizeof test)) abort();
    for(size_t i=0; i<sizeof(test); i++) { binpr(*((unsigned char*)&test+i),stdout); fputs(" ",stdout); } puts("");
    system("xxd -b result |cut -d: -f2");

     /*OUTPUT:*/
     /*00100000 01011000 01110101 00000010 */
     /*00100000 01011000 01110101 00000010                     Xu.*/
}

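Note that to open the file for updating and reading you need "rb+", not "rb". Otherwise fwrite fails with an error (which you would not see, because you are not doing any error checking).

(It is also possible that your compiler lays out bit-fields in an unusual way, though that is probably less likely than the wrongly specified fopen flags.)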
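If the file only needs to be written (not read back through the same stream), a simpler variant is to open it with "wb", which creates or truncates the file, and to check every call. The following is a sketch under that assumption; write_binary_file is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: create (or truncate) the file at path and write
   size bytes from data. Returns 0 on success, -1 on failure. */
static int write_binary_file(const char *path, const void *data, size_t size)
{
    assert(path != NULL && data != NULL);
    FILE *fp = fopen(path, "wb");   /* "rb" would open the file read-only */
    if (fp == NULL)
    {
        perror("fopen");
        return -1;
    }
    int status = 0;
    if (fwrite(data, 1, size, fp) != size)
    {
        perror("fwrite");
        status = -1;
    }
    if (fclose(fp) != 0)            /* buffered data is flushed here */
    {
        perror("fclose");
        status = -1;
    }
    return status;
}
```

For the question's structure, the call would be write_binary_file("./result", &test, sizeof(test)).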