How do I read bitN integer data from a binary file?

Time: 2011-11-09 18:59:47

Tags: c++

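I have a data file generated by hardware. Some of the data is 4 bits wide and some is 12 bits wide. Matlab can handle this data with fread(fp,1,'ubit4 => uint16'). I tried to do the same in C++, but there seems to be no simple way. I could read the file as byte/int/long/long long and then extract the requested bits, but that seems inefficient for hundreds of megabytes of data.

To generalize: how can bitN integers (for example, N from 1 to 64) be read? Can anyone recommend a good way to read such data from a file in C++?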
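The approach mentioned in the question (read whole bytes, then pick the bits out) need not be slow when the file is read in large blocks and unpacked in memory. As a rough sketch only: `unpack12` is a hypothetical helper name, and it assumes the 12-bit samples are packed back to back with the most significant bit first. The question does not state the hardware's actual bit order, so that needs checking against the real format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpack 12-bit unsigned values that are packed back to back, MSB first:
// every 3 input bytes hold exactly two values.
// Assumption: this bit order is illustrative, not taken from the question.
std::vector<std::uint16_t> unpack12(const std::vector<std::uint8_t>& bytes)
{
    assert(bytes.size() % 3 == 0);  // whole pairs of values only
    std::vector<std::uint16_t> out;
    out.reserve(bytes.size() / 3 * 2);
    for (std::size_t i = 0; i + 2 < bytes.size(); i += 3) {
        const unsigned b0 = bytes[i], b1 = bytes[i + 1], b2 = bytes[i + 2];
        // first value: all of b0 plus the high nibble of b1
        out.push_back(static_cast<std::uint16_t>((b0 << 4) | (b1 >> 4)));
        // second value: the low nibble of b1 plus all of b2
        out.push_back(static_cast<std::uint16_t>(((b1 & 0x0Fu) << 8) | b2));
    }
    return out;
}
```

With the three bytes 0xAB 0xCD 0xEF this yields the two values 0xABC and 0xDEF. The inner loop is a handful of shifts per pair of samples, so for files of a few hundred megabytes the disk read is likely to dominate, though that is worth measuring rather than assuming.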

5 answers:

Answer 0: (score: 3)

#include <iostream>
#include <climits>
#include <stdexcept>
#include <cassert>

class bitbuffer {
    unsigned char buffer;    // unread bits, right-aligned (unsigned so shifts never sign-extend)
    unsigned char held_bits; // number of unread bits in buffer
public:
    bitbuffer() : buffer(0), held_bits(0) {}
    unsigned long long read(unsigned char bits) {
        unsigned long long result = 0;
        // if the buffer doesn't hold enough bits
        while (bits > held_bits) {
            // grab all the bits left in the buffer
            bits -= held_bits;
            // skip when the buffer is empty: a 64-bit read would otherwise shift by 64
            if (held_bits > 0)
                result |= static_cast<unsigned long long>(buffer) << bits;
            // reload the buffer
            char c;
            if (!std::cin.get(c))
                throw std::runtime_error("bitbuffer: unexpected end of input");
            buffer = static_cast<unsigned char>(c);
            held_bits = CHAR_BIT;
        }
        // append the bits left to the end of the result
        result |= buffer >> (held_bits - bits);
        // remove those bits from the buffer
        held_bits -= bits;
        buffer &= (1u << held_bits) - 1;
        return result;
    }
};

int main() {
    std::cout << "enter 65535: ";  
    bitbuffer reader;  // "65535" plus newline arrives as 0x36 0x35 0x35 0x33 0x35 0x0A
    assert(reader.read(4) == 0x3);
    assert(reader.read(4) == 0x6);
    assert(reader.read(8) == 0x35);
    assert(reader.read(1) == 0x0);
    assert(reader.read(1) == 0x0);
    assert(reader.read(1) == 0x1);
    assert(reader.read(1) == 0x1);
    assert(reader.read(4) == 0x5);
    assert(reader.read(16) == 0x3335);
    assert(reader.read(8) == 0x0A);
    std::cout << "enter FFFFFFFF: ";
    assert(reader.read(64) == 0x4646464646464646);
    return 0;
}

Note that this reads from std::cin and throws a generic error on failure, but customizing those parts to suit your needs shouldn't be too hard.
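One way to do that customization is to pass the input stream in instead of hard-coding std::cin. The following is a sketch under that assumption; `stream_bit_reader` and `first_value` are hypothetical names, and the bit order (most significant bit first) is kept the same as in the class above.

```cpp
#include <cassert>
#include <climits>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical variant of the class above that reads from any std::istream.
class stream_bit_reader {
    std::istream& in_;
    unsigned char buffer_;  // unread bits, right-aligned
    unsigned held_bits_;    // number of unread bits in buffer_
public:
    explicit stream_bit_reader(std::istream& in) : in_(in), buffer_(0), held_bits_(0) {}

    // Read `bits` bits (1..64), most significant bit first.
    unsigned long long read(unsigned bits) {
        assert(bits >= 1 && bits <= 64);
        unsigned long long result = 0;
        while (bits > 0) {
            if (held_bits_ == 0) {  // refill one byte at a time
                char c;
                if (!in_.get(c))
                    throw std::runtime_error("stream_bit_reader: out of data");
                buffer_ = static_cast<unsigned char>(c);
                held_bits_ = CHAR_BIT;
            }
            const unsigned take = bits < held_bits_ ? bits : held_bits_;
            // move the top `take` unread bits into the low end of the result
            result = (result << take) | (buffer_ >> (held_bits_ - take));
            held_bits_ -= take;
            buffer_ &= static_cast<unsigned char>((1u << held_bits_) - 1u);
            bits -= take;
        }
        return result;
    }
};

// Convenience for experiments: decode the first `bits`-wide value of a byte string.
unsigned long long first_value(const std::string& bytes, unsigned bits) {
    std::istringstream in(bytes);
    stream_bit_reader reader(in);
    return reader.read(bits);
}
```

For a real data file the stream would typically be a std::ifstream opened with std::ios::binary, so that no newline translation happens on platforms that do it in text mode.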

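Answer 1: (score: 1)

In my project I had the same requirement to read N bits from a stream.

The source code is available here: https://bitbucket.org/puntoexe/imebra/src/6a3d67b378c8/project_files/library/base

Or you can download the whole package, with documentation, from https://bitbucket.org/puntoexe/imebra/downloads and use only the baseClasses. It is open source (FreeBSD) and tested. It needs no libraries other than the STL.

Basically, you create a stream and then connect a streamReader to it. The stream reader can read blocks of bytes or a requested number of bits. Several streamReader objects can be connected to the same stream.

The classes are currently used to read JPEG files and medical image files.

It works on several operating systems (including iOS) and on both big-endian and little-endian machines.

Example: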

#include "../../library/imebra/include/imebra.h"

// Open the file containing the dicom dataset
ptr<puntoexe::stream> inputStream(new puntoexe::stream);
inputStream->openFile(argv[1], std::ios_base::in);

// Connect a stream reader to the dicom stream. Several stream readers
//  can share the same stream
ptr<puntoexe::streamReader> reader(new puntoexe::streamReader(inputStream));

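Answer 2: (score: 1)

Thanks to everyone for the answers; they were all very helpful. I don't want to take credit by answering my own question, but I feel obliged to report back on how this went. All credit goes to the answers above.

To get something like Matlab's fread for bitN integers, I felt a template class was not a good fit, so I came up with several functions that handle the <8-bit, <16-bit, <32-bit and <64-bit cases separately.

My idea: I copy a few bytes (from 2 to 8 bytes) into my object, process them, and keep the unprocessed bytes for the next round. Here is my code and the test result (only the <8-bit case is implemented):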

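#include <math.h>
#include <string.h> //memcpy (memory.h is not a standard header)
#include <stdint.h>
//portable replacements for the compiler-specific _int8/_int16/_int32/_int64 types
typedef uint8_t _uint8;
typedef uint16_t _uint16;
typedef uint32_t _uint32;
typedef uint64_t _uint64;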

class bitbuffer
{
    _uint8 *pbuf;
    _uint8 *pelem; //can be casted to int16/32/64
    _uint32 pbuf_len; //buf length in byte
    _uint32 pelem_len; //element length in byte
    union membuf
    {
        _uint64 buf64;
        _uint32 buf32;
        _uint16 buf16;
        _uint8 buf8[2];
    } tbuf;

    //bookkeeping information
    _uint8 start_bit; //
    _uint32 byte_pos; //current byte position
    _uint32 elem_pos;
public:
    bitbuffer(_uint8 *src,_uint32 src_len,_uint8 *dst,_uint32 dst_len)
    {
        pbuf=src;pelem=dst;
        pbuf_len=src_len;pelem_len=dst_len;
        start_bit=0;byte_pos=0;elem_pos=0;
    } //to define the source and destination
    void set_startbit(_uint8 bit) {start_bit=bit;}
    void set_bytepos(_uint32 pos) {byte_pos=pos;}
    void set_elempos(_uint32 pos) {elem_pos=pos;}
    void reset() {start_bit=0;byte_pos=0;elem_pos=0;} //for restart something from somewhere else
    //OUT getbits(IN a, _uint8 nbits); //get nbits from a using start and byte_pos
    _uint32 get_elem_uint8(_uint32 num_elem,_uint8 nbits) //output limit to 8/16/32/64 only
    {
        _uint32 num_read=0;
        _uint16 mask=(1u<<nbits)-1;//e.g. 00000111 for nbits=3; integer shift instead of floating-point pow
        while(byte_pos+2<=pbuf_len) //two bytes are needed per step; this form cannot underflow when pbuf_len<2
        {
            //load two bytes with the first byte in the high half, independent of the host byte order
            tbuf.buf16=(_uint16)((pbuf[byte_pos]<<8)|pbuf[byte_pos+1]);
            //now we have start_bits, byte_pos, elem_pos, just finish them all
            while(start_bit<=16-nbits)
            {
                pelem[elem_pos++]=(tbuf.buf16>>(16-start_bit-nbits))&mask;//(tbuf.buf16&(mask<<(16-start_bit))
                start_bit+=nbits; //advance by nbits
                num_read++;
                if(num_read>=num_elem)
                {
                    break;
                }
            }
            //need update the start_bit and byte_pos
            byte_pos+=(start_bit/8);
            start_bit%=8;
            if(num_read>=num_elem)
            {
                break;
            }

        }
        return num_read;
    }
/*  
    _uint32 get_elem_uint16(_uint32 num_elem,_uint8 nbits) //output limit to 8/16/32/64 only
    {
        _uint32 num_read=0;
        _uint32 mask=pow(2,nbits)-1;//00000111 for example nbit=3 
        while(byte_pos<pbuf_len-4)
        {
            memcpy((char*)&tbuf.buf32,pbuf+byte_pos,4); //copy 2 bytes into our buffer, this may introduce redundant copy
            //now we have start_bits, byte_pos, elem_pos, just finish them all
            while(start_bit<=32-nbits)
            {
                pelem[elem_pos++]=(tbuf.buf32>>(32-start_bit-nbits))&mask;//(tbuf.buf16&(mask<<(16-start_bit))
                start_bit+=nbits; //advance by nbits
                num_read++;
                if(num_read>=num_elem)
                {
                    break;
                }
            }
            //need update the start_bit and byte_pos
            start_bit%=8;
            byte_pos+=(start_bit/8);
            if(num_read>=num_elem)
            {
                break;
            }

        }
        return num_read;
    }
    _uint32 get_elem_uint32(_uint32 num_elem,_uint8 nbits) //output limit to 8/16/32/64 only
    {
        _uint32 num_read=0;
        _uint64 mask=pow(2,nbits)-1;//00000111 for example nbit=3 
        while(byte_pos<pbuf_len-8)
        {
            memcpy((char*)&tbuf.buf16,pbuf+byte_pos,8); //copy 2 bytes into our buffer, this may introduce redundant copy
            //now we have start_bits, byte_pos, elem_pos, just finish them all
            while(start_bit<=64-nbits)
            {
                pelem[elem_pos++]=(tbuf.buf64>>(64-start_bit-nbits))&mask;//(tbuf.buf16&(mask<<(16-start_bit))
                start_bit+=nbits; //advance by nbits
                num_read++;
                if(num_read>=num_elem)
                {
                    break;
                }
            }
            //need update the start_bit and byte_pos
            start_bit%=8;
            byte_pos+=(start_bit/8);
            if(num_read>=num_elem)
            {
                break;
            }

        }
        return num_read;
    }

    //not work well for 64 bit!
    _uint64 get_elem_uint64(_uint32 num_elem,_uint8 nbits) //output limit to 8/16/32/64 only
    {
        _uint32 num_read=0;
        _uint64 mask=pow(2,nbits)-1;//00000111 for example nbit=3 
        while(byte_pos<pbuf_len-2)
        {
            memcpy((char*)&tbuf.buf16,pbuf+byte_pos,8); //copy 2 bytes into our buffer, this may introduce redundant copy
            //now we have start_bits, byte_pos, elem_pos, just finish them all
            while(start_bit<=16-nbits)
            {
                pelem[elem_pos++]=(tbuf.buf16>>(16-start_bit-nbits))&mask;//(tbuf.buf16&(mask<<(16-start_bit))
                start_bit+=nbits; //advance by nbits
                num_read++;
                if(num_read>=num_elem)
                {
                    break;
                }
            }
            //need update the start_bit and byte_pos
            start_bit%=8;
            byte_pos+=(start_bit/8);
            if(num_read>=num_elem)
            {
                break;
            }

        }
        return num_read;
    }*/
};

#include <iostream>
using namespace std;

int main()
{
    _uint8 *pbuf=new _uint8[10];
    _uint8 *pelem=new _uint8[80];
    int i; //declared once; standard C++ does not keep a for-loop variable alive after the loop
    for(i=0;i<10;i++) pbuf[i]=i*11+11;
    bitbuffer vbit(pbuf,10,pelem,80);

    cout.setf(ios_base::hex,ios_base::basefield);
    cout<<"Bytes: ";
    for(i=0;i<10;i++) cout<<(int)pbuf[i]<<" "; //cast so each byte prints as a number, not a character
    cout<<endl;
    cout<<"1 bit: ";
    int num_read=vbit.get_elem_uint8(80,1); //ask for 80 1-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i];
    cout<<endl;
    vbit.reset();
    cout<<"2 bit: ";
    num_read=vbit.get_elem_uint8(40,2); //ask for 40 2-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i]<<" ";
    cout<<endl;
    vbit.reset();
    cout<<"3 bit: ";
    num_read=vbit.get_elem_uint8(26,3); //ask for 26 3-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i]<<' ';
    cout<<endl;
    vbit.reset();
    cout<<"4 bit: ";
    num_read=vbit.get_elem_uint8(20,4); //ask for 20 4-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i]<<" ";
    cout<<endl;
    vbit.reset();
    cout<<"5 bit: ";
    num_read=vbit.get_elem_uint8(16,5); //ask for 16 5-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i]<<" ";
    cout<<endl;
    vbit.reset();
    cout<<"6 bit: ";
    num_read=vbit.get_elem_uint8(13,6); //ask for 13 6-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i]<<" ";
    cout<<endl;
    vbit.reset();
    cout<<"7 bit: ";
    num_read=vbit.get_elem_uint8(11,7); //ask for 11 7-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i]<<" ";
    cout<<endl;
    vbit.reset();
    cout<<"8 bit: ";
    num_read=vbit.get_elem_uint8(10,8); //ask for 10 8-bit integers
    for(i=0;i<num_read;i++) cout<<(int)pelem[i]<<" ";
    cout<<endl;
    vbit.reset();

    delete[] pbuf;
    delete[] pelem;
    return 0;
}

Test results:

Bytes: b 16 21 2c 37 42 4d 58 63 6e
1 bit: 00001011000101100010000100101100001101110100001001001101010110000110001101101110
2 bit: 0 0 2 3 0 1 1 2 0 2 0 1 0 2 3 0 0 3 1 3 1 0 0 2 1 0 3 1 1 1 2 0 1 2 0 3 1 2 3 2
3 bit: 0 2 6 1 3 0 4 1 1 3 0 3 3 5 0 2 2 3 2 5 4 1 4 3
4 bit: 0 b 1 6 2 1 2 c 3 7 4 2 4 d 5 8 6 3 6 e
5 bit: 1 c b 2 2 b 1 17 8 9 6 15 10 18 1b e
6 bit: 2 31 18 21 b 3 1d 2 13 15 21 23
7 bit: 5 45 44 12 61 5d 4 4d 2c 18 6d
8 bit: b 16 21 2c 37 42 4d 58 63 6e
Press any key to continue
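In the results above, the 3-bit run prints 24 values although 80 bits hold 26, because the loop in get_elem_uint8 always needs two whole bytes and so never looks at the final byte on its own. A possible alternative, sketched here with the hypothetical name `unpack_bits`, keeps the pending bits in a 64-bit accumulator so the tail is consumed too. It assumes the same most-significant-bit-first order as the listing and supports widths of 1 to 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split `len` bytes into nbits-wide unsigned values, MSB first.
// Trailing bits that do not fill a whole value are dropped.
std::vector<std::uint32_t> unpack_bits(const std::uint8_t* src, std::size_t len, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32);
    std::vector<std::uint32_t> out;
    out.reserve(len * 8 / nbits);
    std::uint64_t acc = 0;  // bit accumulator, newest bits at the low end
    unsigned have = 0;      // number of not-yet-emitted bits in acc
    const std::uint64_t mask = (std::uint64_t(1) << nbits) - 1;
    for (std::size_t i = 0; i < len; ++i) {
        acc = (acc << 8) | src[i];  // have < nbits <= 32 here, so at most 39 bits are live
        have += 8;
        while (have >= nbits) {
            out.push_back(static_cast<std::uint32_t>((acc >> (have - nbits)) & mask));
            have -= nbits;
        }
    }
    return out;
}
```

Called on the same ten test bytes with nbits equal to 3, this returns 26 values; the first 24 agree with the 3-bit line above and the last two come from the final byte.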

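Answer 3: (score: 0)

Your question isn't very specific, so I can only offer some general ideas.

You may want to read the file in chunks, for example 4096 bytes at a time (a typical page size), although larger chunks should be fine too (maybe 64 KiB or 512 KiB; just experiment). Once you have read a chunk, process it from memory.

To be correct, we should declare the chunk memory as an array of the target integer type. For example, for 4-byte integers we could do this: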

#include <cstdint>
#include <memory>
#include <cstdio>

uint32_t buf[1024];

typedef std::unique_ptr<std::FILE, int (*)(std::FILE *)> unique_file_ptr;

static unique_file_ptr make_file(const char * filename, const char * flags)
{
  std::FILE * const fp = std::fopen(filename, flags);
  return unique_file_ptr(fp ? fp : nullptr, std::fclose);
}

int main()
{
  auto fp = make_file("thedata.bin", "rb");

  if (!fp) return 1;

  while (true)
  {
    // fread takes (pointer, element size, element count, stream); sizeof buf is 4096
    if (sizeof buf != std::fread(buf, 1, sizeof buf, fp.get())) break;
    // process buf[0] up to buf[1023]
  }
}

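For performance reasons I chose the C library fopen/fread over C++ iostreams; I can't actually claim that the decision is based on personal experience. (If you have an old compiler, you may need the header <stdint.h> instead, and you may not have unique_ptr, in which case you can just use std::FILE* and std::fopen manually.)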

Instead of the global buf, you could also make a std::vector<uint32_t>, resize it to something large enough, and read directly into its data buffer (&buf[0] or buf.data()).
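A minimal sketch of reading straight into a vector's storage might look like this. `read_words` is a hypothetical helper, not part of the answer's code, and it leaves the words in whatever byte order the file uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: read up to max_words 32-bit words from an open file
// directly into a vector's storage, then shrink to what was actually read.
std::vector<std::uint32_t> read_words(std::FILE* fp, std::size_t max_words)
{
    assert(fp != nullptr);
    std::vector<std::uint32_t> buf(max_words);
    if (max_words == 0) return buf;
    const std::size_t got =
        std::fread(buf.data(), sizeof(std::uint32_t), max_words, fp);
    buf.resize(got);  // a short read at end of file simply yields fewer words
    return buf;
}
```

Calling it repeatedly with, say, 1024 words per call gives the same chunked loop as above, with an empty result signalling that no whole word was left to read.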

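If you need to read integers whose length is not 2, 4, 8 or 16 bytes, you will have to read into a char array and extract the numbers manually with algebraic operations (for 3-byte integers, for example). If your packing isn't even byte-aligned, you will have to put in even more effort.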
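For the 3-byte case, manual extraction could be sketched as follows. `read_u24_le` is a hypothetical name, and the sketch assumes the least significant byte comes first; for the opposite order the two shifts swap places.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assemble one 3-byte unsigned integer starting at buf[pos].
// Assumption: least significant byte first (little-endian layout in the file).
std::uint32_t read_u24_le(const unsigned char* buf, std::size_t len, std::size_t pos)
{
    assert(pos + 3 <= len);  // all three bytes must lie inside the buffer
    return  static_cast<std::uint32_t>(buf[pos])
         | (static_cast<std::uint32_t>(buf[pos + 1]) << 8)
         | (static_cast<std::uint32_t>(buf[pos + 2]) << 16);
}
```

Using unsigned char for the buffer matters here: with plain char, bytes of 0x80 and above may be negative and would corrupt the upper bits when widened.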

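Answer 4: (score: 0)

Below is an example of how to get a range of bits from a variable.

The example simulates that some binary data has been read from a file and stored in a vector<unsigned char>.

The data is copied from the vector into a variable (the extract function template). After that, the get_bits function returns the requested bits in a new variable. It's that simple!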

#include <cmath>
#include <cstring>
#include <vector>

using namespace std;

template<typename T>
T extract(const vector<unsigned char> &v, int pos)
{
  T value;
  memcpy(&value, &v[pos], sizeof(T));
  return value;
}

template<typename IN, typename OUT>
OUT get_bits(IN value, int first_bit, int last_bit)
{
  value = (value >> first_bit);
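  // Build the mask with integer arithmetic; a double cannot represent wide masks exactly.
  const int width = 1 + last_bit - first_bit;  // must be less than 64
  const IN the_mask = static_cast<IN>((1ull << width) - 1);
  OUT result = static_cast<OUT>(value & the_mask);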
  return result;
}

int main()
{
  vector<unsigned char> v;
  //Simulate that we have read a binary file.
  //Add some binary data to v.
  v.push_back(255);
  v.push_back(1);
  //0x01 0xff
  short a = extract<short>(v,0);

  //Now get the bits from the extracted variable.
  char b = get_bits<short,char>(a,8,8);
  short c = get_bits<short,short>(a,2,5);
  int d = get_bits<short,int>(a,0,7);

  return 0;
}

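This is just a simple example without any error checking.

You can use the extract function template to get data starting from any position in the vector. This vector has only 2 elements, and the size of a short is 2 bytes, which is why the pos argument of the extract function is 0.

Good luck!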
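To make the bit numbering concrete, here is a small sketch of the same idea on an unsigned 64-bit value. `bit_range` is a hypothetical name; bit 0 is taken to be the least significant bit, which is how the shifts in get_bits behave. Working on unsigned values avoids sign extension when shifting right.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return bits first_bit..last_bit of value,
// inclusive, with bit 0 being the least significant bit.
std::uint64_t bit_range(std::uint64_t value, unsigned first_bit, unsigned last_bit)
{
    assert(first_bit <= last_bit && last_bit < 64);
    const unsigned width = last_bit - first_bit + 1;
    // a shift by 64 would be undefined, so the full-width mask is special-cased
    const std::uint64_t mask =
        (width == 64) ? ~std::uint64_t(0) : ((std::uint64_t(1) << width) - 1);
    return (value >> first_bit) & mask;
}
```

For the value 0x01FF, which is what the two bytes 255 and 1 form on a little-endian machine, bit_range(0x01FF, 8, 8) is 1, bit_range(0x01FF, 2, 5) is 0xF and bit_range(0x01FF, 0, 7) is 0xFF.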