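A clean way to write portable, endian-correct file read/write code in C++

Time: 2015-05-26 20:09:44

Tags: c++ boost endianness

I want to write some C++ code that reads from and writes to a file in an endian-correct way. More precisely, I want to be able to read a particular type of file whose endianness I can easily detect (by checking whether its magic number is reversed).

But how do I then go about reading the file correctly? I have read the following article, which offers a useful idea: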

http://www.gamedev.net/page/resources/_/technical/game-programming/writing-endian-independent-code-in-c-r2091

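The idea is to create a class that holds function pointers to the desired endianness-correct read() functions. In my experience, though, function pointers are slow, especially when you have to call them this often. Another option is to have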

if (file_was_detected_big_endian) { read_bigendian(); } else { read_littleendian(); }

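for every read_x_bit_int() function I have, but that seems inefficient.

I am using Boost, so I have all of its glory to help me. In particular, there is the endian sub-library: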

http://www.boost.org/doc/libs/develop/libs/endian/doc/buffers.html

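However, I am not sure how to use this code cleanly to do what I want. I would like to write code where I can read 16 bytes directly into a pointer to a struct that represents part of the file, with the byte order corrected automatically. I could of course write this myself, but I feel a solid solution must already exist.

I believe all the code I have will be manually padded to guard against alignment issues.

Thanks!

3 Answers:

Answer 0: (score: 3)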

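There are two approaches to this problem:

  1. Write your file in an endian-independent way,
  2. Add a marker, and read your file in an endian-aware way.

The first approach requires more work when writing, while the second makes writing "overhead-free".

Both approaches can be implemented without function pointers: the need for them is greatly reduced in C++ thanks to virtual functions*.

Implementing the two approaches is similar: you create an abstract base class for serializing primitive data types, create an instance of that class that reads the correct endianness, and call its virtual member functions for reading and writing: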

    #include <istream>
    #include <ostream>
    #include <string>

    struct PrimitiveSerializer {
        virtual ~PrimitiveSerializer() {}  // allows deletion through a base pointer
        virtual void serializeInt(std::ostream& out, const int val) = 0;
        virtual void serializeChar(std::ostream& out, const char val) = 0;
        virtual void serializeString(std::ostream& out, const std::string& val) = 0;
        // ...
        virtual int deserializeInt(std::istream& in) = 0;
        virtual char deserializeChar(std::istream& in) = 0;
        virtual std::string deserializeString(std::istream& in) = 0;
    };
    struct BigEndianSerializer : public PrimitiveSerializer {
        // ...
    };
    struct LittleEndianSerializer : public PrimitiveSerializer {
        // ...
    };
    

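How you decide which subclass to use depends on the approach. If you use the first approach (i.e. writing an endian-independent file), you instantiate the serializer that matches the endianness of your system. If you take the second approach, you read the magic number from the file and pick the subclass that matches the endianness of the file.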

In addition, the first approach can be implemented using the hton / ntoh functions.

* Function pointers are not "slow" in themselves, although they make it easier to write inefficient code.

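Answer 1: (score: 3)

So, the virtual-function approach proposed by dasblinkenlight is probably sufficient, especially since I/O is likely to dominate the run time. However, if you find that your read functions take up a significant amount of CPU time, you can get rid of the virtual dispatch by templating the file reader.

Here is a sketch that demonstrates this:

Basically, create two reader classes, one for each endianness: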

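#include <cstdint>
#include <cstring>
#include <istream>

class LittleReader {
  public:
  explicit LittleReader(std::istream& is) : m_is(is) {}
  char read_char() { return static_cast<char>(m_is.get()); }  // read one byte from m_is
  std::int32_t read_int32() {  // read 4 bytes, least significant first, and combine them
    unsigned char b[4] = {};
    m_is.read(reinterpret_cast<char*>(b), 4);
    return static_cast<std::int32_t>(std::uint32_t(b[0]) | std::uint32_t(b[1]) << 8 |
                                     std::uint32_t(b[2]) << 16 | std::uint32_t(b[3]) << 24);
  }
  // Assumes float is a 32-bit IEEE 754 value stored in the same byte order as integers.
  float read_float() { std::int32_t i = read_int32(); float f; std::memcpy(&f, &i, sizeof f); return f; }
  private:
  std::istream& m_is;
};

class BigReader {
  public:
  explicit BigReader(std::istream& is) : m_is(is) {}
  char read_char() { return static_cast<char>(m_is.get()); }  // read one byte from m_is
  std::int32_t read_int32() {  // read 4 bytes, most significant first, and combine them
    unsigned char b[4] = {};
    m_is.read(reinterpret_cast<char*>(b), 4);
    return static_cast<std::int32_t>(std::uint32_t(b[0]) << 24 | std::uint32_t(b[1]) << 16 |
                                     std::uint32_t(b[2]) << 8 | std::uint32_t(b[3]));
  }
  // Assumes float is a 32-bit IEEE 754 value stored in the same byte order as integers.
  float read_float() { std::int32_t i = read_int32(); float f; std::memcpy(&f, &i, sizeof f); return f; }
  private:
  std::istream& m_is;
};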

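Separate the main part of the reading logic (everything except the magic-number bit) into a function template that takes an instance of one of the above classes as a parameter: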

template <class Reader>
void read_endian(Reader &rdr){
  field1 = rdr.read_int32();  // field1 and field2 stand for variables declared elsewhere
  field2 = rdr.read_float();
  // process rest of data file
  // ...
}

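In essence, the compiler will create two implementations of the read_endian function, one for each endianness. Since there is no dynamic dispatch, the compiler can also inline all calls to read_int32, read_float, and so on.

Finally, in your main reader function, look at the magic number to determine which kind of reader to instantiate: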

void read_file(std::istream& is){
  int magic(read_magic_no(is));
  if (magic == MAGIC_BIG_ENDIAN) {
     BigReader rdr(is);     // named object: a temporary cannot bind to Reader&
     read_endian(rdr);
  } else {
     LittleReader rdr(is);
     read_endian(rdr);
  }
}

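This technique gives you flexibility without incurring any virtual dispatch overhead, at the cost of increased (binary) code size. It is very useful if you have very tight loops and need to squeeze out every last drop of performance.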

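Answer 2: (score: 1)

I have written a small .h and .cpp that now handle (probably) all of the endianness issues. Although I have tuned these functions for my own application, they may be helpful to someone.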

endian_bis.h:

/**
 * endian_bis.h - endian-gnostic binary input stream functions
 * Copyright (C) 2015
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

#pragma once

#include <cstdint>
#include <istream>

class BinaryInputStream {

public:
    inline int8_t   read_int8(std::istream &in)   { char buf[1]; in.read(buf, 1); return read_int8(buf, 0);   }
    inline int16_t  read_int16(std::istream &in)  { char buf[2]; in.read(buf, 2); return read_int16(buf, 0);  }
    inline int32_t  read_int32(std::istream &in)  { char buf[4]; in.read(buf, 4); return read_int32(buf, 0);  }
    inline int64_t  read_int64(std::istream &in)  { char buf[8]; in.read(buf, 8); return read_int64(buf, 0);  }
    inline uint8_t  read_uint8(std::istream &in)  { char buf[1]; in.read(buf, 1); return read_uint8(buf, 0);  }
    inline uint16_t read_uint16(std::istream &in) { char buf[2]; in.read(buf, 2); return read_uint16(buf, 0); }
    inline uint32_t read_uint32(std::istream &in) { char buf[4]; in.read(buf, 4); return read_uint32(buf, 0); }
    inline uint64_t read_uint64(std::istream &in) { char buf[8]; in.read(buf, 8); return read_uint64(buf, 0); }
    inline float    read_float(std::istream &in)  { char buf[4]; in.read(buf, 4); return read_float(buf, 0);  }
    inline double   read_double(std::istream &in)  { char buf[8]; in.read(buf, 8); return read_double(buf, 0); }

    inline int8_t    read_int8(char buf[], int off)  { return (int8_t)buf[off]; }
    inline uint8_t   read_uint8(char buf[], int off) { return (uint8_t)buf[off]; }
    virtual int16_t  read_int16(char buf[], int off)   = 0;
    virtual int32_t  read_int32(char buf[], int off)   = 0;
    virtual int64_t  read_int64(char buf[], int off)   = 0;
    virtual uint16_t read_uint16(char buf[], int off)  = 0;
    virtual uint32_t read_uint32(char buf[], int off)  = 0;
    virtual uint64_t read_uint64(char buf[], int off)  = 0;
    virtual float    read_float(char buf[], int off)   = 0;
    virtual double   read_double(char buf[], int off)  = 0;

    static BinaryInputStream *endianCorrectStream(int streamIsBigEndian);
    static BinaryInputStream *endianCorrectStream(std::istream &in,
                                                  uint32_t expectedBigEndianMagic,
                                                  uint32_t expectedLittleEndianMagic);

};

endian_bis.cpp:

/**
 * endian_bis.cpp - endian-gnostic binary input stream functions
 * Copyright (C) 2015 Jonah Schreiber (jonah.schreiber@gmail.com)
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

#include "endian_bis.h"

#include <cstring>

/*
 * Delegated functions
 */

static inline int16_t  read_be_int16(char buf[], int off) {
    return (int16_t)(((buf[off]   & 0xff) << 8) |
                     ((buf[off+1] & 0xff)));
}

static inline int32_t  read_be_int32(char buf[], int off) {
    return (int32_t)(((buf[off]   & 0xff) << 24) |
                     ((buf[off+1] & 0xff) << 16) |
                     ((buf[off+2] & 0xff) << 8)  |
                     ((buf[off+3] & 0xff)));
}

// The template argument is the default word size (sizeof(size_t)); this single
// definition serves every word size.
template<int> static inline int64_t read_be_int64(char buf[], int off) {
    // Build each half as uint32_t so the low half is not sign-extended when widened.
    uint32_t hi = ((uint32_t)(buf[off]   & 0xff) << 24) |
                  ((uint32_t)(buf[off+1] & 0xff) << 16) |
                  ((uint32_t)(buf[off+2] & 0xff) << 8)  |
                  ((uint32_t)(buf[off+3] & 0xff));
    uint32_t lo = ((uint32_t)(buf[off+4] & 0xff) << 24) |
                  ((uint32_t)(buf[off+5] & 0xff) << 16) |
                  ((uint32_t)(buf[off+6] & 0xff) << 8)  |
                  ((uint32_t)(buf[off+7] & 0xff));
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

static inline uint16_t read_be_uint16(char buf[], int off) {
    return (uint16_t)(((buf[off]   & 0xff) << 8) |
                      ((buf[off+1] & 0xff)));
}

static inline uint32_t read_be_uint32(char buf[], int off) {
    return (uint32_t)(((buf[off]   & 0xff) << 24) |
                      ((buf[off+1] & 0xff) << 16) |
                      ((buf[off+2] & 0xff) << 8)  |
                      ((buf[off+3] & 0xff)));
}

// The template argument is the default word size (sizeof(size_t)); this single
// definition serves every word size.
template<int> static inline uint64_t read_be_uint64(char buf[], int off) {
    // Build each half as uint32_t so the low half is not sign-extended when widened.
    uint32_t hi = ((uint32_t)(buf[off]   & 0xff) << 24) |
                  ((uint32_t)(buf[off+1] & 0xff) << 16) |
                  ((uint32_t)(buf[off+2] & 0xff) << 8)  |
                  ((uint32_t)(buf[off+3] & 0xff));
    uint32_t lo = ((uint32_t)(buf[off+4] & 0xff) << 24) |
                  ((uint32_t)(buf[off+5] & 0xff) << 16) |
                  ((uint32_t)(buf[off+6] & 0xff) << 8)  |
                  ((uint32_t)(buf[off+7] & 0xff));
    return ((uint64_t)hi << 32) | lo;
}

inline static int16_t  read_le_int16(char buf[], int off) {
    return (int16_t)(((buf[off+1] & 0xff) << 8) |
                     ((buf[off]   & 0xff)));
}

inline static int32_t  read_le_int32(char buf[], int off) {
    return (int32_t)(((buf[off+3] & 0xff) << 24) |
                     ((buf[off+2] & 0xff) << 16) |
                     ((buf[off+1] & 0xff) << 8)  |
                     ((buf[off]   & 0xff)));
}

// The template argument is the default word size (sizeof(size_t)); this single
// definition serves every word size.
template<int> static inline int64_t read_le_int64(char buf[], int off) {
    // Build each half as uint32_t so the low half is not sign-extended when widened.
    uint32_t hi = ((uint32_t)(buf[off+7] & 0xff) << 24) |
                  ((uint32_t)(buf[off+6] & 0xff) << 16) |
                  ((uint32_t)(buf[off+5] & 0xff) << 8)  |
                  ((uint32_t)(buf[off+4] & 0xff));
    uint32_t lo = ((uint32_t)(buf[off+3] & 0xff) << 24) |
                  ((uint32_t)(buf[off+2] & 0xff) << 16) |
                  ((uint32_t)(buf[off+1] & 0xff) << 8)  |
                  ((uint32_t)(buf[off]   & 0xff));
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

inline static uint16_t read_le_uint16(char buf[], int off) {
    return (uint16_t)(((buf[off+1] & 0xff) << 8) |
                      ((buf[off]   & 0xff)));
}

inline static uint32_t read_le_uint32(char buf[], int off) {
    return (uint32_t)(((buf[off+3] & 0xff) << 24) |
                      ((buf[off+2] & 0xff) << 16) |
                      ((buf[off+1] & 0xff) << 8)  |
                      ((buf[off]   & 0xff)));
}

// The template argument is the default word size (sizeof(size_t)); this single
// definition serves every word size.
template<int> static inline uint64_t read_le_uint64(char buf[], int off) {
    // Build each half as uint32_t so the low half is not sign-extended when widened.
    uint32_t hi = ((uint32_t)(buf[off+7] & 0xff) << 24) |
                  ((uint32_t)(buf[off+6] & 0xff) << 16) |
                  ((uint32_t)(buf[off+5] & 0xff) << 8)  |
                  ((uint32_t)(buf[off+4] & 0xff));
    uint32_t lo = ((uint32_t)(buf[off+3] & 0xff) << 24) |
                  ((uint32_t)(buf[off+2] & 0xff) << 16) |
                  ((uint32_t)(buf[off+1] & 0xff) << 8)  |
                  ((uint32_t)(buf[off]   & 0xff));
    return ((uint64_t)hi << 32) | lo;
}

/* WARNING: UNTESTED FOR 64 BIT ARCHITECTURES; FILL IN 3 MORE METHODS LIKE THIS TO TEST
   THE CORRECT FUNCTION WILL BE SELECTED AUTOMATICALLY AT COMPILE TIME
template<> inline uint64_t read_uint64_branch<8>(char buf[], int off) {
    return (int64_t)((buf[off]   << 56) |
                     (buf[off+1] << 48) |
                     (buf[off+2] << 40) |
                     (buf[off+3] << 32) |
                     (buf[off+4] << 24) |
                     (buf[off+5] << 16) |
                     (buf[off+6] << 8)  |
                     (buf[off+7]));
}*/

inline static float  read_matching_float(char buf[], int off) {
    float f;
    memcpy(&f, &buf[off], 4);
    return f;
}

inline static float  read_mismatched_float(char buf[], int off) {
    float f;
    // Reverse the four bytes starting at buf[off], not at the start of the buffer.
    char buf2[4] = {buf[off+3], buf[off+2], buf[off+1], buf[off]};
    memcpy(&f, buf2, 4);
    return f;
}

inline static double  read_matching_double(char buf[], int off) {
    double d;
    memcpy(&d, &buf[off], 8);
    return d;
}

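inline static double  read_mismatched_double(char buf[], int off) {
    double d;
    // Reverse the eight bytes starting at buf[off] and copy all eight of them.
    char buf2[8] = {buf[off+7], buf[off+6], buf[off+5], buf[off+4],
                    buf[off+3], buf[off+2], buf[off+1], buf[off]};
    memcpy(&d, buf2, 8);
    return d;
}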


/*
 * Types (singleton instantiations)
 */

/*
 * Big-endian stream, Big-endian runtime
 */
static class : public BinaryInputStream {

public:
    int16_t  read_int16(char buf[], int off)  { return read_be_int16(buf, off); }
    int32_t  read_int32(char buf[], int off)  { return read_be_int32(buf, off); }
    int64_t  read_int64(char buf[], int off)  { return read_be_int64<sizeof(size_t)>(buf, off); }
    uint16_t read_uint16(char buf[], int off) { return read_be_uint16(buf, off); }
    uint32_t read_uint32(char buf[], int off) { return read_be_uint32(buf, off); }
    uint64_t read_uint64(char buf[], int off) { return read_be_uint64<sizeof(size_t)>(buf, off); }
    float    read_float(char buf[], int off)  { return read_matching_float(buf, off); }
    double   read_double(char buf[], int off) { return read_matching_double(buf, off); }
} beStreamBeRuntime;

/*
 * Big-endian stream, Little-endian runtime
 */
static class : public BinaryInputStream {

public:
    int16_t  read_int16(char buf[], int off)  { return read_be_int16(buf, off); }
    int32_t  read_int32(char buf[], int off)  { return read_be_int32(buf, off); }
    int64_t  read_int64(char buf[], int off)  { return read_be_int64<sizeof(size_t)>(buf, off); }
    uint16_t read_uint16(char buf[], int off) { return read_be_uint16(buf, off); }
    uint32_t read_uint32(char buf[], int off) { return read_be_uint32(buf, off); }
    uint64_t read_uint64(char buf[], int off) { return read_be_uint64<sizeof(size_t)>(buf, off); }
    float    read_float(char buf[], int off)  { return read_mismatched_float(buf, off); }
    double   read_double(char buf[], int off) { return read_mismatched_double(buf, off); }
} beStreamLeRuntime;

/*
 * Little-endian stream, Big-endian runtime
 */
static class : public BinaryInputStream {

public:
    int16_t  read_int16(char buf[], int off)  { return read_le_int16(buf, off); }
    int32_t  read_int32(char buf[], int off)  { return read_le_int32(buf, off); }
    int64_t  read_int64(char buf[], int off)  { return read_le_int64<sizeof(size_t)>(buf, off); }
    uint16_t read_uint16(char buf[], int off) { return read_le_uint16(buf, off); }
    uint32_t read_uint32(char buf[], int off) { return read_le_uint32(buf, off); }
    uint64_t read_uint64(char buf[], int off) { return read_le_uint64<sizeof(size_t)>(buf, off); }
    float    read_float(char buf[], int off)  { return read_mismatched_float(buf, off); }
    double   read_double(char buf[], int off) { return read_mismatched_double(buf, off); }
} leStreamBeRuntime;

/*
 * Little-endian stream, Little-endian runtime
 */
static class : public BinaryInputStream {

public:
    int16_t  read_int16(char buf[], int off)  { return read_le_int16(buf, off); }
    int32_t  read_int32(char buf[], int off)  { return read_le_int32(buf, off); }
    int64_t  read_int64(char buf[], int off)  { return read_le_int64<sizeof(size_t)>(buf, off); }
    uint16_t read_uint16(char buf[], int off) { return read_le_uint16(buf, off); }
    uint32_t read_uint32(char buf[], int off) { return read_le_uint32(buf, off); }
    uint64_t read_uint64(char buf[], int off) { return read_le_uint64<sizeof(size_t)>(buf, off); }
    float    read_float(char buf[], int off)  { return read_matching_float(buf, off); }
    double   read_double(char buf[], int off) { return read_matching_double(buf, off); }
} leStreamLeRuntime;

/*
 * "Factory" singleton methods (plus helper)
 */

static inline int isRuntimeBigEndian() {
    union { int32_t i; int8_t c[4]; } bint = {0x01020304};
    return bint.c[0] == 1;
}

BinaryInputStream *BinaryInputStream::endianCorrectStream(int streamIsBigEndian) {
    if (streamIsBigEndian) {
        if (isRuntimeBigEndian()) {
            return &beStreamBeRuntime;
        } else {
            return &beStreamLeRuntime;
        }
    } else {
        if (isRuntimeBigEndian()) {
            return &leStreamBeRuntime;
        } else {
            return &leStreamLeRuntime;
        }
    }
}

BinaryInputStream *BinaryInputStream::endianCorrectStream(std::istream &in,
                                                          uint32_t expectedBigEndianMagic,
                                                          uint32_t expectedLittleEndianMagic) {
    uint32_t magic = ((BinaryInputStream*)&beStreamBeRuntime)->read_uint32(in);
    if (magic == expectedBigEndianMagic) {
        if (isRuntimeBigEndian()) {
            return &beStreamBeRuntime;
        } else {
            return &beStreamLeRuntime;
        }
    } else if (magic == expectedLittleEndianMagic) {
        if (isRuntimeBigEndian()) {
            return &leStreamBeRuntime;
        } else {
            return &leStreamLeRuntime;
        }
    } else {
        return 0; /* not expected magic number */
    }
}

Suggested usage:

BinaryInputStream *bis = BinaryInputStream::endianCorrectStream(in, 0x01020304, 0x04030201);

if (bis == 0) {
    std::cerr << "error: infile is not an Acme EarthQUAKEZ file" << std::endl;
    return 1;
}

in.ignore(4); // skips 4 more bytes; the factory call above has already consumed the 4-byte magic number
int32_t number = bis->read_int32(in);
...