Question

I am using Java to write byte array values to a file in big-endian byte order. Now I need to read that file from a C++ program...

The byte array I write to the file is made up of three parts, as described below:

short employeeId = 32767;
long lastModifiedDate = 1379811105109L;
byte[] attributeValue = os.toByteArray();

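I write employeeId, lastModifiedDate and attributeValue together into a single byte array and write the resulting byte array to a file. My C++ program will then read that byte array back from the file and deserialize it to extract employeeId, lastModifiedDate and attributeValue.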

Below is the Java code I am using, which writes the byte array to a file in big-endian format:

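import java.io.File;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

import org.apache.commons.io.IOUtils; // assuming IOUtils is the Apache Commons IO class

public class ByteBufferTest {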

    public static void main(String[] args) {

        String text = "Byte Array Test For Big Endian";
        byte[] attributeValue = text.getBytes();

        long lastModifiedDate = 1289811105109L;
        short employeeId = 32767;

        int size = 2 + 8 + 4 + attributeValue.length; // short is 2 bytes, long 8 and int 4

        ByteBuffer bbuf = ByteBuffer.allocate(size); 
        bbuf.order(ByteOrder.BIG_ENDIAN);

        bbuf.putShort(employeeId);
        bbuf.putLong(lastModifiedDate);
        bbuf.putInt(attributeValue.length);
        bbuf.put(attributeValue);

        bbuf.rewind();

        // best approach is copy the internal buffer
        byte[] bytesToStore = new byte[size];
        bbuf.get(bytesToStore);

        writeFile(bytesToStore);

    }

    /**
     * Write the file in Java
     * @param byteArray
     */
    public static void writeFile(byte[] byteArray) {

        File file = new File("bytebuffertest");

        // try-with-resources closes the stream even if the write fails
        try (FileOutputStream output = new FileOutputStream(file)) {
            IOUtils.write(byteArray, output);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

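Now I need to read the byte array back from that same file with the C++ program below and deserialize it to extract employeeId, lastModifiedDate and attributeValue. I am not sure what the best approach is on the C++ side. Here is the code I have so far: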

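#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <utility> // std::swap

using namespace std;

int main() {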

    string line;

    std::ifstream myfile("bytebuffertest", std::ios::binary);

    if (myfile.is_open()) {

        uint16_t employeeId;
        uint64_t lastModifiedDate;
        uint32_t attributeLength;

        char buffer[8]; // sized for the biggest read we want to do

        // read two bytes (will be in the wrong order)
        myfile.read(buffer, 2);

        // swap the bytes
        std::swap(buffer[0], buffer[1]);

        // only now convert bytes to an integer
        employeeId = *reinterpret_cast<uint16_t*>(buffer);

        cout<< employeeId <<endl;

        // read eight bytes (will be in the wrong order)
        myfile.read(buffer, 8);

        // swap the bytes
        std::swap(buffer[0], buffer[7]);
        std::swap(buffer[1], buffer[6]);
        std::swap(buffer[2], buffer[5]);
        std::swap(buffer[3], buffer[4]);

        // only now convert bytes to an integer
        lastModifiedDate = *reinterpret_cast<uint64_t*>(buffer);

        cout<< lastModifiedDate <<endl;

        // read 4 bytes (will be in the wrong order)
        myfile.read(buffer, 4);

        // swap the bytes
        std::swap(buffer[0], buffer[3]);
        std::swap(buffer[1], buffer[2]);

        // only now convert bytes to an integer
        attributeLength = *reinterpret_cast<uint32_t*>(buffer);

        cout<< attributeLength <<endl;

        myfile.read(buffer, attributeLength);


        // now I am not sure how should I get the actual attribute value here?

        //close the stream:
        myfile.close();
    }

    else
        cout << "Unable to open file";

    return 0;
}

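I deliberately set the byte order to big-endian on the Java side, which means I know where each byte belongs. So how do I decode the data while shifting the bytes into the correct position for each value? Right now I am decoding it as little-endian, which I guess is not what I want...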

I read somewhere that I can use ntoh in C++ to deserialize the byte array. I am not sure whether htons would be any better than my current solution...

If it is, I am not sure how to use it in my current C++ code.

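Can anyone take a look at the C++ code and tell me how to improve it? It does not look very efficient to me. Is there a better way to deserialize the byte array and extract the relevant information on the C++ side?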

Answer 1

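If you develop both the Java and the C++ code yourself, it may be better to use Google Protocol Buffers (https://developers.google.com/protocol-buffers/docs/overview) instead of writing your own serializer/deserializer.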

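If you really want to write your own implementation, the best way is to write a buffer class that receives the byte stream as a parameter (for example as a constructor parameter) and provides accessor methods such as readShort / readLong / readInt / readByte, swapping bytes only where needed.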

class ByteBuffer{
public:
  explicit ByteBuffer(uint8_t* byteStream, uint16_t streamLength);
  uint8_t readUInt8(uint16_t readPos)const {return m_byteStream[readPos];} // no conversion needed
  uint16_t readUInt16(uint16_t readPos)const {
    const uint8_t byteCount = 2;
    union{
      uint16_t u16;
      uint8_t u8[byteCount];
    }tmp;
    for(uint8_t i=0; i<byteCount; ++i){
      tmp.u8[i] = readUInt8(readPos+i); // consecutive bytes, so step by 1
    }
    return ntohs(tmp.u16); // do conversion; ntohs comes from <arpa/inet.h> on POSIX
  }
  ...
};

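What is missing here is a check for reads past the end of the buffer. If your code should be portable, you can use ntohl / ntohs (see: http://forums.codeguru.com/showthread.php?298741-C-General-What-do-ntohl%28%29-and-htonl%28%29-actually-do). If you do the byte swapping yourself, your code is not portable (it only works on little-endian machines). If you use ntoh, it will also work on big-endian machines.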
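To make the bounds check concrete, here is a minimal sketch of a reader that checks every access. Note that it swaps in a different technique from ntohs / ntohl: it assembles each value with shifts, most significant byte first, which gives the same result on little-endian and big-endian hosts and also covers the 8-byte field (POSIX only specifies 16-bit and 32-bit ntoh functions). The class name BigEndianReader and its methods are hypothetical, not part of the ByteBuffer class above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (an assumption, not from the answer): reads big-endian
// unsigned integers from a byte buffer, with a bounds check on every access.
class BigEndianReader {
public:
    BigEndianReader(const std::uint8_t* data, std::size_t length)
        : m_data(data), m_length(length) {}

    std::uint16_t readUInt16(std::size_t pos) const {
        return static_cast<std::uint16_t>(readBytes(pos, 2));
    }
    std::uint32_t readUInt32(std::size_t pos) const {
        return static_cast<std::uint32_t>(readBytes(pos, 4));
    }
    std::uint64_t readUInt64(std::size_t pos) const {
        return readBytes(pos, 8);
    }

private:
    // Combine `count` bytes starting at `pos`, most significant byte first.
    std::uint64_t readBytes(std::size_t pos, std::size_t count) const {
        if (pos > m_length || count > m_length - pos) {
            throw std::out_of_range("read past end of buffer");
        }
        std::uint64_t value = 0;
        for (std::size_t i = 0; i < count; ++i) {
            value = (value << 8) | m_data[pos + i];
        }
        return value;
    }

    const std::uint8_t* m_data;
    std::size_t m_length;
};

// Small usage example: 0x7FFF is 32767 (the employeeId from the question)
// and 0x0000001E is 30.
inline void bigEndianReaderExample() {
    const std::uint8_t bytes[] = {0x7F, 0xFF, 0x00, 0x00, 0x00, 0x1E};
    BigEndianReader reader(bytes, sizeof(bytes));
    assert(reader.readUInt16(0) == 32767);
    assert(reader.readUInt32(2) == 30);
}
```

Because nothing here depends on the byte order of the host, no swap is needed at all. If you want signed values like Java's short and long, you would still have to cast the results yourself.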

For convenience, I would also write a wrapper class that lets you read and write the fields (for example employeeId) directly:

class MyBuffer{
public:
  uint16_t readEmployeeId()const{return m_Buffer.readUInt16(EmployeeId_Pos);}
  ....
  static const uint16_t EmployeeId_Pos = 0;
  ....
};
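Applied to the record from the question, the wrapper idea could look roughly like the sketch below. It assumes the layout written by the Java code (2-byte employeeId, 8-byte lastModifiedDate, 4-byte length, then the attribute bytes). The names EmployeeRecord, readAllBytes, readBigEndian and parseEmployeeRecord are made up for illustration. The attribute bytes go into a std::string sized from the length field, since the 8-byte buffer in the question's code is too small to hold them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type mirroring the three fields the Java code writes.
struct EmployeeRecord {
    std::uint16_t employeeId;
    std::uint64_t lastModifiedDate;
    std::string attributeValue;
};

// Load the whole file into memory as raw bytes.
inline std::vector<std::uint8_t> readAllBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("unable to open " + path);
    }
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Combine `count` bytes starting at `pos`, most significant byte first.
inline std::uint64_t readBigEndian(const std::vector<std::uint8_t>& buf,
                                   std::size_t pos, std::size_t count) {
    if (pos > buf.size() || count > buf.size() - pos) {
        throw std::out_of_range("read past end of buffer");
    }
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value = (value << 8) | buf[pos + i];
    }
    return value;
}

// Offsets follow the Java layout: short (2), long (8), int length (4), payload.
inline EmployeeRecord parseEmployeeRecord(const std::vector<std::uint8_t>& buf) {
    const std::size_t employeeIdPos = 0;
    const std::size_t lastModifiedPos = 2;
    const std::size_t lengthPos = 10;
    const std::size_t attributePos = 14;

    EmployeeRecord record;
    record.employeeId =
        static_cast<std::uint16_t>(readBigEndian(buf, employeeIdPos, 2));
    record.lastModifiedDate = readBigEndian(buf, lastModifiedPos, 8);
    const std::uint64_t length = readBigEndian(buf, lengthPos, 4);
    if (length > buf.size() - attributePos) {
        throw std::out_of_range("attribute length exceeds buffer");
    }
    record.attributeValue.assign(
        reinterpret_cast<const char*>(buf.data()) + attributePos,
        static_cast<std::size_t>(length));
    return record;
}

// Example with a hand-built record: id 32767, timestamp 1289811105109, "abc".
inline void parseEmployeeRecordExample() {
    const std::vector<std::uint8_t> bytes = {
        0x7F, 0xFF,
        0x00, 0x00, 0x01, 0x2C, 0x4E, 0xBB, 0x95, 0x55,
        0x00, 0x00, 0x00, 0x03,
        0x61, 0x62, 0x63};
    const EmployeeRecord record = parseEmployeeRecord(bytes);
    assert(record.employeeId == 32767);
    assert(record.lastModifiedDate == 1289811105109ULL);
    assert(record.attributeValue == "abc");
}
```

For the file from the question, the call would be parseEmployeeRecord(readAllBytes("bytebuffertest")). Reading the whole file first keeps the parsing separate from the I/O, and a truncated file then surfaces as an exception instead of a read into garbage.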
