Question

I'm working with binary files built from events. Each event can have a variable length. Since my read buffer is a fixed size, I handle this as follows:

const int bufferSize = 0x500000;
const int readSize = 0x400000;
const int eventLengthMask = 0x7FFE0000;
const int eventLengthShift = 17;
const int headerLengthMask = 0x1F000;
const int headerLengthShift = 12;
const int slotMask = 0xF0;
const int slotShift = 4;
const int channelMask = 0xF;
...
//allocate the buffer: we allocate 5 MB even though we read in 4 MB chunks
//to deal with unprocessed data from the end of a read
char* allocBuff = new char[bufferSize]; //inFile reads data into here
unsigned int* buff = reinterpret_cast<unsigned int*>(allocBuff); //data is interpreted from here
inFile.open(fileName.c_str(),ios_base::in | ios_base::binary);
int startPos = 0;
while(!inFile.eof())
{
    int index = 0;
    inFile.read(&(allocBuff[startPos]), readSize);
    int size = ((readSize + startPos)>>2);
    //loop to process the buffer
    while (index<size)
    {
        unsigned int data = buff[index];
        int eventLength = ((data&eventLengthMask)>>eventLengthShift);
        int headerLength = ((data&headerLengthMask)>>headerLengthShift);
        int slot = ((data&slotMask)>>slotShift);
        int channel = data&channelMask;
        //now check if the full event is in the buffer
        if( (index+eventLength) > size )
        {//the full event is not in the buffer
            break;
        }
        ++index;
        //further processing of the event
    }

    //move the data at the end of the buffer to the beginning and set start position
    //for the next read
    for(int i = index; i<size; ++i)
    {
        buff[i-index] = buff[i];
    }
    startPos = ((size-index)<<2);
}

My question is: is there a better way to deal with the unprocessed data at the end of the buffer?

Answer 1

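You could improve on this by using a circular buffer instead of a plain array. That, or a circular iterator over the array. Then you don't need to do all the copying; the "start" of the array moves instead.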

Other than that, no, not really.
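A minimal sketch of the circular-buffer idea, assuming a byte-oriented buffer. The class `RingBuffer` and its members are hypothetical names, not from any library. Processed data is released by advancing a head index, so leftover bytes are never copied back to the front:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity byte ring buffer. New bytes are appended at
// the tail and processed bytes are released from the head; both positions
// wrap around, so leftover data never has to be copied to the front.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : buf_(capacity) {
        assert(capacity > 0);
    }

    std::size_t size() const { return count_; }                  // unprocessed bytes
    std::size_t freeSpace() const { return buf_.size() - count_; }

    // Append up to n bytes from src; returns how many were actually stored.
    std::size_t write(const char* src, std::size_t n) {
        std::size_t stored = 0;
        while (stored < n && count_ < buf_.size()) {
            buf_[(head_ + count_) % buf_.size()] = src[stored];
            ++stored;
            ++count_;
        }
        return stored;
    }

    // Byte at logical offset i from the head; the physical index wraps.
    char at(std::size_t i) const {
        assert(i < count_);
        return buf_[(head_ + i) % buf_.size()];
    }

    // Release n processed bytes: only the head index moves, no data is copied.
    void consume(std::size_t n) {
        assert(n <= count_);
        head_ = (head_ + n) % buf_.size();
        count_ -= n;
    }

private:
    std::vector<char> buf_;
    std::size_t head_ = 0;   // physical index of the first unprocessed byte
    std::size_t count_ = 0;  // number of unprocessed bytes
};
```

The trade-off is that an event can now straddle the wrap-around point, so it can no longer be viewed through a single pointer; it has to be read byte by byte (as `at()` does) or copied out. Filling the buffer straight from an `istream` would also take up to two `read()` calls whenever the free space wraps. The byte-at-a-time `write()` is kept simple for clarity, not speed.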

Answer 2

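When I've had this problem in the past, I've simply copied the unprocessed data down to the start of the buffer and then read in after the end of it. This is a valid solution (and by far the simplest) if the individual elements are fairly small and the buffer is large. (On a modern machine, "fairly small" can easily be a few hundred KB.) Of course, you have to keep track of how much you've copied down, and adjust the pointer and the size of the next read accordingly.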

Other than that:

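It would be better to use std::vector<char> for the buffer.
You can't convert four bytes read from disk into an unsigned int just by casting the address; you have to insert each byte into the unsigned int where it belongs.
Finally: you don't check whether the read succeeded before processing the data. Using unformatted input from an istream is a bit tricky: your loop should be something like while ( inFile.read( addr, len ) || inFile.gcount() != 0 )....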

Is there a better way to handle incomplete data in the buffer and in reads?
