Question

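I'm about to start reading a large number of binary files, each containing 1000 or more records. New files are added constantly, so I'm writing a Windows service that monitors a directory and processes new files as they arrive. The files are created by a C++ program. I've recreated the struct definitions in C# and can read the data fine, but I'm concerned that the way I'm doing it will eventually kill my application.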

using (BinaryReader br = new BinaryReader(File.Open("myfile.bin", FileMode.Open, FileAccess.Read)))
{
    long pos = 0L;
    long length = br.BaseStream.Length;

    CPP_STRUCT_DEF record;
    int recordSize = Marshal.SizeOf(typeof(CPP_STRUCT_DEF));
    byte[] buffer;
    GCHandle pin;

    while (pos < length)
    {
        buffer = br.ReadBytes(recordSize);
        if (buffer.Length < recordSize)
            break; // truncated trailing record

        pin = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            record = (CPP_STRUCT_DEF)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(CPP_STRUCT_DEF));
        }
        finally
        {
            pin.Free(); // always unpin, even if marshaling throws
        }

        pos += buffer.Length;

        /* Do stuff with my record */
    }
}

I don't think I need GCHandle, since I'm not actually communicating with the C++ application; everything is done in managed code. But I don't know of an alternative.

Answer 1

Using Marshal.PtrToStructure is rather slow. I found the following CodeProject article, which compares (and benchmarks) different ways of reading binary data, very helpful:

Fast Binary File Reading with C#

Answer 2

For your particular application, only one thing will give you a definitive answer: profile it.

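That said, here are the lessons I've learned working on large PInvoke solutions. The most efficient way to marshal data is to marshal fields that are blittable, meaning the CLR can simply do the equivalent of a memcpy to move the data between native and managed code. In simple terms, get all non-inline arrays and strings out of your structures. If they are present in the native structure, represent them with an IntPtr and marshal the values into managed code on demand.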
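A minimal sketch of the IntPtr advice, assuming a hypothetical native struct with a `const char*` member (the names `NativeRecord`, `NamePtr` and `Name` are illustrative, not from the thread). The pointer stays an `IntPtr`, so the struct remains blittable, and the string is only marshaled when someone reads `Name`:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical native layout, for illustration only:
//   struct NativeRecord { int id; double value; const char* name; };
[StructLayout(LayoutKind.Sequential)]
public struct NativeRecord
{
    public int Id;         // blittable
    public double Value;   // blittable
    public IntPtr NamePtr; // the native char* kept as a raw pointer, so the
                           // whole struct stays blittable (memcpy-able)

    // Marshal the string only when a caller actually asks for it.
    public string Name
    {
        get { return Marshal.PtrToStringAnsi(NamePtr); }
    }
}
```

Declaring the field as `[MarshalAs(UnmanagedType.LPStr)] string` instead would make the struct non-blittable and force the marshaler to allocate a managed string for every record it copies, whether or not the string is ever used.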

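I have never profiled the difference between using Marshal.PtrToStructure and having a native API dereference the value. If profiling shows PtrToStructure to be a bottleneck, that is probably where you should invest.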

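For large hierarchies, marshal on demand rather than pulling the entire structure into managed code at once. I ran into this issue most often when dealing with large tree structures. Marshaling an individual node is very fast if it's blittable, so performance-wise it pays to marshal only what you need at that moment.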
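To illustrate on-demand marshaling, here is a minimal sketch assuming a hypothetical native binary search tree whose nodes hold child pointers (`NativeTreeNode` and `LazyTree` are illustrative names). A lookup copies one node per step along the search path instead of converting the whole tree into managed objects first:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical native binary-search-tree node:
//   struct Node { int value; Node* left; Node* right; };
[StructLayout(LayoutKind.Sequential)]
public struct NativeTreeNode
{
    public int Value;
    public IntPtr Left;  // child pointers stay as IntPtr; a child is
    public IntPtr Right; // marshaled only when the walk reaches it
}

public static class LazyTree
{
    // Walks from the root toward 'value', marshaling one node per step.
    // 'nodesMarshaled' reports how many nodes were actually copied.
    public static bool Contains(IntPtr root, int value, out int nodesMarshaled)
    {
        nodesMarshaled = 0;
        IntPtr current = root;
        while (current != IntPtr.Zero)
        {
            NativeTreeNode node = (NativeTreeNode)Marshal.PtrToStructure(current, typeof(NativeTreeNode));
            nodesMarshaled++;
            if (value == node.Value)
                return true;
            current = value < node.Value ? node.Left : node.Right;
        }
        return false;
    }
}
```

For a balanced tree this touches roughly log2(N) nodes per lookup, whereas eagerly marshaling the tree always costs N node copies.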

Answer 3

In addition to JaredPar's comprehensive answer: you don't need GCHandle; you can use unsafe code instead.

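// Requires an unsafe context (compile with /unsafe) and a CPP_STRUCT_DEF containing only unmanaged fields.
fixed (byte* pBuffer = buffer) {
    record = *((CPP_STRUCT_DEF*)pBuffer);
}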

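The whole purpose of the GCHandle / fixed statement is to pin a particular segment of memory so that it cannot be moved by the GC. If the memory were movable, any relocation would invalidate the pointer.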

Not sure which way is faster.
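On newer runtimes (.NET Core 2.1 and later, or .NET Framework with the System.Memory package), `MemoryMarshal.Read<T>` offers a third option that needs neither GCHandle nor an unsafe block. None of the answers mention it; this is a swapped-in API. A minimal sketch, assuming a hypothetical packed `SampleRecord` and that the file's byte order matches the reading machine's:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical record, mirroring a packed C++ struct:
//   #pragma pack(push, 1) struct SampleRecord { int32_t id; double value; };
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct SampleRecord
{
    public int Id;
    public double Value;
}

public static class SpanRecordReader
{
    // Yields consecutive fixed-size records from 'stream' until EOF.
    // Assumes T contains only primitive numeric fields, so its in-memory size
    // equals Marshal.SizeOf<T>(); MemoryMarshal.Read throws if T holds references.
    public static IEnumerable<T> ReadAll<T>(Stream stream) where T : struct
    {
        byte[] buffer = new byte[Marshal.SizeOf<T>()];
        while (FillExactly(stream, buffer))
            yield return Decode<T>(buffer);
        // A trailing partial record, if any, is ignored.
    }

    // Reinterprets the bytes as a T: no GCHandle, no unsafe block.
    static T Decode<T>(byte[] buffer) where T : struct
    {
        return MemoryMarshal.Read<T>(buffer);
    }

    // Fills 'buffer' completely; returns false on EOF or a short final read.
    static bool FillExactly(Stream stream, byte[] buffer)
    {
        int filled = 0;
        while (filled < buffer.Length)
        {
            int n = stream.Read(buffer, filled, buffer.Length - filled);
            if (n == 0)
                return false;
            filled += n;
        }
        return true;
    }
}
```

In the question's setting this would be used as `foreach (var rec in SpanRecordReader.ReadAll<CPP_STRUCT_DEF>(File.OpenRead(path)))`, provided CPP_STRUCT_DEF meets the same field restrictions (and the stream is disposed afterwards).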

Answer 4

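This may be outside the scope of your question, but I would be inclined to write a small assembly in Managed C++ that does an fread() or something similarly fast to read in the structs. Once they're read in, you can use C# to do everything else you need with them.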

Answer 5

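Here's a small class I wrote a while back while playing with structured files. It was the fastest method I could figure out short of going unsafe (which was what I was trying to replace while keeping comparable performance).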

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

namespace PersonalUse.IO {

    public sealed class RecordReader<T> : IDisposable, IEnumerable<T> where T : new() {

        const int DEFAULT_STREAM_BUFFER_SIZE = 2 << 16; // default stream buffer (2 << 16 == 131072 bytes, i.e. 128 KB)
        const int DEFAULT_RECORD_BUFFER_SIZE = 100; // default record buffer (100 records)

        readonly long _fileSize; // size of the underlying file
        readonly int _recordSize; // size of the record structure
        byte[] _buffer; // the buffer itself, [record buffer size] * _recordSize
        FileStream _fs;

        T[] _structBuffer;
        GCHandle _h; // handle/pinned pointer to _structBuffer 

        int _recordsInBuffer; // how many records are in the buffer
        int _bufferIndex; // the index of the current record in the buffer
        long _recordPosition; // position of the record in the file

        /// <overloads>Initializes a new instance of the <see cref="RecordReader{T}"/> class.</overloads>
        /// <summary>
        /// Initializes a new instance of the <see cref="RecordReader{T}"/> class.
        /// </summary>
        /// <param name="filename">filename to be read</param>
        public RecordReader(string filename) : this(filename, DEFAULT_STREAM_BUFFER_SIZE, DEFAULT_RECORD_BUFFER_SIZE) { }

        /// <summary>
        /// Initializes a new instance of the <see cref="RecordReader{T}"/> class.
        /// </summary>
        /// <param name="filename">filename to be read</param>
        /// <param name="streamBufferSize">buffer size for the underlying <see cref="FileStream"/>, in bytes.</param>
        public RecordReader(string filename, int streamBufferSize) : this(filename, streamBufferSize, DEFAULT_RECORD_BUFFER_SIZE) { }

        /// <summary>
        /// Initializes a new instance of the <see cref="RecordReader{T}"/> class.
        /// </summary>
        /// <param name="filename">filename to be read</param>
        /// <param name="streamBufferSize">buffer size for the underlying <see cref="FileStream"/>, in bytes.</param>
        /// <param name="recordBufferSize">size of record buffer, in records.</param>
        public RecordReader(string filename, int streamBufferSize, int recordBufferSize) {
            _fileSize = new FileInfo(filename).Length;
            _recordSize = Marshal.SizeOf(typeof(T));
            _buffer = new byte[recordBufferSize * _recordSize];
            _fs = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.None, streamBufferSize, FileOptions.SequentialScan);

            _structBuffer = new T[recordBufferSize];
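            // T must be blittable so that its in-memory size equals Marshal.SizeOf(T)
            // and the raw byte copy in FillBuffer lands on whole records.
            _h = GCHandle.Alloc(_structBuffer, GCHandleType.Pinned);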

            FillBuffer();
        }

        // fill the buffer, reset position
        void FillBuffer() {
            int bytes = _fs.Read(_buffer, 0, _buffer.Length);
            Marshal.Copy(_buffer, 0, _h.AddrOfPinnedObject(), _buffer.Length);
            _recordsInBuffer = bytes / _recordSize;
            _bufferIndex = 0;
        }

        /// <summary>
        /// Read a record
        /// </summary>
        /// <returns>a record of type T</returns>
        public T Read() {
            if(_recordsInBuffer == 0)
                return new T(); //EOF
            if(_bufferIndex < _recordsInBuffer) {
                // update positional info
                _recordPosition++;
                return _structBuffer[_bufferIndex++];
            } else {
                // refill the buffer
                FillBuffer();
                return Read();
            }
        }

        /// <summary>
        /// Advances the record position without reading.
        /// </summary>
        public void Next() {
            if(_recordsInBuffer == 0)
                return; // EOF
            else if(_bufferIndex < _recordsInBuffer) {
                _bufferIndex++;
                _recordPosition++;
            } else {
                FillBuffer();
                Next();
            }
        }

        public long FileSize {
            get { return _fileSize; }
        }

        public long FilePosition {
            get { return _recordSize * _recordPosition; }
        }

        public long RecordSize {
            get { return _recordSize; }
        }

        public long RecordPosition {
            get { return _recordPosition; }
        }

        public bool EOF {
            get { return _recordsInBuffer == 0; }
        }

        public void Close() {
            Dispose(true);
        }

        void Dispose(bool disposing) {
            try {
                if(disposing && _fs != null) {
                    _fs.Close();
                }
            } finally {
                if(_fs != null) {
                    _fs = null;
                    _buffer = null;
                    _recordPosition = 0;
                    _bufferIndex = 0;
                    _recordsInBuffer = 0;
                }
                if(_h.IsAllocated) {
                    _h.Free();
                    _structBuffer = null;
                }
            }
        }

        #region IDisposable Members

        public void Dispose() {
            Dispose(true);
        }

        #endregion

        #region IEnumerable<T> Members

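        public IEnumerator<T> GetEnumerator() {
            while(true) {
                if(_bufferIndex >= _recordsInBuffer)
                    FillBuffer(); // buffer consumed (or empty): refill it
                if(_recordsInBuffer == 0)
                    yield break; // EOF: stop without yielding a default T
                _recordPosition++;
                yield return _structBuffer[_bufferIndex++];
            }
        }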

        #endregion

        #region IEnumerable Members

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
            return GetEnumerator();
        }

        #endregion

    } // end class

} // end namespace

Usage:

using(RecordReader<CPP_STRUCT_DEF> reader = new RecordReader<CPP_STRUCT_DEF>(path)) {
    foreach(CPP_STRUCT_DEF record in reader) {
        // do stuff
    }
}

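(I'm new here, so I hope that wasn't too much to post... I just pasted in the class without stripping out the comments or anything else to shorten it.)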

Answer 6

This doesn't seem to be related to either C++ or marshaling. You know the structure; what else do you need?

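Obviously you just need simple code that reads the group of bytes representing one struct and then uses BitConverter to put those bytes into the corresponding C# fields.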
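A minimal sketch of the BitConverter approach, assuming a hypothetical packed 14-byte record (`Rec` and `BitConverterDecoder` are illustrative names). Each field is decoded at a hand-computed offset, so no pinning or marshaling is involved:

```csharp
using System;

// Hypothetical record layout, mirroring a packed C++ struct (14 bytes):
//   #pragma pack(push, 1) struct Rec { int32_t id; double value; int16_t flags; };
public struct Rec
{
    public int Id;
    public double Value;
    public short Flags;
}

public static class BitConverterDecoder
{
    public const int RecordSize = 4 + 8 + 2;

    // Decodes the record starting at 'offset', one field at a time.
    // BitConverter uses the machine's byte order, so this assumes the file's
    // byte order matches the reader's (true for x86/x64 on both sides).
    public static Rec Decode(byte[] buffer, int offset)
    {
        Rec r;
        r.Id = BitConverter.ToInt32(buffer, offset);
        r.Value = BitConverter.ToDouble(buffer, offset + 4);
        r.Flags = BitConverter.ToInt16(buffer, offset + 12);
        return r;
    }
}
```

Reading a chunk of N records into one byte[] and calling `Decode(buffer, i * RecordSize)` for each index avoids pinning entirely; the cost is that the field offsets must be kept in sync with the C++ definition by hand.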

What is the most efficient way to marshal C++ structs to C#?
