Question

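I'm working with an interface that requires an XML document. So far I've been able to serialize most of the objects using XmlSerializer. However, one property is proving problematic. It is supposed to be a collection of objects that wrap documents. The documents themselves are encoded as base64 strings.

The basic structure is like this: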

//snipped out of a parent object
public List<Document> DocumentCollection { get; set; }
//end snip

public class Document
    {
        public string DocumentTitle { get; set; }
        public Code DocumentCategory { get; set; }
        /// <summary>
        /// Base64 encoded file
        /// </summary>
        public string BinaryDocument { get; set; }
        public string DocumentTypeText { get; set; }
    }

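The problem is that smaller values work fine, but if a document is too large, the serializer simply skips that document item in the collection.

Am I bumping into some limitation?

Update: I changed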

public string BinaryDocument { get; set; }

to

public byte[] BinaryDocument { get; set; }

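and I still get the same result. The smaller documents (~150 KB) serialize just fine, but the rest do not. To be clear, it is not just the value of the property that is missing: the entire containing Document object gets dropped.

Update 2:

Here is the serialization code with a simple repro. It comes from a console project I put together. The problem is that this code works fine in the test project. I'm having trouble packaging up the full object structure here, because populating the fields is complicated enough that using the actual objects in a test case is nearly impossible, so I have tried to cut down the code from the main application. The populated object goes into the serialization code with DocumentCollection filled with four documents, and comes out with one Document.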

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            var container = new DocumentContainer();
            var docs = new List<Document>();
            foreach (var f in Directory.GetFiles(@"E:\Software Projects\DA\Test Documents"))
            {
                var fileStream = new MemoryStream(File.ReadAllBytes(f));
                var doc = new Document
                {
                    BinaryDocument = fileStream.ToArray(),
                    DocumentTitle = Path.GetFileName(f)
                };

                docs.Add(doc);
            }

            container.DocumentCollection = docs;

            var serializer = new XmlSerializer(typeof(DocumentContainer));
            var ms = new MemoryStream();
            var writer = XmlWriter.Create(ms);

            serializer.Serialize(writer, container);
            writer.Flush();
            ms.Seek(0, SeekOrigin.Begin);

            var reader = new StreamReader(ms, Encoding.UTF8);
            File.WriteAllText(@"C:\temp\testexport.xml", reader.ReadToEnd());
        }
    }

    public class Document
    {
        public string DocumentTitle { get; set; }
        public byte[] BinaryDocument { get; set; }
    }

    // test class
    public class DocumentContainer
    {
        public List<Document> DocumentCollection { get; set; }
    }
}

Answer 1

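XmlSerializer has no limit on the length of a string it can serialize.

.Net, however, has a maximum string length of at most int.MaxValue. Furthermore, since a string is implemented internally as a contiguous memory buffer, on a 32-bit process you are likely to be unable to allocate a string anywhere near that large, due to process space fragmentation. And since a C# base64 string requires roughly 2.67 times the memory of the byte[] array from which it was created (1.33 for the encoding, times 2 because the .Net char type is actually two bytes), you might be getting an OutOfMemoryException when encoding a large binary document as a complete base64 string, then swallowing and ignoring it, leaving the BinaryDocument property null.

That said, there is no reason for you to encode your binary documents to base64 manually, because XmlSerializer does this for you automatically. That is, if I serialize the following class: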
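The 2.67 factor above can be worked out with a few lines of arithmetic. The following is a minimal sketch; the class and method names are hypothetical, and it counts only the character data of the string, ignoring object header overhead.

```csharp
using System;

// Hypothetical helper for illustration; not part of any library.
public static class Base64MemoryEstimate
{
    // Number of base64 characters produced for byteCount input bytes:
    // every group of 3 input bytes (rounded up) becomes 4 output characters.
    public static long Base64CharCount(long byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Approximate bytes occupied by the characters of the base64 string,
    // at 2 bytes per .NET char.
    public static long Base64StringBytes(long byteCount)
    {
        return Base64CharCount(byteCount) * sizeof(char);
    }
}
```

For a 300,000,000-byte document, this gives 400,000,000 characters, or about 800,000,000 bytes for the string alone, on top of the original byte array.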

public class Document
{
    public string DocumentTitle { get; set; }
    public Code DocumentCategory { get; set; }
    public byte [] BinaryDocument { get; set; }
    public string DocumentTypeText { get; set; }
}

I get the following XML:

<Document>
  <DocumentTitle>my title</DocumentTitle>
  <DocumentCategory>Default</DocumentCategory>
  <BinaryDocument>AAECAwQFBgcICQoLDA0ODxAREhM=</BinaryDocument>
  <DocumentTypeText>document text type</DocumentTypeText>
</Document>

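As you can see, BinaryDocument is base64-encoded. Thus you should be able to keep your binary documents in the more compact byte[] representation and still get the XML output you want.

Better still, XmlWriter uses System.Xml.Base64Encoder to do this. That class encodes its input in chunks, thereby avoiding the excessive memory use and potential out-of-memory exceptions described above.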
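The automatic base64 encoding can be checked with a small sketch like the one below. SampleDocument and Base64Demo are hypothetical names introduced only for this illustration, and the payload is assumed to be the bytes 0 through 19, which is what the base64 value in the XML above decodes to.

```csharp
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical class for illustration only; a cut-down version of Document.
public class SampleDocument
{
    public string DocumentTitle { get; set; }
    public byte[] BinaryDocument { get; set; }
}

public static class Base64Demo
{
    // Serializes a sample object to an XML string and returns it.
    public static string SerializeSample()
    {
        var doc = new SampleDocument
        {
            DocumentTitle = "my title",
            // Bytes 0, 1, ..., 19.
            BinaryDocument = Enumerable.Range(0, 20).Select(i => (byte)i).ToArray()
        };

        var serializer = new XmlSerializer(typeof(SampleDocument));
        using (var writer = new StringWriter())
        {
            // XmlSerializer writes byte[] properties as base64 text by default.
            serializer.Serialize(writer, doc);
            return writer.ToString();
        }
    }
}
```

The returned string should contain a BinaryDocument element holding the base64 text, without any manual call to Convert.ToBase64String.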

Answer 2

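I was unable to reproduce the problem you are having. Even with individual files ranging from 267 MB to 1.92 GB, I did not see any elements being skipped. The only problem I saw was that the temporary var ms = new MemoryStream(); eventually exceeded its 2 GB buffer limit, whereupon an exception was thrown. I replaced it with a direct file stream, and that problem went away: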

using (var stream = File.Open(outputPath, FileMode.Create, FileAccess.ReadWrite))

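That said, your design will eventually run into memory limits given enough sufficiently large files, since you load all of them into memory before serializing. If that happens, your production code may be catching and swallowing the OutOfMemoryException somewhere without you realizing it, leading to the problem you are seeing.

As an alternative, I suggest a streaming solution in which you incrementally copy each file's contents to the XML output from within Document, by making your Document class implement IXmlSerializable: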

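using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

public class Document : IXmlSerializable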
{
    public string DocumentPath { get; set; }

    public string DocumentTitle
    {
        get
        {
            if (DocumentPath == null)
                return null;
            return Path.GetFileName(DocumentPath);
        }
    }

    const string DocumentTitleName = "DocumentTitle";
    const string BinaryDocumentName = "BinaryDocument";

    #region IXmlSerializable Members

    System.Xml.Schema.XmlSchema IXmlSerializable.GetSchema()
    {
        return null;
    }

    void ReadXmlElement(XmlReader reader)
    {
        if (reader.Name == DocumentTitleName)
            DocumentPath = reader.ReadElementContentAsString();
    }

    void IXmlSerializable.ReadXml(XmlReader reader)
    {
        reader.ReadXml(null, ReadXmlElement);
    }

    void IXmlSerializable.WriteXml(XmlWriter writer)
    {
        writer.WriteElementString(DocumentTitleName, DocumentTitle ?? "");
        if (DocumentPath != null)
        {
            try
            {
                using (var stream = File.OpenRead(DocumentPath))
                {
                    // Write the start element if the file was successfully opened
                    writer.WriteStartElement(BinaryDocumentName);
                    try
                    {
                        var buffer = new byte[6 * 1024];
                        int read;
                        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                            writer.WriteBase64(buffer, 0, read);
                    }
                    finally
                    {
                        // Write the end element even if an error occurred while streaming the file.
                        writer.WriteEndElement();
                    }
                }
            }
            catch (Exception ex)
            {
                // You could log the exception as an element or as a comment, as you prefer.
                // Log as a comment
                writer.WriteComment("Caught exception with message: " + ex.Message);
                writer.WriteComment("Exception details:");
                writer.WriteComment(ex.ToString());
                // Log as an element.
                writer.WriteElementString("ExceptionMessage", ex.Message);
                writer.WriteElementString("ExceptionDetails", ex.ToString());
            }
        }
    }

    #endregion
}

// test class
public class DocumentContainer
{
    public List<Document> DocumentCollection { get; set; }
}

public static class XmlSerializationExtensions
{
    public static void ReadXml(this XmlReader reader, Action<IList<XAttribute>> readXmlAttributes, Action<XmlReader> readXmlElement)
    {
        if (reader.NodeType != XmlNodeType.Element)
            throw new InvalidOperationException("reader.NodeType != XmlNodeType.Element");

        if (readXmlAttributes != null)
        {
            var attributes = new List<XAttribute>(reader.AttributeCount);
            while (reader.MoveToNextAttribute())
            {
                attributes.Add(new XAttribute(XName.Get(reader.Name, reader.NamespaceURI), reader.Value));
            }
            // Move the reader back to the element node.
            reader.MoveToElement();
            readXmlAttributes(attributes);
        }

        if (reader.IsEmptyElement)
        {
            reader.Read();
            return;
        }

        reader.ReadStartElement(); // Advance to the first sub element of the wrapper element.

        while (reader.NodeType != XmlNodeType.EndElement)
        {
            if (reader.NodeType != XmlNodeType.Element)
                // Comment, whitespace
                reader.Read();
            else
            {
                using (var subReader = reader.ReadSubtree())
                {
                    while (subReader.NodeType != XmlNodeType.Element) // Read past XmlNodeType.None
                        if (!subReader.Read())
                            break;
                    if (readXmlElement != null)
                        readXmlElement(subReader);
                }
                reader.Read();
            }
        }

        // Move past the end of the wrapper element
        reader.ReadEndElement();
    }
}

Then use it as follows:

public static void SerializeFilesToXml(string directoryPath, string xmlPath)
{
    var docs = from file in Directory.GetFiles(directoryPath)
               select new Document { DocumentPath = file };
    var container = new DocumentContainer { DocumentCollection = docs.ToList() };

    using (var stream = File.Open(xmlPath, FileMode.Create, FileAccess.ReadWrite))
    using (var writer = XmlWriter.Create(stream, new XmlWriterSettings { Indent = true, IndentChars = " " }))
    {
        new XmlSerializer(container.GetType()).Serialize(writer, container);
    }

    Debug.WriteLine("Wrote " + xmlPath);
}

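Using the streaming solution, when serializing 4 files of around 250 MB each, my memory use went up by 0.8 MB. Using the original classes, my memory use went up by 1022 MB.

Update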

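If you need to write your XML to a memory stream, be aware that the C# MemoryStream has a hard maximum stream length of int.MaxValue (i.e. 2 GB), because the underlying memory is simply a byte array. On a 32-bit process the effective maximum length will be much smaller; see "OutOfMemoryException while populating MemoryStream: 256MB allocation on 16GB system".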
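The hard length cap can be observed without allocating anything large. The sketch below, using hypothetical names, asks a MemoryStream for a length one byte above int.MaxValue and reports whether the request is rejected; it says nothing about the much lower practical limit in a fragmented 32-bit process.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class MemoryStreamLimitDemo
{
    // Returns true if MemoryStream rejects a length just above int.MaxValue.
    public static bool RejectsLengthAboveIntMax()
    {
        using (var ms = new MemoryStream())
        {
            try
            {
                // The argument check fails before any buffer is allocated.
                ms.SetLength((long)int.MaxValue + 1);
                return false;
            }
            catch (ArgumentOutOfRangeException)
            {
                return true;
            }
        }
    }
}
```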

To check programmatically whether your process is actually 32-bit, see "How to determine programmatically whether a particular process is 32-bit or 64-bit". To change to 64-bit, see "What is the purpose of the “Prefer 32-bit” setting in Visual Studio 2012 and how does it actually work?".
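For the current process only, the check can be as short as the following sketch (the class and method names are hypothetical). Inspecting some other process, as the linked question discusses, needs a different approach.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ProcessBitness
{
    // Describes the bitness of the process this code is running in.
    public static string Describe()
    {
        return Environment.Is64BitProcess ? "64-bit" : "32-bit";
    }
}
```

Logging this value once at startup of the production application would show whether it really runs as 64-bit, or as 32-bit because of a project setting.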

If you are sure you are running in 64-bit mode and are still exceeding the hard size limits of a MemoryStream, you might look at "alternative to MemoryStream for large data volumes" or "MemoryStream replacement?".
