How do I deserialize an XML file containing \u0000 with XmlSerializer?

Time: 2017-08-23 08:16:37

Tags: c# xml serialization xmlserializer xmlreader

I have a question about XmlSerializer. My huge XML file contains some null characters (\u0000), so XmlSerializer throws an error when deserializing it. I read that I need to set Normalization to false (per https://msdn.microsoft.com/en-us/library/aa302290.aspx), so I tried this:

XmlSerializer deserializer = new XmlSerializer(typeof(T));
XmlTextReader reader = new XmlTextReader(filename);
reader.Normalization = false;
return (T)deserializer.Deserialize(reader);

As a second attempt, I switched to XmlReader and, as MSDN also suggests, set CheckCharacters to false, like this:

XmlSerializer deserializer = new XmlSerializer(typeof(T));
XmlReaderSettings settings = new XmlReaderSettings() { CheckCharacters = false };
using (XmlReader reader = XmlReader.Create(filename, settings))
{
    return (T)deserializer.Deserialize(reader);
}


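But both approaches give the same result: an InvalidOperationException pointing at the line and column in the XML where the null character is.

Can you give me any advice? I need to load the XML structure into the class I defined. Without the lines containing these characters, it works fine.

Thanks! :)

Edit: I forgot to mention that I already tried loading the content into a string and fixing up the string, but the content is so large that I get a System.OutOfMemoryException, and parsing the file line by line is too slow. :(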

1 Answer:

Answer 0: (score: 0)

You can drop down to the reader level: subclass TextReader to do the cleanup and feed that to XmlSerializer.

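var deserializer = new XmlSerializer(typeof(T));
T instance;
// Assumption: the source is a StreamReader over the question's filename, so the file is streamed, not loaded whole.
using (var reader = new StreamReader(filename))
using (var cleanupTextReader = new CleanupTextReader(reader))
{
    // Deserialize returns object, so cast to the target type.
    instance = (T)deserializer.Deserialize(cleanupTextReader);
}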

CleanupTextReader looks something like this:

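using System;
using System.IO;
using System.Text;

// Wraps another TextReader and drops every NUL (\u0000) character it yields.
internal sealed class CleanupTextReader : TextReader
{
    private readonly TextReader _in;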

    internal CleanupTextReader(TextReader t)
    {
        _in = t;
    }

    public override void Close()
    {
        _in.Close();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _in.Dispose();
        }
        base.Dispose(disposing);
    }

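    public override int Peek()
    {
        // Skip pending NULs so Peek() agrees with what Read() will return.
        while (_in.Peek() == '\u0000')
        {
            _in.Read();
        }
        return _in.Peek();
    }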

    public override int Read()
    {
        while(true)
        {
            var result = _in.Read();
            if (result != '\u0000')
            {
                return result;
            }
        }
    }

    private string CleanupString(string value)
    {
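        // Fast path when there is nothing to strip. Note that new char['\u0000']
        // would allocate an empty array, so IndexOfAny on it never finds a NUL.
        if (string.IsNullOrEmpty(value) || value.IndexOf('\u0000') < 0)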
        {
            return value;
        }
        var builder = new StringBuilder(value.Length);
        foreach (var ch in value)
        {
            if (ch != '\u0000')
            {
                builder.Append(ch);
            }
        }
        return builder.ToString();
    }

    private int CleanupBuffer(char[] buffer, int index, int count)
    {
        int adjustedCount = count;
        if (count > 0)
        {
            var readIndex = index;
            var writeIndex = index;
            while (readIndex < index + count)
            {
                var ch = buffer[readIndex];
                readIndex++;
                if (ch == '\u0000')
                {
                    adjustedCount--;
                }
                else
                {
                    buffer[writeIndex] = ch;
                    writeIndex++;
                }
            }
        }
        return adjustedCount;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        while (true)
        {
            int reallyRead = _in.Read(buffer, index, count);
            if (reallyRead <= 0)
            {
                return reallyRead;
            }

            int cleanRead = CleanupBuffer(buffer, index, reallyRead);
            if (cleanRead != 0)
            {
                return cleanRead;
            }
        }
    }

    public override int ReadBlock(char[] buffer, int index, int count)
    {
        while (true)
        {
            int reallyRead = _in.ReadBlock(buffer, index, count);
            if (reallyRead <= 0)
            {
                return reallyRead;
            }

            int cleanRead = CleanupBuffer(buffer, index, reallyRead);
            if (cleanRead != 0)
            {
                return cleanRead;
            }
        }
    }

    public override string ReadLine()
    {
        return CleanupString(_in.ReadLine());
    }

    public override string ReadToEnd()
    {
        return CleanupString(_in.ReadToEnd());
    }
}
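
To sanity-check the idea end to end without a huge file, here is a minimal sketch that runs the same kind of filter over an in-memory string containing an embedded NUL. Everything named here is an assumption made for illustration: `Item` is an invented sample type, and `NullSkippingReader` is a deliberately tiny stand-in for `CleanupTextReader` so the snippet compiles on its own. It overrides only `Peek()` and `Read()`; as far as I know, `TextReader`'s default bulk methods are built on `Read()`, so they see the filtered characters too, just more slowly than the buffer-based version above.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type, invented for this check.
public class Item
{
    public string Name { get; set; }
}

// Hypothetical minimal stand-in for CleanupTextReader (not the class from the answer).
public sealed class NullSkippingReader : TextReader
{
    private readonly TextReader _inner;

    public NullSkippingReader(TextReader inner)
    {
        _inner = inner;
    }

    public override int Peek()
    {
        // Discard pending NULs so Peek() never reports one.
        while (_inner.Peek() == '\u0000')
        {
            _inner.Read();
        }
        return _inner.Peek();
    }

    public override int Read()
    {
        int ch;
        do
        {
            ch = _inner.Read();
        } while (ch == '\u0000');
        return ch; // -1 signals end of input
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}

public static class CleanupDemo
{
    // Deserializes T from any TextReader, filtering out NUL characters on the way.
    public static T Deserialize<T>(TextReader source)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var cleaned = new NullSkippingReader(source))
        {
            return (T)serializer.Deserialize(cleaned);
        }
    }

    public static void Main()
    {
        // Raw NUL inside element text; unfiltered, the XML reader rejects it.
        string xml = "<Item><Name>foo\u0000bar</Name></Item>";
        Item item = Deserialize<Item>(new StringReader(xml));
        Console.WriteLine(item.Name); // foobar
    }
}
```

For the real file, pass a `StreamReader` opened on the file instead of the `StringReader`, and swap in `CleanupTextReader`, whose buffer-based `Read` avoids the per-character overhead of this stand-in.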