XML serialization of an object containing invalid characters

Asked: 2009-07-22 15:04:46

Tags: .net serialization xml-serialization

I am serializing an object that holds HTML data in a String property.

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Create)
Formatter.Serialize(fs, Ob)
fs.Close()

But when I read the XML back into an object:

Dim Formatter As New Xml.Serialization.XmlSerializer(GetType(MyObject))
Dim fs As New FileStream(FilePath, FileMode.Open)
Dim Ob = CType(Formatter.Deserialize(fs), MyObject)
fs.Close()

I get this error:

"'', hexadecimal value 0x14, is an invalid character. Line 395, position 22."

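Shouldn't .NET prevent this kind of error by escaping the invalid characters?

What is happening here, and how can I fix it?

4 answers:

Answer 0 (score: 6)

I set the CheckCharacters property of XmlReaderSettings to false. I only recommend this if you serialized the data yourself with XmlSerializer. If the XML comes from an unknown source, it is not a good idea.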

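// Requires: using System.IO; using System.Xml; using System.Xml.Serialization;
public static T Deserialize<T>(string xml)
{
    // Turn off character checking so that characters such as 0x14 do not throw.
    var xmlReaderSettings = new XmlReaderSettings() { CheckCharacters = false };

    // Create is a static method of XmlReader; dispose the reader when done.
    using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml), xmlReaderSettings))
    {
        XmlSerializer xs = new XmlSerializer(typeof(T));
        return (T)xs.Deserialize(xmlReader);
    }
}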
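
To see what the setting changes, here is a minimal sketch that reads the same small document with character checking on and off. The names CheckCharactersDemo and ReadRootText are made up for this illustration and are not part of the answer above; only XmlReaderSettings.CheckCharacters comes from it.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CheckCharactersDemo
{
    // Hypothetical helper: returns the text content of the root element of `xml`,
    // reading with the given CheckCharacters setting.
    public static string ReadRootText(string xml, bool checkCharacters)
    {
        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();                      // skip to the root element
            return reader.ReadElementContentAsString();  // throws XmlException on invalid characters when checking is on
        }
    }
}
```

Calling ReadRootText("<r>a&#x14;b</r>", true) should raise an XmlException like the one in the question, while passing false lets the character through unchanged.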

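Answer 1 (score: 2)

The serialization step is really where this should fail, because 0x14 is an invalid value for XML. There is no way to escape it, not even as &#x14;, because the XML model excludes it as a valid character. I am quite surprised that the serializer lets it through, since that makes the serializer non-conformant.

Can you strip the invalid characters from the string before serializing? And why does your HTML contain 0x14 in the first place?

Or are you perhaps writing with one encoding and reading with another?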
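
The idea of stripping invalid characters before serializing could be sketched roughly as follows. The class and method names (XmlTextSanitizer, StripInvalidXmlChars) are hypothetical, and the ranges are the XML 1.0 Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). Treat it as one possible approach under those assumptions, not as a definitive implementation.

```csharp
using System;
using System.Text;

public static class XmlTextSanitizer
{
    // Hypothetical helper: returns `text` without characters that XML 1.0 does not allow.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\t' || c == '\n' || c == '\r' ||
                (c >= 0x20 && c <= 0xD7FF) ||
                (c >= 0xE000 && c <= 0xFFFD))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length &&
                     char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes U+10000..U+10FFFF, which XML allows.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (0x14, other control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

Applying this to the String property before calling Serialize means the written file never contains 0x14, so the default reader settings can stay in place when reading it back. The trade-off is that the removed characters are lost for good.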

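Answer 2 (score: 1)

You should post the code of the class you are trying to serialize and deserialize. In the meantime, I will guess.

Most likely, the invalid character is in a field or property of type string. Assuming you cannot keep that character from appearing, you need to serialize the value as a byte array instead: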

[XmlRoot("root")]
public class HasBase64Content
{
    internal HasBase64Content()
    {
    }

    [XmlIgnore]
    public string Content { get; set; }

    [XmlElement]
    public byte[] Base64Content
    {
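        get
        {
            // Guard against null: GetBytes throws ArgumentNullException on a null string.
            return Content == null ? null : System.Text.Encoding.UTF8.GetBytes(Content);
        }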
        set
        {
            if (value == null)
            {
                Content = null;
                return;
            }

            Content = System.Text.Encoding.UTF8.GetString(value);
        }
    }
}

This produces XML like the following:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Base64Content>AAECAwQFFA==</Base64Content>
</root>

I have a feeling you may prefer VB.NET:

<XmlRoot("root")> _
Public Class HasBase64Content

    Private _content As String
    <XmlIgnore()> _
    Public Property Content() As String
        Get
            Return _content
        End Get
        Set(ByVal value As String)
            _content = value
        End Set
    End Property

    <XmlElement()> _
    Public Property Base64Content() As Byte()
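        Get
            ' Guard against Nothing: GetBytes throws ArgumentNullException on a null string.
            If Content Is Nothing Then
                Return Nothing
            End If
            Return System.Text.Encoding.UTF8.GetBytes(Content)
        End Get
        Set(ByVal value As Byte())
            If value Is Nothing Then
                Content = Nothing
                Return
            End If
            Content = System.Text.Encoding.UTF8.GetString(value)
        End Set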
    End Property
End Class
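
A round trip through such a class might look like the sketch below. Base64Holder and Base64RoundTrip are hypothetical names used so the block compiles on its own; the class is a compact variant of the one above, with a public constructor and null-safe accessors added as assumptions of this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

[XmlRoot("root")]
public class Base64Holder   // hypothetical stand-in for HasBase64Content
{
    [XmlIgnore]
    public string Content { get; set; }

    // XmlSerializer writes a byte[] as base64 text, so control characters never reach the XML.
    [XmlElement("Base64Content")]
    public byte[] Base64Content
    {
        get { return Content == null ? null : Encoding.UTF8.GetBytes(Content); }
        set { Content = value == null ? null : Encoding.UTF8.GetString(value); }
    }
}

public static class Base64RoundTrip
{
    // Serializes `content` to an XML string and reads it back with default reader settings.
    public static string RoundTrip(string content)
    {
        var serializer = new XmlSerializer(typeof(Base64Holder));
        var writer = new StringWriter();
        serializer.Serialize(writer, new Base64Holder { Content = content });

        using (var reader = new StringReader(writer.ToString()))
        {
            return ((Base64Holder)serializer.Deserialize(reader)).Content;
        }
    }
}
```

Because the payload is base64, a string containing 0x14 should come back intact without relaxing any reader settings. The cost is that the HTML is no longer human-readable in the file.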

Answer 3 (score: 0)

I would expect .NET to handle this, but you can also look at the XmlSerializer class and XmlReaderSettings (see the sample generic method below):

public static T Deserialize<T>(string xml)
{
    var xmlReaderSettings = new XmlReaderSettings()
                                {
                                    ConformanceLevel = ConformanceLevel.Fragment,
                                    ValidationType = ValidationType.None
                                };

    XmlReader xmlReader = XmlReader.Create(new StringReader(xml), xmlReaderSettings);
    XmlSerializer xs = new XmlSerializer(typeof(T), "");

    return (T)xs.Deserialize(xmlReader);
}

I would also check the code for encoding issues (Unicode, UTF8, and so on). Hex value 0x14 is not a char you would expect in XML :)