Object not serializing to XML (UTF-8) as expected in .NET?

Time: 2011-08-09 10:40:51

Tags: .net xml serialization encoding

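I have a helper method that serializes objects, and it works fine until you try to change the encoding... Once the data is encoded that way, the consuming web service seems to receive it incorrectly, with some strange characters.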

Here are the log entries from the application:

UTF-16 (works):

2011-08-09 11:16:03,140 DEBUG SomeRestfulService *   xmlData    <?xml version="1.0" encoding="utf-8"?>
<loginRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <UserName>Admin</UserName>
  <Password>Password</Password>
  <MarketCode>GB</MarketCode>
</loginRequest>

UTF-8 (note the strange characters):

2011-08-09 11:21:30,687 DEBUG SomeRestfulService *   xmlData    <?xml version="1.0" encoding="utf-8"?><loginRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><UserName>Admin</UserName><Password>Password</Password><MarketCode>GB</MarketCode></loginRequest>

I don't know why it loses the layout.

Helper method:

Public Shared Function SerializeObject(ByVal obj As Object, ByVal encoding As Text.Encoding) As String

    Dim serializer As New XmlSerializer(obj.GetType)

    If encoding Is Nothing Then
        Using strWriter As New IO.StringWriter()
            serializer.Serialize(strWriter, obj)
            Return strWriter.ToString
        End Using
    Else
        Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, encoding)
            serializer.Serialize(xtWriter, obj)
            Return encoding.GetString(stream.ToArray())
        End Using
    End If


End Function

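Note: if I pass Nothing as the encoding, it defaults to UTF-16 and everything works. Originally I never used the encoding part, but it is now a requirement, so it has to stay.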

Am I serializing incorrectly when the encoding is UTF-8? How can I fix this?

I tried the following to omit the BOM, but I still have the same problem:

Dim utf8 As New Text.UTF8Encoding(True)
Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, utf8)
    serializer.Serialize(xtWriter, obj)
    Return utf8.GetString(stream.ToArray())
End Using

1 Answer

Answer (score: 1)

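What you are seeing is the byte order mark (BOM), which is commonly placed at the beginning of a text file or stream to indicate the byte order and the Unicode variant. For UTF-8 it is the three bytes EF BB BF, which decode to the invisible character U+FEFF at the start of your string.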
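To see where the strange characters come from, here is a minimal VB.NET sketch that repeats the question's UTF-8 branch and inspects the first character of the returned string. The LoginRequest class and the BomDemo module are hypothetical stand-ins, not the asker's real types.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml
Imports System.Xml.Serialization

' Hypothetical stand-in for the question's request type.
<XmlRoot("loginRequest")>
Public Class LoginRequest
    Public Property UserName As String
End Class

Public Module BomDemo
    ' Serializes exactly like the question's UTF-8 branch: encode to bytes, then decode again.
    Public Function SerializeLikeQuestion(ByVal obj As Object, ByVal enc As Encoding) As String
        Dim serializer As New XmlSerializer(obj.GetType())
        Using stream As New MemoryStream(), writer As New XmlTextWriter(stream, enc)
            serializer.Serialize(writer, obj)
            writer.Flush()
            ' GetString decodes every byte, including the EF BB BF preamble written by
            ' the writer, so the result starts with the invisible character U+FEFF.
            Return enc.GetString(stream.ToArray())
        End Using
    End Function

    Public Sub Main()
        Dim result As String = SerializeLikeQuestion(New LoginRequest With {.UserName = "Admin"}, Encoding.UTF8)
        Console.WriteLine(AscW(result(0)).ToString("X4")) ' FEFF: the BOM survived decoding
        Console.WriteLine(result.Substring(1, 5))         ' <?xml
    End Sub
End Module
```

Passing New UTF8Encoding(False) instead of Encoding.UTF8 makes the first character "<", because that encoding has an empty preamble.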

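Your serializer is rather odd. If you encode a string with some encoding such as UTF-8, you have to return the result as a byte array. Encoding the XML as UTF-8 first and then decoding the UTF-8 stream back into a string gains you nothing (apart from introducing the problematic BOM).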

Either stick to UTF-16 or return a byte array. As the function is written now, the encoding parameter only introduces problems.

Update

Based on the code in the comments below, I see two approaches:

Approach 1: create a string containing the serialized data and convert it to UTF-8 afterwards

Public Shared Function SerializeObject(ByVal obj As Object) As String

    Dim serializer As New XmlSerializer(obj.GetType)

    Using strWriter As New IO.StringWriter()
        serializer.Serialize(strWriter, obj)
        Return strWriter.ToString
    End Using

End Function

....

' myObject is a placeholder for the object you want to send ("object" is a reserved word in VB)
Dim serialisedObject As String = SerializeObject(myObject)
Dim postData As Byte() = New Text.UTF8Encoding(True).GetBytes(serialisedObject)

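Change the last line if you need a different encoding. Note that GetBytes never writes a byte order mark, whatever you pass to the UTF8Encoding constructor; that True/False argument only controls what GetPreamble() returns, and therefore whether writers such as StreamWriter or XmlTextWriter emit a BOM. Also be aware that the XML declaration produced through StringWriter reads encoding="utf-16", which no longer matches the UTF-8 bytes you send.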
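A small sketch of that distinction, using a hypothetical helper ToUtf8Bytes that adds the BOM only when asked. It relies only on UTF8Encoding.GetPreamble() and GetBytes(), plus LINQ's Concat to join the two arrays.

```vbnet
Imports System
Imports System.Linq
Imports System.Text

Public Module PreambleDemo
    ' Hypothetical helper: encodes text as UTF-8 and prefixes the BOM only if includeBom is True.
    ' GetBytes never adds a BOM by itself; the BOM bytes come from GetPreamble().
    Public Function ToUtf8Bytes(ByVal text As String, ByVal includeBom As Boolean) As Byte()
        Dim utf8 As New UTF8Encoding(includeBom)
        ' GetPreamble() is EF BB BF for UTF8Encoding(True) and empty for UTF8Encoding(False).
        Return utf8.GetPreamble().Concat(utf8.GetBytes(text)).ToArray()
    End Function

    Public Sub Main()
        Console.WriteLine(ToUtf8Bytes("<a/>", True).Length)               ' 7: 3 BOM bytes + 4 content bytes
        Console.WriteLine(ToUtf8Bytes("<a/>", False).Length)              ' 4: content bytes only
        Console.WriteLine(New UTF8Encoding(True).GetBytes("<a/>").Length) ' 4: GetBytes alone adds no BOM
    End Sub
End Module
```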

Approach 2: produce correctly encoded data in the first place and keep working with the byte array

Public Shared Function SerializeObject(ByVal obj As Object, ByVal encoding As Text.Encoding) As Byte()

    Dim serializer As New XmlSerializer(obj.GetType)

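    ' Fall back to UTF-16 when no encoding is supplied ("Set" is VB6 syntax and is not valid here)
    If encoding Is Nothing Then
        encoding = Text.Encoding.Unicode
    End If

    Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, encoding)
        serializer.Serialize(xtWriter, obj)
        xtWriter.Flush() ' make sure all buffered output has reached the stream
        ' Return the encoded bytes as they are instead of decoding them back into a string
        Return stream.ToArray()
    End Using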

End Function


....

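' myObject is a placeholder for the object you want to send; UTF8Encoding(False) omits the BOM
Dim postData As Byte() = SerializeObject(myObject, New Text.UTF8Encoding(False))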

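In this case the XmlTextWriter encodes the data directly with the correct encoding. Since we already have a byte array, the last step is shorter: we send the data as it is.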
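On the lost layout from the question: a new XmlTextWriter defaults to Formatting.None, whereas XmlSerializer.Serialize(TextWriter) writes indented output, which is why only the StringWriter path was nicely formatted. A possible variation on approach 2 is sketched below, assuming XmlWriter.Create is acceptable in place of XmlTextWriter; it controls both indentation and the BOM through XmlWriterSettings. LoginRequest and IndentedSerializer are hypothetical names.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml
Imports System.Xml.Serialization

' Hypothetical stand-in for the question's request type.
<XmlRoot("loginRequest")>
Public Class LoginRequest
    Public Property UserName As String
    Public Property MarketCode As String
End Class

Public Module IndentedSerializer
    ' Returns encoded bytes like approach 2, but indents the output and uses a BOM-free
    ' UTF-8 encoding when no encoding is given (an assumption, not the original default).
    Public Function SerializeObject(ByVal obj As Object, ByVal enc As Encoding) As Byte()
        Dim serializer As New XmlSerializer(obj.GetType())
        Dim settings As New XmlWriterSettings()
        settings.Encoding = If(enc, New UTF8Encoding(False))
        settings.Indent = True ' two-space indentation by default, like the StringWriter output
        Using stream As New MemoryStream()
            Using writer As XmlWriter = XmlWriter.Create(stream, settings)
                serializer.Serialize(writer, obj)
            End Using ' disposing the writer flushes everything into the stream
            Return stream.ToArray()
        End Using
    End Function

    Public Sub Main()
        Dim request As New LoginRequest With {.UserName = "Admin", .MarketCode = "GB"}
        Dim postData As Byte() = SerializeObject(request, New UTF8Encoding(False))
        ' No BOM, so the decoded text starts directly with the XML declaration.
        Console.WriteLine(New UTF8Encoding(False).GetString(postData))
    End Sub
End Module
```

If you prefer to keep XmlTextWriter, setting its Formatting property to Formatting.Indented restores the layout in the same way; the BOM is then still governed by the encoding you pass to its constructor.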