XmlSerializer replaces non-ASCII characters with question marks '?'

Time: 2012-10-27 20:55:27

Tags: c# xml xmlserializer

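I have an XML serialization routine that works for ASCII, but any non-ASCII character it encounters is replaced with a question mark '?'. I believe I have configured it correctly for UTF-8, and I am not sure why it does this.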

XmlSerializer xmls = new XmlSerializer(typeof(T));
using (MemoryStream ms = new MemoryStream())
{
    XmlWriterSettings settings = new XmlWriterSettings();
    XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
    ns.Add("", "");

    settings.Encoding = Encoding.UTF8;
    settings.Indent = true;
    settings.NewLineChars = "\n";
    settings.NewLineHandling = NewLineHandling.None;
    settings.NewLineOnAttributes = false;
    settings.ConformanceLevel = ConformanceLevel.Document;
    settings.OmitXmlDeclaration = true;

    using (XmlWriter writer = XmlWriter.Create(ms, settings))
    {
        xmls.Serialize(writer, obj, ns);
    }

    string xml = Encoding.UTF8.GetString(ms.ToArray());

    // remove the BOM character at the beginning which screws up decoding
    if (xml.Length > 0 && xml[0] != '<')
    {
        xml = xml.Substring(1, xml.Length - 1);
    }

    return xml;
}

1 answer:

Answer 0: (score: 4)

Everything looks fine; I tested with:

public class Foo
{
    public string Bar { get; set; }
}
...
string xml = Test(new Foo { Bar = "Jalapeño" });

Output:

<Foo>
  <Bar>Jalapeño</Bar>
</Foo>

As a small change, I removed the "remove the BOM character" code entirely and instead disabled the BOM explicitly in the encoding:

settings.Encoding = new UTF8Encoding(false);
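
For reference, here is a minimal sketch of how the question's code might look with that change applied. The helper name `XmlHelper.ToXml` is hypothetical (the question only shows a method body), and the settings are trimmed to the ones that matter here. Because `new UTF8Encoding(false)` does not emit a byte order mark, the string no longer needs the leading character stripped.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Foo
{
    public string Bar { get; set; }
}

public static class XmlHelper
{
    // Hypothetical wrapper around the question's method body.
    public static string ToXml<T>(T obj)
    {
        var serializer = new XmlSerializer(typeof(T));

        // An empty prefix/namespace pair suppresses the default xmlns attributes.
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");

        var settings = new XmlWriterSettings
        {
            // false = do not write a byte order mark at the start of the stream
            Encoding = new UTF8Encoding(false),
            Indent = true,
            OmitXmlDeclaration = true
        };

        using (var ms = new MemoryStream())
        {
            // Dispose the writer before reading the stream so everything is flushed.
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                serializer.Serialize(writer, obj, ns);
            }

            // Decode with the same encoding that was used for writing.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = XmlHelper.ToXml(new Foo { Bar = "Jalapeño" });
        Console.WriteLine(xml);
    }
}
```

Note that what the console displays also depends on the console's own output encoding, so inspecting the string in a debugger or writing it to a UTF-8 file is a more reliable check.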

Also, if I include the XML declaration to check which encoding it thinks it is using, I get:

<?xml version="1.0" encoding="utf-8"?>
<Foo>
  <Bar>Jalapeño</Bar>
</Foo>

So basically... I cannot reproduce this.
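
Since the serialization step itself does not reproduce the problem, one thing that may be worth ruling out is a later step that re-encodes the string. This is only an assumption; nothing in the question confirms it. The sketch below (the class and method names are illustrative) shows that `Encoding.ASCII` substitutes '?' for any character it cannot represent, which matches the reported symptom, while a UTF-8 round trip keeps the text intact.

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Encodes and decodes with ASCII; unrepresentable characters become '?'.
    public static string RoundTripThroughAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    // Encodes and decodes with UTF-8; the text survives unchanged.
    public static string RoundTripThroughUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        string original = "Jalapeño";

        // The ASCII round trip loses the 'ñ'.
        Console.WriteLine(RoundTripThroughAscii(original) == "Jalape?o");

        // The UTF-8 round trip preserves it.
        Console.WriteLine(RoundTripThroughUtf8(original) == original);
    }
}
```

If the XML string is later written to a file, socket, or HTTP response, checking the encoding used at that point would be a reasonable next step.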