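I have an XML serializer that works for ASCII, but when it encounters non-ASCII characters, they are replaced with question marks ("?"). I believe I have configured it correctly for UTF-8, and I'm not sure why it is doing this.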
    XmlSerializer xmls = new XmlSerializer(typeof(T));
    using (MemoryStream ms = new MemoryStream())
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        settings.Encoding = Encoding.UTF8;
        settings.Indent = true;
        settings.NewLineChars = "\n";
        settings.NewLineHandling = NewLineHandling.None;
        settings.NewLineOnAttributes = false;
        settings.ConformanceLevel = ConformanceLevel.Document;
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlTextWriter.Create(ms, settings))
        {
            xmls.Serialize(writer, obj, ns);
        }
        string xml = Encoding.UTF8.GetString(ms.ToArray());
        // remove the BOM character at the beginning which screws up decoding
        if (xml.Length > 0 && xml[0] != '<')
        {
            xml = xml.Substring(1, xml.Length - 1);
        }
        return xml;
    }
Answer 0 (score: 4)
It all looks fine; testing with:
    public class Foo
    {
        public string Bar { get; set; }
    }
    ...
    string xml = Test(new Foo { Bar = "Jalapeño" });
Output:
    <Foo>
      <Bar>Jalapeño</Bar>
    </Foo>
As a minor change, I removed the "remove the BOM character" code completely and instead suppressed the BOM explicitly in the encoding:
    settings.Encoding = new UTF8Encoding(false);
Also, if I include the XML declaration to check which encoding it thinks it is using:
    <?xml version="1.0" encoding="utf-8"?>
    <Foo>
      <Bar>Jalapeño</Bar>
    </Foo>
So basically... cannot reproduce.
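For anyone who wants to rerun the test, here is a minimal self-contained sketch of it. The `Test<T>` wrapper and the `Program` class are assumed names (the question only shows the method body), and the `Console.OutputEncoding` line is an addition of mine so that the console itself is less likely to substitute characters when printing:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Foo
{
    public string Bar { get; set; }
}

public static class Program
{
    // Hypothetical wrapper around the method body from the question.
    public static string Test<T>(T obj)
    {
        XmlSerializer xmls = new XmlSerializer(typeof(T));
        using (MemoryStream ms = new MemoryStream())
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            // UTF-8 without a byte order mark, so no BOM stripping is needed later.
            settings.Encoding = new UTF8Encoding(false);
            settings.Indent = true;
            settings.NewLineChars = "\n";
            settings.NewLineHandling = NewLineHandling.None;
            settings.ConformanceLevel = ConformanceLevel.Document;
            settings.OmitXmlDeclaration = true;

            // Empty prefix/namespace pair suppresses the default xmlns attributes.
            XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
            ns.Add("", "");

            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                xmls.Serialize(writer, obj, ns);
            }

            // Decode with the same encoding that was used to write the stream.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        // Assumption: the console may default to a non-UTF-8 code page.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Test(new Foo { Bar = "Jalapeño" }));
    }
}
```

If the string returned by `Test` still contains the "ñ" here but the question marks show up later, the replacement is probably happening wherever the string is subsequently written or displayed, not in the serializer.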