Question

Suppose I have this XML document:

<x xml:space='preserve'>&#xd;
</x>

with this sequence of bytes as the content of <x/>:

38 35 120 100 59 13 10

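My understanding of the W3C spec is that the sequence 13 10 will be replaced before parsing. To get the sequence 13 10 to show up in my parsed tree, I must include the character reference &#xd;, as clarified in a note in the W3C spec. (I recognize that these come from XML 1.1 rather than XML 1.0, but they clarify confusing points in XML 1.0 without describing different behavior.)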

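As explained in 2.11 End-of-Line Handling, all #xD characters literally present in an XML document are either removed or replaced by #xA characters before any other processing is done. The only way to get a #xD character to match this production is to use a character reference in an entity value literal.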
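The rule quoted above can be sketched in a few lines of C#. This is only an illustration of the end-of-line normalization step applied to the raw bytes from the question, not how any particular parser implements it; the class and method names (EndOfLineSketch, NormalizeLineEndings) are made up for this example.

```csharp
using System;
using System.Text;

public static class EndOfLineSketch
{
    // Hypothetical helper: applies the XML 1.0 section 2.11 rule to raw
    // document text. Each CR LF pair becomes a single LF, and any remaining
    // lone CR becomes LF. Character references are not touched here,
    // because they are resolved later, after this step.
    public static string NormalizeLineEndings(string rawText)
    {
        return rawText.Replace("\r\n", "\n").Replace("\r", "\n");
    }

    public static void Main()
    {
        // The raw content of <x/> from the question: 38 35 120 100 59 13 10
        string raw = "&#xd;\r\n";
        string normalized = NormalizeLineEndings(raw);

        // Prints: 38 35 120 100 59 10
        Console.WriteLine(string.Join(" ", Encoding.UTF8.GetBytes(normalized)));
    }
}
```

Only after this step is the reference &#xd; expanded to a real carriage return, which is why the parsed text ends up as 13 10 rather than 13 13 10.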

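With XDocument.Parse, this all seems to work correctly. The text content of the XML above is 13 10 (rather than 13 13 10), which suggests that the character reference is preserved and the literal 13 10 is replaced with 10 before parsing.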

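However, when serializing, I cannot figure out how to get XDocument.ToString() to entitize newlines. That is, I would expect (XDocument xd) => XDocument.Parse($"{xd}") to be a lossless function. But if I pass in an XDocument instance with 13 10 as its text content, the function returns an XDocument instance with 10 as its text content. See this demonstration: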

    using System;
    using System.Text;
    using System.Xml.Linq;

    // The literal \r\n is normalized to 10; the reference &#xd; supplies the 13.
    var x = XDocument.Parse("<x xml:space='preserve'>&#xd;\r\n</x>");
    present("content", x.Root.Value); // 13 10, expected
    present("formatted", $"{x}"); // inside <x/>: 13 10, unexpected
    x = XDocument.Parse($"{x}");
    present("round tripped", x.Root.Value); // 10, unexpected
    // Note that when formatting the version with just 10 in the value,
    // we get Environment.NewLine in the formatted XML. So there is no
    // way to differentiate between 10 and 13 10 with XDocument because
    // it normalizes when serializing.
    present("round tripped formatted", $"{x}"); // inside <x/>: 13 10, expected

    // Prints a label, the string itself, and its UTF-8 bytes in decimal.
    void present(string label, string thing)
    {
        Console.WriteLine(label);
        Console.WriteLine(thing);
        Console.WriteLine(string.Join(" ", Encoding.UTF8.GetBytes(thing)));
        Console.WriteLine();
    }

You can see that XDocument does not entitize the carriage return when serializing, so it loses information. How can I safely serialize the document so that I do not lose anything, especially the carriage returns that were in the original document I loaded?

Answer 1

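To round-trip an XDocument, do not use the recommended/easy serialization methods such as XDocument.ToString(), because they are lossy. Also note that even if you do something like xd.ToString(SaveOptions.DisableFormatting), any carriage returns in the parsed tree will be lost.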

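Instead, use a properly configured XmlWriter with XDocument.WriteTo. An XmlWriter is able to see that the document contains literal carriage returns and encode them correctly. To instruct it to do so, set XmlWriterSettings.NewLineHandling to NewLineHandling.Entitize. You will probably want to write an extension method to make this easier to reuse.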

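The demonstration from the question can be adapted to this approach by serializing through an XmlWriter instead of XDocument.ToString().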
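A minimal sketch of such an extension method follows. It is an illustration rather than the original demo: the names XDocumentRoundTripExtensions and ToStringEntitized are made up for this example, and omitting the XML declaration is an assumption made so the output resembles what XDocument.ToString() produces. The sample text has non-whitespace characters around the line break so the result does not depend on whitespace handling.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XDocumentRoundTripExtensions
{
    // Hypothetical helper: serializes the document through an XmlWriter that
    // entitizes carriage returns instead of replacing them.
    public static string ToStringEntitized(this XDocument document)
    {
        var settings = new XmlWriterSettings
        {
            NewLineHandling = NewLineHandling.Entitize,
            OmitXmlDeclaration = true, // assumption: mimic ToString(), which omits it
        };

        using (var stringWriter = new StringWriter())
        {
            // Disposing the XmlWriter flushes it into the StringWriter.
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                document.WriteTo(xmlWriter);
            }
            return stringWriter.ToString();
        }
    }
}

public static class RoundTripDemo
{
    public static void Main()
    {
        // Parsed text content is "a", 13, 10, "b".
        var original = XDocument.Parse("<x xml:space='preserve'>a&#xd;\r\nb</x>");

        var roundTripped = XDocument.Parse(original.ToStringEntitized());

        // Compare the text content before and after the round trip.
        Console.WriteLine(string.Join(" ", Encoding.UTF8.GetBytes(original.Root.Value)));
        Console.WriteLine(string.Join(" ", Encoding.UTF8.GetBytes(roundTripped.Root.Value)));
        Console.WriteLine(original.Root.Value == roundTripped.Root.Value);
    }
}
```

If the carriage return survives, both byte lines are identical and the last line prints True. Keeping the settings inside one helper means every caller serializes the same way, which is the reason for suggesting an extension method.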

How do I round-trip an entitized carriage return with XDocument?
