Question

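In my C# application, the XML data may contain arbitrary element text that has already been pre-processed, so that (among other things) illegal characters have been converted to their escaped (XML character entity encoded) form.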

Example: <myElement>this & that</myElement> has been converted to <myElement>this &amp; that</myElement>.

The problem is that when I save the file with XmlTextWriter, the '&amp;' is escaped again, giving <myElement>this &amp;amp; that</myElement>. I don't want the extra amp; added to the string.

Another example: my processing changes <myElement>• bullet</myElement> to <myElement>&#8226; bullet</myElement>, which is then saved as <myElement>&amp;#8226; bullet</myElement>. All I want written to the file is the <myElement>&#8226; bullet</myElement> form.

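I have tried various options on the various XmlWriters and so on, but I can't seem to get the original string to be output correctly. Why can't the XML parser recognize &amp; and leave already-valid escapes alone instead of rewriting them?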
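A minimal sketch, assuming System.Xml's XmlDocument, that reproduces the behavior described above: whatever string is put into a text node is treated as literal character data, so a pre-escaped "&amp;" is escaped once more in the output. EscapeDemo and Render are hypothetical names chosen for illustration.

```csharp
using System.Xml;

// Hypothetical helper that reproduces the double escaping from the question.
static class EscapeDemo
{
    // Builds <myElement> whose single text node holds `text` and returns its markup.
    public static string Render(string text)
    {
        var doc = new XmlDocument();
        XmlElement element = doc.CreateElement("myElement");
        // InnerText is taken literally: "&amp;" is stored as five characters, not as '&'.
        element.InnerText = text;
        doc.AppendChild(element);
        // In OuterXml, any '&' or '<' in the text comes out escaped.
        return doc.OuterXml;
    }
}
```

For example, Render("this &amp; that") yields <myElement>this &amp;amp; that</myElement>, while Render("this & that") yields the single-escaped form the question is after.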

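Update: After more debugging, I found that element text strings (in fact all strings, including element tags, names, attributes, etc.) are encoded when they are copied into the .NET XML object data (CDATA being the exception), by an internal class under System.Xml called XmlCharType. So this problem has nothing to do with XmlWriters. It looks like the best way to solve it is to un-escape the data when it is output, using something like: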

string output = System.Net.WebUtility.HtmlDecode(xmlDoc.OuterXml);

This may evolve into a custom XmlWriter in order to preserve formatting, etc.
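A sketch of the decode-on-output idea from the update, assuming System.Net.WebUtility.HtmlDecode as used there. HtmlDecode makes a single pass over the string, so "&amp;amp;" becomes "&amp;" and "&amp;#8226;" becomes "&#8226;". It also decodes escapes the serializer genuinely needed, such as "&lt;" for a literal '<', so text containing such characters can come out as ill-formed XML. DecodeOnOutput and Serialize are hypothetical names.

```csharp
using System.Net;
using System.Xml;

// Hypothetical wrapper around the one-liner from the update.
static class DecodeOnOutput
{
    public static string Serialize(XmlDocument doc)
    {
        // One decoding pass over the serialized markup:
        //   "&amp;amp;"   -> "&amp;"
        //   "&amp;#8226;" -> "&#8226;"
        // Caveat: "&lt;" -> "<" as well, which breaks well-formedness
        // when the original text contained a literal '<'.
        return WebUtility.HtmlDecode(doc.OuterXml);
    }
}
```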

Thanks for all the helpful suggestions.

Answer 1

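Call XmlWriter.WriteRaw instead. However, it is not smart enough to check whether the characters are valid, so you have to check them yourself; otherwise it will generate invalid XML.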
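A minimal sketch of this answer, assuming XmlWriter.Create over a StringBuilder. Because WriteRaw copies its argument verbatim, the sketch checks the text first by parsing it inside a dummy wrapper element with XmlReader; that check is one possible approach, not something WriteRaw provides. RawTextDemo, IsWellFormedContent and WriteElementRaw are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helpers: validate pre-escaped text, then write it with WriteRaw.
static class RawTextDemo
{
    // True if `fragment` parses as element content: no bare '&' or '<',
    // and only references an XML parser can resolve without a DTD.
    public static bool IsWellFormedContent(string fragment)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader("<x>" + fragment + "</x>")))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static string WriteElementRaw(string name, string preEscapedText)
    {
        if (!IsWellFormedContent(preEscapedText))
            throw new ArgumentException("Text is not well-formed XML content.", nameof(preEscapedText));

        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement(name);
            writer.WriteRaw(preEscapedText); // copied as-is, no escaping
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

Note that this check also rejects HTML-only entities such as &nbsp;, which an XML parser cannot resolve without a DTD declaring them.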

Answer 2

OK, here is the solution I came up with:

using System.IO;
using System.Text;

namespace YourName {

    // Represents a writer that makes it possible to pre-process
    // XML character entity escapes without them being rewritten.
    class XmlRawTextWriter : System.Xml.XmlTextWriter {
        public XmlRawTextWriter(Stream w, Encoding encoding)
            : base(w, encoding) {
        }

        public XmlRawTextWriter(string filename, Encoding encoding)
            : base(filename, encoding) {
        }

        // Text is written verbatim instead of being escaped, so the caller
        // must make sure it is already well-formed (no bare '&' or '<').
        public override void WriteString(string text) {
            base.WriteRaw(text);
        }
    }
}

Then use it just like an XmlTextWriter:

        // Formatting lives in System.Xml; the using block closes the file when done.
        using (XmlRawTextWriter rawWriter = new XmlRawTextWriter(thisFilespec, Encoding.UTF8)) {
            rawWriter.Formatting = Formatting.Indented;
            rawWriter.Indentation = 1;
            rawWriter.IndentChar = '\t';
            xmlDoc.Save(rawWriter);
        }

It works fine without having to encode anything or hack the encoding functions.
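To try this override without writing a file, here is a sketch that gives the same WriteString override a TextWriter constructor (XmlTextWriter itself has a TextWriter overload) and saves to a StringWriter. It also shows the trade-off Answer 1 warns about: text that was not pre-escaped, such as "a < b", is written out raw and yields ill-formed XML. StringRawTextWriter and RawTextWriterDemo are hypothetical names.

```csharp
using System.IO;
using System.Xml;

// Hypothetical in-memory variant of Answer 2's XmlRawTextWriter.
class StringRawTextWriter : XmlTextWriter
{
    public StringRawTextWriter(TextWriter w) : base(w) { }

    // Same idea as Answer 2: every text node is written verbatim.
    public override void WriteString(string text) => base.WriteRaw(text);
}

static class RawTextWriterDemo
{
    // Saves `doc` through the raw writer and returns the markup
    // (XmlDocument.Save also emits an XML declaration first).
    public static string Save(XmlDocument doc)
    {
        var sw = new StringWriter();
        using (var writer = new StringRawTextWriter(sw))
        {
            doc.Save(writer);
            writer.Flush();
        }
        return sw.ToString();
    }
}
```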

How to save XML without characters being escaped?
