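XDocument text node new line

Time: 2011-09-20 12:20:26

Tags: c# xml newline linq-to-xml

I am trying to add new lines to a text node using XText from the LINQ to XML namespace.

I have a string that contains newline characters, but I need to work out how to convert them to character references (i.e. &#10;) rather than having them appear in the XML as literal new lines.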

XElement element = new XElement( "NodeName" );
...

string example = "This is a string\nWith new lines in it\n";

element.Add( new XText( example ) );

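The XElement is then written out using an XmlTextWriter, which results in a file that contains literal newlines rather than the entity replacement.

Has anyone come across this problem and found a solution?


Edit

The problem shows up when I load the XML into Excel, which does not seem to like the newline character but does accept the entity replacement. The result is that the newlines are not shown in Excel unless I replace them with &#10;.

Nick.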

3 answers:

Answer 0: (score: 3)

Cheating:

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.CheckCharacters = false;
        settings.NewLineChars = "&#10;";
        XmlWriter writer = XmlWriter.Create(..., settings);
        element.WriteTo(writer);
        writer.Flush();

Update:

Complete program

using System;
using System.Xml;
using System.Xml.Linq;


namespace ConsoleApplication1
{
class Program
{
    static void Main(string[] args)
    {
        XElement element = new XElement( "NodeName" );
        string example = "This is a string\nWith new lines in it\n";
        element.Add( new XText( example ) );

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.CheckCharacters = false;
        settings.NewLineChars = "&#10;";
        XmlWriter writer = XmlWriter.Create(Console.Out, settings);
        element.WriteTo(writer);
        writer.Flush();
    }
}
}

Output:

C:\Users\...\\ConsoleApplication1\bin\Release>ConsoleApplication1.exe
<?xml version="1.0" encoding="ibm850"?>&#10;<NodeName>This is a string&#10;With new lines in it&#10;</NodeName>

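Answer 1: (score: 1)

To any standard XML parser, there is no difference in text content between the character reference &#10; and a literal newline character, because they are one and the same thing.

To illustrate this, the following code shows that they are the same: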

string s1 = "<root>Test&#10;Test2</root>";
string s2 = "<root>Test\nTest2</root>";

XDocument doc1 = XDocument.Parse(s1);
XDocument doc2 = XDocument.Parse(s2);

Console.WriteLine(doc1.ToString());
Console.WriteLine(doc2.ToString());
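
The snippet above only prints both documents. As a minimal sketch (the class name CompareNewlineForms is made up for illustration), the parsed text values can also be compared directly, which makes the equivalence explicit:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original answer.
class CompareNewlineForms
{
    static void Main()
    {
        // Same text content, once with a character reference and once with a literal newline.
        XDocument doc1 = XDocument.Parse("<root>Test&#10;Test2</root>");
        XDocument doc2 = XDocument.Parse("<root>Test\nTest2</root>");

        // After parsing, both root elements hold the same string "Test\nTest2".
        Console.WriteLine(doc1.Root.Value == doc2.Root.Value); // prints True
    }
}
```

This only shows what a conforming parser sees. It says nothing about how Excel treats the two forms when it reads the file, which is the behavior the question reports.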

Answer 2: (score: 1)

It is the XmlTextWriter that is responsible for escaping entities on output. So if you do this, for example:

        using (XmlTextWriter w = new XmlTextWriter("test.xml", Encoding.UTF8))
        {
            w.WriteString("&#10;");
        }

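you will also get the ampersand escaped in test.xml, as &amp;#10;, which is not what you want. You want to keep the &#10; sequence raw, exactly as is.

The solution I propose is to create a new StreamWriter implementation that is able to detect escaped strings such as "&amp;#10;":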

    // A StreamWriter that does not escape &#10; characters
    public class NonXmlEscapingStreamWriter : StreamWriter
    {
        private const string AmpToken = "amp";
        private int _bufferState = 0; // used to keep state

        // add other ctors overloads if needed
        public NonXmlEscapingStreamWriter(string path)
            : base(path)
        {
        }

        // NOTE this code is based on the assumption that StreamWriter
        // only overrides these 4 Write functions, which is true today but could change in the future
        // and also on the assumption that the XmlTextWriter writes escaped values in a specific sequence of WriteXX calls
        public override void Write(char value)
        {
            if (value == '&')
            {
                if (_bufferState == 0)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    _bufferState = 0;
                }
            }
            else if (value == ';')
            {
                if (_bufferState > 1)
                {
                    _bufferState++;
                    return;
                }
                else
                {
                    Write('&'); // release what's been held
                    Write(AmpToken);
                    _bufferState = 0;
                }
            }
            else if (value == '\n') // detect non escaped \n
            {
                base.Write("&#10;");
                return;
            }
            base.Write(value);
        }

        public override void Write(string value)
        {
            if (_bufferState > 0)
            {
                if (value == AmpToken)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    Write('&'); // release what's been held
                    _bufferState = 0;
                }
            }
            base.Write(value);
        }

        public override void Write(char[] buffer, int index, int count)
        {
            if (_bufferState > 2)
            {
                _bufferState = 0;
                base.Write('&'); // release this anyway
                string replace;
                if ((buffer != null) && ((replace = GetReplaceLength(buffer, index, count)) != null))
                {
                    base.Write(replace);
                    base.Write(buffer, index + replace.Length, count - replace.Length);
                    return;
                }
                else
                {
                    base.Write(AmpToken); // release this
                    base.Write(';'); // release this
                }
            }
            base.Write(buffer, index, count);
        }

        public override void Write(char[] buffer)
        {
            Write(buffer, 0, buffer != null ? buffer.Length : 0);
        }

        private string GetReplaceLength(char[] buffer, int index, int count)
        {
            // this is specific to the 10 character but could be adapted
            const string token = "#10;";
            if (count < token.Length) // not enough characters in this write to hold the token
                return null;

            // we test the char array to avoid string allocations
            for(int i = 0; i < token.Length; i++)
            {
                if (buffer[index + i] != token[i])
                    return null;
            }
            return token;
        }
    }

You can use it like this:

    using (XmlTextWriter w = new XmlTextWriter(new NonXmlEscapingStreamWriter("test.xml")))
    {
        element.WriteTo(w);
    }

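Note: although it is able to detect lone \n sequences, I suggest you make sure that every \n is actually escaped in the original text. That means replacing \n with &#10; before you output the XML, like this:

string example = "This is a string&#10;With new lines in it&#10;";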
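
If the text comes from elsewhere rather than from a literal, the replacement can be done in code. The following is a rough sketch only: the class name EscapeNewlines is made up, and normalizing "\r\n" first is an added assumption about the input, not something the answer requires.

```csharp
using System;

// Hypothetical demo class, not part of the original answer.
class EscapeNewlines
{
    static void Main()
    {
        string example = "This is a string\nWith new lines in it\n";

        // Normalize Windows line endings first so "\r\n" does not leave a stray '\r',
        // then turn every remaining newline into the &#10; character reference text.
        string escaped = example.Replace("\r\n", "\n").Replace("\n", "&#10;");

        Console.WriteLine(escaped);
        // prints: This is a string&#10;With new lines in it&#10;
    }
}
```

The escaped string would then be passed to new XText(...) and written through the NonXmlEscapingStreamWriter shown above, which is what keeps the &#10; sequence from being re-escaped as &amp;#10;.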